
Content Hash as Identity

Filenames change. Paths change. A file gets renamed, moved to a different directory, copied to a backup drive. None of these mutations change what the file is. The content is the identity.

fialr computes a cryptographic hash of every file’s content and uses that hash as the file’s canonical identifier. Everything else — the filename, the path, the directory hierarchy — is mutable metadata attached to that identifier.

| Algorithm | Role | Rationale |
|---|---|---|
| BLAKE3 | Primary identifier | Fast, cryptographically secure, streaming-capable. Used as the canonical content hash in SQLite, XATTRs, and all internal references. |
| SHA256 | Secondary / archival | Widely supported by external tools. Stored alongside BLAKE3 for cross-tool compatibility and long-term archival verification. |
| xxhash | Excluded | Not cryptographically secure: collisions can be constructed deliberately and become likely by chance at scale. Unsuitable for identity or integrity verification. |

Both BLAKE3 and SHA256 are computed during inventory. The BLAKE3 hash is the primary key in the files table. SHA256 is stored as an additional column for interoperability.
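A minimal sketch of computing both digests in one streaming pass, so each file is read from disk only once. It assumes the third-party `blake3` Python package, whose `blake3()` constructor follows the `hashlib` interface. The BLAKE2b fallback exists only so the sketch runs without that package; it produces different digests than real BLAKE3. The name `hash_file` and the 1 MiB chunk size are illustrative, not fialr's actual code.

```python
import hashlib

try:
    from blake3 import blake3  # third-party package: pip install blake3
except ImportError:
    # Stand-in ONLY so this sketch runs without the package.
    # BLAKE2b is not BLAKE3 and yields different digests.
    def blake3():
        return hashlib.blake2b(digest_size=32)

CHUNK_SIZE = 1 << 20  # 1 MiB per read keeps memory use flat for large files


def hash_file(path):
    """Hypothetical helper: return (blake3_hex, sha256_hex) for one file."""
    primary = blake3()            # canonical identifier (files table primary key)
    secondary = hashlib.sha256()  # archival / cross-tool column
    with open(path, "rb") as f:
        # Read fixed-size chunks until EOF; each chunk feeds both hashers.
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            primary.update(chunk)
            secondary.update(chunk)
    return primary.hexdigest(), secondary.hexdigest()
```

Feeding both hashers from the same read means the second digest costs CPU time but no extra disk I/O.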


A filename is a label. It can be changed by the user, by an application, by a sync conflict, or by fialr itself during renaming. A path is a location. It changes when the file moves.

The content hash is invariant under all of these operations:

| Operation | Filename | Path | Content hash |
|---|---|---|---|
| Rename | Changes | Same | Same |
| Move to different directory | Same | Changes | Same |
| Copy to new location | Same or different | Changes | Same |
| Edit file content | Same | Same | Changes |

This model has direct consequences:

Renaming does not change identity. When fialr applies its naming convention to a file, the content hash stays the same. The old filename is recorded in XATTRs and SQLite as provenance metadata. The file’s identity is unaffected.

Moving does not change identity. Reorganizing files into a new directory structure updates path metadata but does not alter the content hash. All references by hash remain valid.

Deduplication groups by hash. Two files with different names, in different directories, with different creation dates, are the same file if they have the same content hash. fialr groups them, selects a canonical copy, and moves non-canonical copies to a staging directory with full provenance metadata.
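A minimal sketch of the grouping step, assuming each indexed file is available as a record with its path, BLAKE3 digest, and modification time. The source does not say how fialr picks the canonical copy, so the rule here (oldest modification time, then shortest path) is a hypothetical placeholder, as are the names `FileRecord` and `plan_dedup`. The sketch only builds a plan; moving files to staging and writing provenance are left out. The digests in the usage lines are short placeholders, not real hashes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    blake3: str   # hex content hash
    mtime: float  # modification time, seconds since epoch


def plan_dedup(records):
    """Return {digest: (canonical_path, [non_canonical_paths])} for duplicate groups."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.blake3].append(rec)  # identity is the hash, not the name or path

    plan = {}
    for digest, members in groups.items():
        if len(members) < 2:
            continue  # unique content: nothing to deduplicate
        # Hypothetical selection rule: oldest first, then shortest path,
        # then lexicographic path so the choice is deterministic.
        ordered = sorted(members, key=lambda r: (r.mtime, len(r.path), r.path))
        canonical, *rest = ordered
        plan[digest] = (canonical.path, [r.path for r in rest])
    return plan


records = [
    FileRecord("/photos/beach.jpg", "aa11", 100.0),
    FileRecord("/backup/IMG_0042.jpg", "aa11", 50.0),
    FileRecord("/docs/notes.txt", "bb22", 10.0),
]
print(plan_dedup(records))
# {'aa11': ('/backup/IMG_0042.jpg', ['/photos/beach.jpg'])}
```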


Hashes are stored in two locations with different roles:

| Location | Role | Platform |
|---|---|---|
| SQLite database | Source of truth | All platforms |
| Extended attributes (XATTRs) | Cache layer | macOS (com.fialr.hash), Linux (user.fialr.hash), Windows (NTFS ADS) |

SQLite is authoritative. The files table uses the BLAKE3 hash as its primary key. All queries, dedup operations, and integrity checks reference SQLite. If there is a conflict between SQLite and XATTRs, SQLite wins.

XATTRs are a derived cache. Extended attributes are written alongside SQLite for fast, filesystem-level access. They allow other tools to read a file’s hash without querying the database. XATTRs are rebuilt from SQLite, never the reverse.
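A minimal sketch of the SQLite side and the conflict rule, using Python's built-in `sqlite3` module. The two-table schema (a `files` table keyed by BLAKE3 plus a hypothetical `locations` table mapping paths to hashes, so several copies can share one identity) and the function `reconcile` are assumptions for illustration; the source only states that the BLAKE3 hash is the primary key of `files` and that SHA256 is stored as an extra column. Hash values below are placeholders.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    blake3 TEXT PRIMARY KEY,  -- canonical identity
    sha256 TEXT NOT NULL      -- archival / interoperability
);
CREATE TABLE IF NOT EXISTS locations (  -- hypothetical: path is mutable metadata
    path   TEXT PRIMARY KEY,
    blake3 TEXT NOT NULL REFERENCES files(blake3)
);
"""


def reconcile(conn, path, xattr_value):
    """Return (authoritative_hash, cache_needs_rewrite) for one path.

    SQLite wins: the xattr value (None if missing) is only compared,
    never written back into the database.
    """
    row = conn.execute(
        "SELECT blake3 FROM locations WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise KeyError(f"{path} is not indexed")
    db_hash = row[0]
    return db_hash, xattr_value != db_hash


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO files VALUES (?, ?)", ("b3-placeholder", "sha-placeholder"))
conn.execute("INSERT INTO locations VALUES (?, ?)", ("/photos/beach.jpg", "b3-placeholder"))
print(reconcile(conn, "/photos/beach.jpg", "stale-value"))
# ('b3-placeholder', True)
```

When the second element is `True`, the cache is rewritten from the database value; the database row is never touched.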

Not all filesystems support extended attributes. FAT32, exFAT, and some network mounts do not.

When XATTRs are unsupported, fialr writes to SQLite only. The skip is logged. No error is raised. No functionality is lost. The database remains the complete record.
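A minimal sketch of the skip behavior on Linux, using the `user.fialr.hash` attribute name from the storage table and Python's `os.setxattr`, which exists only on Linux builds of Python. macOS and Windows would need platform-specific calls not shown here. The function name `write_hash_xattr`, the logger name, and the exact set of errno values treated as "unsupported" are assumptions.

```python
import errno
import logging
import os

log = logging.getLogger("fialr.xattr")  # hypothetical logger name

XATTR_NAME = "user.fialr.hash"  # Linux attribute name from the storage table

# errno values treated as "this filesystem cannot store the attribute".
# EPERM is included because Linux rejects user.* attributes on
# file types other than regular files and directories.
UNSUPPORTED = {errno.ENOTSUP, errno.EOPNOTSUPP, errno.EPERM}


def write_hash_xattr(path, blake3_hex):
    """Cache the hash in an xattr; return False (and log) when unsupported."""
    if not hasattr(os, "setxattr"):
        log.info("xattrs unavailable on this platform; skipped %s", path)
        return False
    try:
        os.setxattr(path, XATTR_NAME, blake3_hex.encode("ascii"))
    except OSError as exc:
        if exc.errno in UNSUPPORTED:
            # e.g. FAT32, exFAT, some network mounts: SQLite remains the record.
            log.info("xattrs unsupported for %s (%s); skipped", path, exc.strerror)
            return False
        raise  # any other failure is a real error
    return True
```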

This is a design choice, not a workaround. The system must function identically whether XATTRs are available or not. SQLite is the contract. XATTRs are a convenience.


fialr provides three verification modes through the validate command:

| Mode | Scope | Use case |
|---|---|---|
| spot | Random sample of files | Quick confidence check. Suitable for routine verification. |
| manifest | All files listed in a job manifest | Post-operation verification. Confirms that a specific job did not corrupt any files. |
| full | Every file in the database | Complete corpus integrity audit. Recomputes all hashes and compares against stored values. |
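The three modes differ only in which files get re-hashed. A sketch of that selection step follows; the names `select_targets`, `sample_size`, and `seed` are hypothetical, and the source does not state fialr's spot-check sample size or how it draws the sample.

```python
import random


def select_targets(all_paths, mode, manifest_paths=None, sample_size=100, seed=None):
    """Return the paths a validate run would re-hash for the given mode."""
    paths = sorted(all_paths)  # stable order so a seeded sample is reproducible
    if mode == "spot":
        rng = random.Random(seed)
        # Never ask for more items than exist.
        return rng.sample(paths, min(sample_size, len(paths)))
    if mode == "manifest":
        if manifest_paths is None:
            raise ValueError("manifest mode needs a job manifest")
        return sorted(manifest_paths)
    if mode == "full":
        return paths
    raise ValueError(f"unknown mode: {mode!r}")
```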

In all modes, verification recomputes the BLAKE3 hash from the file’s current content and compares it against the stored hash in SQLite. A mismatch means the file content has changed since it was last indexed — either legitimately (the file was edited) or due to corruption.

Mismatches are reported with the file path, expected hash, actual hash, and the job that last operated on the file. The decision to act on a mismatch is left to the operator.
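A sketch of the compare-and-report loop, assuming the rows to check (path, stored BLAKE3 hash, ID of the last job) have already been read from SQLite. It uses the third-party `blake3` package when available; the BLAKE2b fallback is a labeled stand-in so the sketch runs anywhere, and its digests are not interchangeable with real BLAKE3. `Mismatch`, `current_hash`, and `verify` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

try:
    from blake3 import blake3  # third-party package: pip install blake3
except ImportError:
    # Stand-in ONLY so this sketch runs without the package.
    # BLAKE2b is not BLAKE3 and yields different digests.
    def blake3():
        return hashlib.blake2b(digest_size=32)


@dataclass
class Mismatch:
    path: str
    expected: str  # hash stored in SQLite
    actual: str    # hash recomputed from current content
    last_job: str  # job that last operated on the file


def current_hash(path, chunk_size=1 << 20):
    """Recompute the content hash by streaming the file in chunks."""
    h = blake3()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(targets):
    """targets: iterable of (path, stored_hash, last_job) rows.

    Only reports; deciding what to do about a mismatch is left to the operator.
    """
    mismatches = []
    for path, stored, last_job in targets:
        actual = current_hash(path)
        if actual != stored:
            mismatches.append(Mismatch(path, stored, actual, last_job))
    return mismatches
```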