Content Hash as Identity

Filenames change. Paths change. A file gets renamed, moved to a different directory, or copied to a backup drive. None of these operations changes what the file is. The content is the identity.

fialr computes a cryptographic hash of every file’s content and uses that hash as the file’s canonical identifier. Everything else — the filename, the path, the directory hierarchy — is mutable metadata attached to that identifier.

Hash algorithms

Algorithm	Role	Rationale
BLAKE3	Primary identifier	Fast, cryptographically secure, streaming-capable. Used as the canonical content hash in SQLite, XATTRs, and all internal references.
SHA256	Secondary / archival	Widely supported by external tools. Stored alongside BLAKE3 for cross-tool compatibility and long-term archival verification.

Both BLAKE3 and SHA256 are computed during inventory. The BLAKE3 hash is the primary key in the files table. SHA256 is stored as an additional column for interoperability.
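The sketch below computes both hashes in a single read pass. It assumes a Python implementation, which the document does not state. Python's standard `hashlib` has no BLAKE3, so `blake2b` stands in for the primary hash. The third-party `blake3` package provides a `blake3()` hasher with the same `update()`/`hexdigest()` methods and could replace it. The `SCHEMA` string and its column names are hypothetical. They only mirror the primary key and the extra column described above.

```python
import hashlib
from pathlib import Path

# Hypothetical schema: the primary hash is the key, SHA256 an extra column.
SCHEMA = """
CREATE TABLE files (
    blake3 TEXT PRIMARY KEY,  -- canonical identifier
    sha256 TEXT NOT NULL,     -- secondary hash for external tools
    path   TEXT NOT NULL      -- mutable metadata
)
"""

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large files never sit fully in memory


def hash_file(path: Path) -> tuple[str, str]:
    """Return (primary, sha256) hex digests from one pass over the file."""
    primary = hashlib.blake2b()  # stand-in for BLAKE3, which hashlib lacks
    secondary = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(CHUNK_SIZE):
            primary.update(chunk)    # each chunk feeds both hashers,
            secondary.update(chunk)  # so the file is read only once
    return primary.hexdigest(), secondary.hexdigest()
```

Both hashers consume the same chunks, so the second hash costs only CPU time and needs no second read from disk.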

Why content, not filenames

A filename is a label. It can be changed by the user, by an application, by a sync conflict, or by fialr itself during renaming. A path is a location. It changes when the file moves.

The content hash is unaffected by every operation except editing the content itself:

Operation	Filename	Path	Content hash
Rename	Changes	Same	Same
Move to different directory	Same	Changes	Same
Copy to new location	Same or different	Changes	Same
Edit file content	Same	Same	Changes
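You can check the table directly. The sketch below applies each operation to a throwaway file and reports whether the content hash survived. It is illustrative only: SHA256 from `hashlib` stands in for BLAKE3, and the file names are invented.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    # SHA256 stands in for BLAKE3 here; only equality matters for the demo.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def invariance_demo() -> dict[str, bool]:
    """Return, per operation, whether the hash still matches the original."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "IMG_0412.jpg"
        original.write_bytes(b"pretend image bytes")
        before = content_hash(original)
        same = {}

        renamed = original.rename(root / "beach.jpg")  # new filename, same directory
        same["rename"] = content_hash(renamed) == before

        (root / "photos").mkdir()
        moved = renamed.rename(root / "photos" / "beach.jpg")  # same filename, new directory
        same["move"] = content_hash(moved) == before

        copied = Path(shutil.copy2(moved, root / "backup.jpg"))  # second file, new path
        same["copy"] = content_hash(copied) == before

        with moved.open("ab") as fh:  # any byte change is an edit
            fh.write(b" edited")
        same["edit"] = content_hash(moved) == before
        return same


if __name__ == "__main__":
    print(invariance_demo())
    # {'rename': True, 'move': True, 'copy': True, 'edit': False}
```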

This model has direct consequences:

Renaming does not change identity. When fialr applies its naming convention to a file, the content hash stays the same. The old filename is recorded in XATTRs and SQLite as provenance metadata. The file’s identity is unaffected.

Moving does not change identity. Reorganizing files into a new directory structure updates path metadata but does not alter the content hash. All references by hash remain valid.
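Here is one way to record a rename or move against a hash-keyed table, sketched with Python's `sqlite3`. The `files` and `provenance` tables and the `relocate` helper are hypothetical. The `UPDATE` touches only the path column, so the key that every other reference uses is never rewritten. The XATTR side of provenance is left out.

```python
import sqlite3

# Hypothetical tables: identity lives in `blake3`; `path` is mutable metadata.
SCHEMA = """
CREATE TABLE files (
    blake3 TEXT PRIMARY KEY,
    path   TEXT NOT NULL
);
CREATE TABLE provenance (
    blake3   TEXT NOT NULL REFERENCES files(blake3),
    old_path TEXT NOT NULL,
    new_path TEXT NOT NULL
);
"""


def relocate(conn: sqlite3.Connection, blake3_hex: str, new_path: str) -> None:
    """Record a rename or move; the hash key itself is never modified."""
    row = conn.execute(
        "SELECT path FROM files WHERE blake3 = ?", (blake3_hex,)
    ).fetchone()
    if row is None:
        raise KeyError(f"untracked hash: {blake3_hex}")
    old_path = row[0]
    with conn:  # commits both statements together, or rolls both back
        conn.execute(
            "UPDATE files SET path = ? WHERE blake3 = ?", (new_path, blake3_hex)
        )
        conn.execute(
            "INSERT INTO provenance (blake3, old_path, new_path) VALUES (?, ?, ?)",
            (blake3_hex, old_path, new_path),
        )
```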

Deduplication groups by hash. Two files with different names, in different directories, with different creation dates, are the same file if they have the same content hash. fialr groups them, selects a canonical copy, and moves non-canonical copies to a staging directory with full provenance metadata.
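Grouping amounts to a dictionary keyed by content hash. Several details in this sketch are hypothetical. SHA256 stands in for BLAKE3, and whole files are read at once for brevity. The canonical-copy rule (lexicographically smallest path) is a placeholder, because the text does not say how fialr chooses. The move into staging is left out: `plan_dedup` only decides which files would go there.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_hash(paths: list[Path]) -> dict[str, list[Path]]:
    """Map each content hash to the files sharing it; keep only real duplicates."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in paths:
        # Name, directory, and timestamps are ignored: only bytes are hashed.
        digest = hashlib.sha256(p.read_bytes()).hexdigest()  # stand-in for BLAKE3
        groups[digest].append(p)
    return {h: ps for h, ps in groups.items() if len(ps) > 1}


def plan_dedup(groups: dict[str, list[Path]]) -> list[tuple[str, Path, list[Path]]]:
    """Return (hash, canonical, to_stage) per group.

    Placeholder rule: the lexicographically smallest path is canonical.
    """
    plan = []
    for digest, members in groups.items():
        canonical, *to_stage = sorted(members)
        plan.append((digest, canonical, to_stage))
    return plan
```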

Where hashes are stored

Hashes are stored in two locations with different roles:

Location	Role	Platform
SQLite database	Source of truth	All platforms
Extended attributes (XATTRs)	Cache layer	macOS (`com.fialr.hash`), Linux (`user.fialr.hash`)

SQLite is authoritative. The files table uses the BLAKE3 hash as its primary key. All queries, dedup operations, and integrity checks reference SQLite. If there is a conflict between SQLite and XATTRs, SQLite wins.

XATTRs are a derived cache. Extended attributes are written alongside the SQLite record for fast, filesystem-level access. They allow other tools to read a file’s hash without querying the database. XATTRs are rebuilt from SQLite, never the reverse.
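A sketch of the one-way rebuild follows. It assumes hashes are stored as hex text and that the table has `blake3` and `path` columns; both are assumptions. The attribute write is injected as `write`. On Linux, Python's `os.setxattr` fits that signature. Python's `os` module has no xattr functions on macOS, so a third-party binding would be needed there. Nothing here reads an attribute back, and that is what keeps SQLite authoritative.

```python
import sqlite3
import sys
from typing import Callable

Writer = Callable[[str, str, bytes], None]  # (path, attribute name, value)


def xattr_name(platform: str = sys.platform) -> str:
    # Attribute names as documented for macOS and Linux; on Linux,
    # unprivileged processes must use the "user." namespace.
    return "com.fialr.hash" if platform == "darwin" else "user.fialr.hash"


def rebuild_xattrs(conn: sqlite3.Connection, write: Writer) -> int:
    """Rewrite every file's hash attribute from the database.

    Data flows one way, SQLite -> filesystem: whatever an attribute
    currently holds is overwritten and never copied into the database.
    """
    name = xattr_name()
    count = 0
    for digest, path in conn.execute("SELECT blake3, path FROM files"):
        write(path, name, digest.encode("ascii"))
        count += 1
    return count
```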

XATTR degradation policy

Not all filesystems support extended attributes. FAT32, exFAT, and some network mounts do not.

When XATTRs are unsupported, fialr writes to SQLite only. The skip is logged. No error is raised. No functionality is lost. The database remains the complete record.
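In code, the degradation policy means treating one class of error as a logged skip. This is a minimal sketch; the function name, logger name, and log wording are hypothetical. It assumes the SQLite row has already been committed, so returning `False` loses nothing. Other failures, such as permission errors, still raise, because they point to a real problem rather than a missing feature.

```python
import errno
import logging
import os
from typing import Callable, Optional

log = logging.getLogger("fialr")  # hypothetical logger name

# Errors meaning "this filesystem has no xattr support" (on Linux the two are the same value).
UNSUPPORTED = {errno.ENOTSUP, errno.EOPNOTSUPP}


def write_xattr_best_effort(
    path: str,
    name: str,
    value: bytes,
    setxattr: Optional[Callable[[str, str, bytes], None]] = getattr(os, "setxattr", None),
) -> bool:
    """Try to cache the hash in an xattr; return False if the write was skipped."""
    if setxattr is None:  # os.setxattr is only provided on Linux
        log.info("no xattr support on this platform; skipped %s", path)
        return False
    try:
        setxattr(path, name, value)
    except OSError as exc:
        if exc.errno in UNSUPPORTED:
            log.info("filesystem lacks xattr support; skipped %s", path)
            return False
        raise  # permission errors, missing files, etc. are real failures
    return True
```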

This is a design choice, not a workaround. The system must function identically whether XATTRs are available or not. SQLite is the contract. XATTRs are a convenience.

Integrity verification

The `validate` command offers three verification checks, plus an `all` mode that runs them together:

Check	Scope	Use case
`paths`	Every file in the database	Verify that all tracked files still exist on disk at their recorded paths.
`hashes`	Every file in the database	Recompute BLAKE3 hashes and compare against stored values. Detects content changes or corruption.
`xattrs`	Every file in the database	Verify that extended attributes on each file match database records.
`all`	All checks combined	Run paths, hashes, and xattrs checks in sequence. Default mode.
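The `paths` check reduces to an existence test for each database row. This sketch runs against a hypothetical `files` table with `blake3` and `path` columns. The function only reports missing rows and changes nothing.

```python
import sqlite3
from pathlib import Path


def check_paths(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (hash, path) for every tracked file no longer present at its path."""
    missing = []
    for digest, path in conn.execute("SELECT blake3, path FROM files ORDER BY path"):
        if not Path(path).is_file():  # gone, or replaced by a directory
            missing.append((digest, path))
    return missing  # report only: no rows are deleted or updated
```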

Hash verification recomputes the BLAKE3 hash from the file’s current content and compares it against the stored hash in SQLite. A mismatch means the file content has changed since it was last indexed — either legitimately (the file was edited) or due to corruption.
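Verification reruns the inventory hash and compares the result. In this sketch SHA256 stands in for BLAKE3. `verify` returns the new hash on a mismatch so that a caller can report it. Both function names are hypothetical. A mismatch alone cannot tell an edit apart from corruption: the hash only shows that the bytes differ.

```python
import hashlib
from pathlib import Path
from typing import Optional


def recompute_hash(path: Path) -> str:
    """Hash the file's current bytes (SHA256 standing in for BLAKE3)."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, stored: str) -> Optional[str]:
    """Return the fresh hash if it differs from the stored one, else None."""
    actual = recompute_hash(path)
    return None if actual == stored else actual
```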

Mismatches are reported with the file path, expected hash, actual hash, and the job that last operated on the file. The decision to act on a mismatch is left to the operator.
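A report-only pass might collect records like the ones below. The four fields come from the text above. The class name, output layout, and job identifiers are invented for illustration. `recompute` is injected so the sketch does not depend on any particular hash implementation. The function returns its findings without modifying any file or database row.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class HashMismatch:
    path: str
    expected: str  # hash stored in SQLite
    actual: str    # hash recomputed from the file's current content
    last_job: str  # job that last operated on the file

    def report(self) -> str:
        return (
            f"{self.path}\n"
            f"  expected: {self.expected}\n"
            f"  actual:   {self.actual}\n"
            f"  last job: {self.last_job}"
        )


def find_mismatches(
    rows: Iterable[tuple[str, str, str]],
    recompute: Callable[[str], str],
) -> list[HashMismatch]:
    """Collect mismatches from (path, expected, last_job) rows.

    Nothing is repaired or re-indexed; acting on the result is the operator's call.
    """
    found = []
    for path, expected, last_job in rows:
        actual = recompute(path)
        if actual != expected:
            found.append(HashMismatch(path, expected, actual, last_job))
    return found
```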