# Enrichment

Enrichment improves filename quality and writes structured metadata to files using AI. Text is extracted from documents, images, and media files, then processed by a language model to generate semantic metadata. All tiers are enriched by local AI (Ollama) by default. Tier 1 results are always routed to the review queue. Tier 2–3 files can optionally use a cloud provider for higher-quality results.

Enrichment respects the sensitivity tier system. This is enforced in code, not by convention.

| Tier | Enrichment access |
| --- | --- |
| 1 (RESTRICTED) | Local AI (Ollama). Results always routed to review queue. Cloud requires two-step confirmation. |
| 2 (SENSITIVE) | Extracted text processed by configured provider (Ollama or cloud). Human confirmation required before applying results. |
| 3 (INTERNAL) | Full enrichment via configured provider. Results above confidence threshold are applied automatically. |

Tier 1 files are enriched by local AI (Ollama) like any other tier, but results are never auto-applied — they always go to the review queue for human confirmation. To use a cloud provider on Tier 1 files, a two-step confirmation is required: a config flag and a CLI flag must both be active. This ensures deliberate intent without excessive friction. See Tier 1 cloud override below and Sensitivity Tiers for the full tier model.


## Text extraction

Enrichment begins with text extraction. The extraction method depends on the file type:

| File type | Extraction tool | What is extracted |
| --- | --- | --- |
| Scanned PDF | ocrmypdf + Tesseract | OCR text from page images |
| Native PDF | pypdfium2 | Embedded text content |
| Images | Tesseract | OCR text from image content |
| Photos | piexif | EXIF metadata (date, camera, GPS) |
| Audio | mutagen | ID3/metadata tags (title, artist, album) |
| Word documents | python-docx | Document text and metadata |
| Excel spreadsheets | openpyxl | Sheet names, header rows, metadata |

Extracted text is passed to the inference layer. It is not stored on disk separately — it exists only in memory during processing.
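A minimal sketch of what this per-type dispatch could look like, using two of the tools from the table above; `extract_text` and its MIME-type routing are illustrative, not fialr's actual code:

```python
import mutagen
import pypdfium2 as pdfium

def extract_text(path: str, mime_type: str) -> str:
    """Hypothetical dispatcher: route a file to a type-specific extractor."""
    if mime_type == "application/pdf":
        # Native PDF: read the embedded text layer page by page
        pdf = pdfium.PdfDocument(path)
        return "\n".join(page.get_textpage().get_text_range() for page in pdf)
    if mime_type.startswith("audio/"):
        # Audio: flatten metadata tags into "key: value" lines
        tags = mutagen.File(path) or {}
        return "\n".join(f"{key}: {value}" for key, value in tags.items())
    raise ValueError(f"no extractor registered for {mime_type}")
```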


## Inference

By default, inference runs on your machine through Ollama. The inference layer is abstracted behind a provider interface, which handles model communication, prompt construction, and response parsing. For Tier 2–3 files, you can optionally configure a cloud provider (Claude API) for higher-quality classification via fialr config ai with your own API key.

The model receives the extracted text and returns structured JSON:

```json
{
  "date": "2024-03-15",
  "entity": "acme_corp",
  "descriptor": "quarterly_revenue_report",
  "tags": ["financial", "quarterly", "revenue"],
  "summary": "Q1 2024 revenue report for Acme Corp showing YoY growth.",
  "confidence": 0.87
}
```

The response provides filename tokens (date, entity, descriptor), semantic tags, a one-sentence summary, and a confidence score.
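A sketch of how a caller might validate this response before trusting it; the `EnrichmentResult` dataclass is illustrative, not part of fialr:

```python
import json
from dataclasses import dataclass

@dataclass
class EnrichmentResult:
    date: str
    entity: str
    descriptor: str
    tags: list[str]
    summary: str
    confidence: float

def parse_response(raw: str) -> EnrichmentResult:
    # Assumes the model returned exactly the documented keys
    result = EnrichmentResult(**json.loads(raw))
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    return result
```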

Tier 1 files are always enriched by local AI (Ollama), regardless of provider configuration. Results are routed to the review queue. Cloud access for Tier 1 requires a two-step confirmation.


## Confidence threshold

The confidence score determines what happens to the enrichment output:

  • Above threshold — results are applied automatically (Tier 3) or queued for confirmation (Tier 2)
  • Below threshold — the file is written to the review_queue with the LLM suggestion attached as a hint

The reviewer sees the model’s proposed filename tokens, tags, and summary alongside the file’s current name and path. They can accept, modify, or reject the suggestion.
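The routing rules reduce to a few lines. This sketch is illustrative; the function name and return values are assumptions, not fialr internals:

```python
def route(result: "EnrichmentResult", tier: int, threshold: float = 0.7) -> str:
    """Decide what happens to an enrichment result, per the tier table."""
    if tier == 1:
        return "review_queue"   # RESTRICTED: never auto-applied
    if result.confidence < threshold:
        return "review_queue"   # suggestion attached as a hint
    if tier == 3:
        return "auto_apply"     # INTERNAL: applied automatically
    return "confirm"            # SENSITIVE: human confirmation first
```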


## Providers

Enrichment runs through a configurable provider. The provider handles model communication, prompt construction, and response parsing. Three providers are available:

| Provider | Scope | Setup |
| --- | --- | --- |
| Ollama (default) | Local inference on your machine | Install Ollama, pull a model |
| Claude API (opt-in) | Cloud inference via Anthropic | Bring your own API key |
| Two-step (opt-in) | Local extraction → sanitized metadata → cloud refinement | Set provider = "two-step" or use the --cloud-refine flag |

Tier 1 files are always processed by local AI, regardless of cloud provider configuration. Cloud access for Tier 1 requires a two-step confirmation.

Use Ollama when you want everything local and free. Use Claude when you want higher-quality classification for Tier 2–3 files and are willing to send extracted text to Anthropic’s API.
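The docs describe the provider as an interface covering prompt construction, model communication, and response parsing. A hypothetical shape for it, with assumed method names:

```python
from abc import ABC, abstractmethod

class Provider(ABC):
    """Illustrative provider interface; not fialr's actual class names."""

    @abstractmethod
    def build_prompt(self, extracted_text: str, filename: str) -> str:
        """Construct the enrichment prompt for one file."""

    @abstractmethod
    def infer(self, prompt: str) -> dict:
        """Send the prompt to the model and return parsed JSON metadata."""
```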

Configure the provider with:

```sh
# Interactive setup
fialr config ai

# Non-interactive: switch to Claude
fialr config ai --provider claude --key sk-ant-...

# Non-interactive: switch back to Ollama
fialr config ai --provider ollama

# Check current configuration
fialr config ai --show
```

The standalone fialr configure ai command is still available for backward compatibility.

API keys are stored in the system keychain (via keyring), not in config files. You can also set the ANTHROPIC_API_KEY environment variable for CI or scripting.
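A sketch of the lookup order this implies, using the keyring package with an environment-variable fallback; the service and account names here are assumptions:

```python
import os
import keyring

def get_api_key() -> str | None:
    # Keychain first (hypothetical service/account names), then env var
    return keyring.get_password("fialr", "anthropic") or os.environ.get(
        "ANTHROPIC_API_KEY"
    )
```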


## Enrichment context

When enrichment processes a file, fialr can improve the quality of metadata extraction by providing the LLM with few-shot examples from semantically similar files in the corpus.

If embeddings have been generated for the corpus (via fialr embed, which uses the nomic-embed-text model through Ollama), the enrichment system automatically:

  1. Computes a query embedding for the file being enriched
  2. Finds the most similar files already in the corpus by cosine similarity (sketched below)
  3. Includes their metadata (entity, descriptor, tags) in the LLM prompt as few-shot examples
  4. Passes those examples to the LLM, which uses them to produce more consistent, higher-quality metadata

This is most effective after enriching a substantial corpus. Early files benefit less because fewer similar examples exist. As more files are enriched and embedded, metadata quality improves across the board. The corpus learns from itself.
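A sketch of the similarity lookup from step 2, assuming embeddings are stored as rows of a numpy array; this is illustrative, not fialr's retrieval code:

```python
import numpy as np

def top_k_similar(query: np.ndarray, corpus: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k corpus rows most similar to the query."""
    query = query / np.linalg.norm(query)
    # Cosine similarity: dot product over the corpus row norms
    scores = (corpus @ query) / np.linalg.norm(corpus, axis=1)
    return np.argsort(scores)[::-1][:k]
```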

Enrichment context is automatic when embeddings exist. No configuration required. If no embeddings are available, enrichment runs normally without the context feature.

Embeddings are computed automatically during fialr enrich --execute when Ollama is available and [embeddings] enabled = true (the default). Each file’s embedding is stored alongside its enrichment metadata in a single pass.

To generate or recompute embeddings independently (after a model change, or to backfill files enriched before embeddings were enabled):

```sh
fialr embed ~/Documents
```

## Two-step enrichment

Two-step enrichment combines local and cloud processing for higher-quality results without sending raw file content to the cloud.

  1. Local extraction — Ollama processes the raw extracted text locally, producing initial metadata (entity, descriptor, tags, summary).
  2. Sanitization — The local inference output is sanitized: SSN patterns, credit card numbers (Luhn-validated), bank account/routing numbers, and EINs are stripped. Names, institutions, document types (W-2, 1099), dates, and tags are preserved.
  3. Cloud refinement — The sanitized metadata (never the raw file text) is sent to the configured cloud provider (Claude API) for quality improvement.

| Tier | Two-step behavior |
| --- | --- |
| 3 (INTERNAL) | Automatic when --cloud-refine is used |
| 2 (SENSITIVE) | Opt-in via the --cloud-refine flag, with human confirmation |
| 1 (RESTRICTED) | Falls back to local-only unless the Tier 1 cloud override is active |

Enable two-step as the default provider in fialr.toml:

```toml
[enrichment]
provider = "two-step"
```

Or use it per-invocation with the --cloud-refine flag:

```sh
fialr enrich ~/Documents --cloud-refine --execute
```

The sanitization step strips specific PII patterns from the local inference output before cloud transmission:

| Pattern | Action |
| --- | --- |
| Social Security numbers (XXX-XX-XXXX) | Stripped |
| Credit card numbers (Luhn-valid, 13–19 digits) | Stripped |
| Bank account / routing numbers | Stripped |
| Employer Identification Numbers (EINs) | Stripped |
| Names, institutions | Preserved |
| Document types (W-2, 1099, invoice) | Preserved |
| Dates, tags, categories | Preserved |

Raw file content never leaves the machine. Only sanitized inference metadata is sent to the cloud provider.
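A sketch of what the stripping step could look like, combining regexes with a Luhn check so arbitrary digit runs are not redacted unless they look like card numbers; the patterns are illustrative, not fialr's:

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD = re.compile(r"\b\d{13,19}\b")

def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def sanitize(text: str) -> str:
    text = SSN.sub("[REDACTED]", text)
    # Redact digit runs only if they pass the Luhn check (likely card numbers)
    return CARD.sub(
        lambda m: "[REDACTED]" if luhn_valid(m.group()) else m.group(), text
    )
```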


## Configuration

Enrichment settings live in fialr.toml:

```toml
[enrichment]
provider = "ollama"
model = "llama3.2"
endpoint = "http://localhost:11434"
cloud_model = "claude-sonnet-4-20250514"
confidence_threshold = 0.7
```

| Setting | Default | Description |
| --- | --- | --- |
| provider | ollama | Inference provider: ollama, claude, or two-step |
| model | llama3.2 | Ollama model name (local provider) |
| endpoint | http://localhost:11434 | Ollama API endpoint (must be localhost) |
| cloud_model | claude-sonnet-4-20250514 | Claude model for cloud inference |
| confidence_threshold | 0.7 | Minimum confidence for auto-apply |

Enrichment with Ollama requires the server running locally with a model pulled:

```sh
# Install Ollama
brew install ollama   # macOS
# See ollama.com for Linux install

# Pull the model specified in fialr.toml
ollama pull llama3.2

# Start the Ollama server
ollama serve
```

fialr checks for Ollama availability before starting enrichment. If the server is not running or the configured model is not available, the command fails with a clear error. No partial processing occurs.
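A sketch of such a pre-flight check against Ollama's HTTP API, which lists locally pulled models at GET /api/tags; the function is illustrative, not fialr's actual check:

```python
import requests

def ollama_ready(endpoint: str, model: str) -> bool:
    """True if the Ollama server responds and the configured model is pulled."""
    try:
        resp = requests.get(f"{endpoint}/api/tags", timeout=2)
        resp.raise_for_status()
    except requests.RequestException:
        return False   # server not running or unreachable
    names = [m["name"] for m in resp.json().get("models", [])]
    # Pulled models carry a tag suffix, e.g. "llama3.2:latest"
    return any(n == model or n.startswith(f"{model}:") for n in names)
```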

Cloud enrichment requires the cloud optional dependency group and an API key:

```sh
pip install 'fialr[cloud]'
fialr config ai --provider claude --key sk-ant-...
```

## Running enrichment

```sh
# Dry-run (default) — report only, no metadata written
fialr enrich ~/Documents

# Apply enrichment metadata
fialr enrich ~/Documents --execute

# Skip cloud cost confirmation prompt
fialr enrich ~/Documents --execute --yes
```

When using the Claude provider, fialr estimates the token count and cost before processing. You are prompted to confirm:

```
provider         claude
eligible files   2,389
estimated tokens 1,493,125 in / 477,800 out
estimated cost   $0.0045 + $0.0072 = $0.0117
Continue? [y/N]
```

Use --yes to bypass the prompt (for scripting or CI). Ollama is local and free — no estimate is shown.
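A back-of-envelope version of this estimate. The chars-per-token heuristic, output ratio, and per-million-token rates below are placeholder assumptions, not fialr's actual pricing table:

```python
def estimate_cost(total_chars: int, in_rate: float, out_rate: float) -> tuple[int, int, float]:
    """Rough token and dollar estimate for a cloud enrichment run."""
    tokens_in = total_chars // 4         # ~4 chars per token heuristic
    tokens_out = int(tokens_in * 0.32)   # output is a small JSON object per file
    dollars = tokens_in / 1e6 * in_rate + tokens_out / 1e6 * out_rate
    return tokens_in, tokens_out, dollars
```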

Enrichment processes files across all tiers (Tier 1 local only, Tier 2–3 via the configured provider). Each run writes a job directory:

```
jobs/2026-03-11_enrich_a1b2c3d4/
  log.json
  report.md
  checkpoint.json
```

Terminal output:

```
enriched 1,847
review 565
skipped 23
errors 12
total 0.42s
```

Enrichment metadata is written to XATTRs (com.fialr.enriched_at, com.fialr.tags) and to the SQLite files table. The review_queue table receives files below the confidence threshold with the LLM suggestion stored as a hint.
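A sketch of the xattr write path using the cross-platform xattr package; the attribute names match the docs, while the JSON encoding of tags is an assumption:

```python
import json
import xattr

def write_enrichment_xattrs(path: str, enriched_at: str, tags: list[str]) -> None:
    # Attribute names from the docs; xattr values must be bytes
    xattr.setxattr(path, "com.fialr.enriched_at", enriched_at.encode())
    xattr.setxattr(path, "com.fialr.tags", json.dumps(tags).encode())
```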


## Tier 1 cloud override

By default, Tier 1 files are enriched by local AI (Ollama), with results always routed to the review queue. For cases where you need to send Tier 1 metadata to a cloud provider (e.g., higher-quality classification via two-step enrichment), a two-step confirmation is required:

| Step | How to set |
| --- | --- |
| Config flag | allow_tier1_cloud = true in the [enrichment] section of fialr.toml |
| CLI flag | --allow-tier1 passed to fialr enrich |

Both must be active. If either is missing, Tier 1 files fall back to local-only processing. An interactive confirmation prompt also appears before cloud processing begins.
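The gate reduces to a conjunction of the two flags. A sketch, with assumed config access; the names mirror the table above:

```python
def tier1_cloud_allowed(config: dict, cli_allow_tier1: bool) -> bool:
    """Both flags must be set; either one missing means local-only."""
    config_flag = config.get("enrichment", {}).get("allow_tier1_cloud", False)
    return bool(config_flag) and cli_allow_tier1
```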

This design ensures deliberate intent — you cannot enable cloud access for sensitive files by misconfiguring a single setting. Two-step enrichment preserves privacy by sending only sanitized metadata (never raw file content) to the cloud provider.


## Prompt template

The prompt sent to the inference provider is customizable via a Liquid-like template at config/enrichment_prompt.liquid. The template uses the same engine as rename templates — variables and filters only, no loops or macros.

Available template variables:

| Variable | Description |
| --- | --- |
| {{ filename }} | Current filename |
| {{ mime_type }} | Detected MIME type |
| {{ extracted_text }} | Text extracted from the file |
| {{ file_size }} | File size in bytes |

Edit the template to adjust what the model receives and how it should respond. The default template requests structured JSON with date, entity, descriptor, tags, summary, and confidence fields.
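A sketch of rendering such a template with the python-liquid package; fialr's actual engine may differ, but the variables match the table above:

```python
from liquid import Template

def render_prompt(source: str, *, filename: str, mime_type: str,
                  extracted_text: str, file_size: int) -> str:
    # Variables and filters only; no loops or macros are needed
    return Template(source).render(
        filename=filename,
        mime_type=mime_type,
        extracted_text=extracted_text,
        file_size=file_size,
    )
```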


## Next steps

After enrichment, the corpus has complete metadata: sensitivity tiers, schema categories, content hashes, and AI-generated semantic tags. Run validation to verify integrity, or export to generate sidecar metadata files.

For the full command reference, see fialr enrich.