
enrich

fialr enrich <target> [options]

Extract text from files and run inference via the configured provider (Ollama by default, or cloud via BYOK) to generate structured metadata: filename tokens, semantic tags, a one-sentence summary, and a confidence score. Tier 1 files are enriched by local AI (Ollama) by default, with results always routed to the review queue. Cloud requires two-step confirmation.


Argument                  Description
target                    Directory to enrich (required)

Option                    Description
--execute                 Apply enrichment metadata (not just report)
--embed-only              Compute embeddings only, without running AI text extraction or inference. Use to backfill or recompute embeddings.
--yes, -y                 Skip cloud cost confirmation prompt
--jobs-dir PATH           Directory for job artifacts (default: .fialr/jobs)
--cloud-refine            Enable 2-step enrichment: local extraction → sanitization → cloud refinement. Tier 3 automatic, Tier 2 opt-in, Tier 1 falls back to local-only unless two-step confirmation active.
--sensitivity-rules PATH  Path to sensitivity.yaml (default: config/sensitivity.yaml)
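The sanitization step in the --cloud-refine flow strips sensitive spans from locally extracted text before anything is sent to a cloud provider. A minimal sketch of that idea, with illustrative patterns (fialr's real rules come from sensitivity.yaml, and these function names are not fialr's actual API):

```python
import re

# Illustrative redaction patterns; the real rules live in sensitivity.yaml.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # US SSN-like numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email addresses
]

def sanitize(text: str) -> str:
    """Replace sensitive spans with placeholders before cloud refinement."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text
```

Tier 2 and 3 files would pass through a gate like this on their way to the cloud provider; Tier 1 files never reach it unless two-step confirmation is active.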

Enrichment requires:

  1. Ollama running locally at http://localhost:11434 (default provider), or a cloud provider configured via fialr config ai
  2. A pulled model for Ollama (default: llama3.2, configurable under [enrichment].model)
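Both requirements can be verified programmatically against Ollama's REST API, which lists pulled models at GET /api/tags. A small self-check sketch (the function name is illustrative, not part of fialr):

```python
import json
import urllib.request

def ollama_has_model(name: str, host: str = "http://localhost:11434") -> bool:
    """Return True if the local Ollama daemon is reachable and has pulled `name`."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=2) as resp:
            models = json.load(resp).get("models", [])
    except OSError:
        return False  # daemon not running or unreachable
    # Ollama reports names like "llama3.2:latest"; match on the base name.
    return any(m.get("name", "").split(":")[0] == name for m in models)
```

If this returns False, run `ollama pull llama3.2` (or whichever model is set under [enrichment].model) and make sure the daemon is started.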

Sensitivity tiers gate access to the enrichment pipeline:

Tier            Access
1 (RESTRICTED)  Local AI (Ollama). Results always routed to review queue. Cloud requires two-step confirmation.
2 (SENSITIVE)   Configured provider processes extracted text. Human confirmation required.
3 (INTERNAL)    Full enrichment via configured provider.

fialr extracts text from files using format-specific tools:

Format            Extraction method
Scanned PDF       ocrmypdf + Tesseract OCR
Native PDF        pypdfium2
Photos            piexif (EXIF metadata)
Audio             mutagen (ID3 tags)
Office documents  python-docx, openpyxl

Extracted text is sent to the configured provider (Ollama on localhost by default, or a cloud provider if configured). The inference layer is abstracted behind a provider interface. The model returns structured JSON:

  • Date — document subject date
  • Entity — primary subject or organization
  • Descriptor — semantic description
  • Tags — semantic tags
  • Summary — one-sentence summary
  • Confidence — 0.0 to 1.0 score
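A reply with these fields can be validated before anything is written to XATTRs or SQLite. A minimal parsing sketch, assuming the field names above in lowercase (the exact schema and class names are illustrative, not fialr's internal types):

```python
import json
from dataclasses import dataclass

@dataclass
class Enrichment:
    """The structured fields listed above; names here are illustrative."""
    date: str
    entity: str
    descriptor: str
    tags: list[str]
    summary: str
    confidence: float

def parse_result(raw: str) -> Enrichment:
    """Parse the model's JSON reply, clamping confidence into [0.0, 1.0]."""
    d = json.loads(raw)
    d["confidence"] = min(1.0, max(0.0, float(d["confidence"])))
    return Enrichment(**d)
```

Clamping matters because LLMs occasionally emit out-of-range scores, and the confidence value drives the routing decision described below.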

The confidence threshold (default: 0.7, configurable in fialr.toml under [enrichment].confidence_threshold) determines what happens with inference results:

  • Above threshold — metadata is auto-applied to XATTRs and SQLite
  • Below threshold — file is sent to the review queue with the LLM suggestion attached as a hint for manual review
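The routing rule above reduces to a single comparison. A sketch (results exactly at the threshold are treated as passing here; the actual boundary behavior is an assumption):

```python
DEFAULT_THRESHOLD = 0.7  # [enrichment].confidence_threshold in fialr.toml

def route(confidence: float, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Decide whether an inference result is applied or queued for review."""
    return "auto-apply" if confidence >= threshold else "review-queue"
```

Raising the threshold trades automation for accuracy: more files land in the review queue, but fewer low-quality suggestions are written automatically.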

When embeddings are enabled ([embeddings] enabled = true in fialr.toml), enrichment automatically computes a vector embedding for each successfully enriched file. These embeddings power semantic search, similar file discovery, and improve future enrichment quality through adaptive corpus context.
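Semantic search and similar-file discovery over these embeddings are typically ranked by cosine similarity. A self-contained sketch of that metric (fialr's actual index and similarity backend are not specified here):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1.0, 1.0]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # guard against zero vectors
```

Identical vectors score 1.0 and orthogonal vectors 0.0, so ranking query results by this score surfaces the semantically closest files first.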


Sample dry-run output:

enrich ~/Documents
────────────────────────────────────────────────────────
enriched 623
review 89
skipped 135 (tier 1: 12, no text: 123)
errors 0
embeddings auto

# Dry-run enrichment (default provider: Ollama)
fialr enrich ~/Documents
# Apply enrichment metadata
fialr enrich ~/Documents --execute
# Cloud enrichment, skip cost confirmation
fialr enrich ~/Documents --execute --yes
# 2-step enrichment: local extraction → sanitized → cloud refinement
fialr enrich ~/Documents --cloud-refine --execute
# Compute embeddings only (no AI inference)
fialr enrich ~/Documents --embed-only
# Recompute embeddings after model change
fialr enrich ~/Documents --embed-only --force

  • Enrichment guide — walkthrough of the enrichment process
  • Sensitivity Tiers — how tiers control enrichment access
  • scan — check sensitivity tiers before enrichment
  • search — search enriched metadata with --semantic for vector similarity