
enrich

fialr enrich <target> [options]

Extract text from files and run inference via the configured provider (Ollama by default, or cloud via BYOK) to generate structured metadata: filename tokens, semantic tags, a one-sentence summary, and a confidence score. Tier 1 files are enriched by local AI (Ollama) by default, with results always routed to the review queue. Cloud requires two-step confirmation.


Argument                  Description
target                    Directory to enrich (required)

Option                    Description
--execute                 Apply enrichment metadata (not just report)
--embed-only              Compute embeddings only, without running AI text extraction or inference. Use to backfill or recompute embeddings.
--yes, -y                 Skip cloud cost confirmation prompt
--jobs-dir PATH           Directory for job artifacts (default: .fialr/jobs)
--cloud-refine            Enable 2-step enrichment: local extraction → sanitization → cloud refinement. Tier 3 automatic, Tier 2 opt-in, Tier 1 falls back to local-only unless two-step confirmation active.
--sensitivity-rules PATH  Path to sensitivity.yaml (default: config/sensitivity.yaml)
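The sanitization step in the --cloud-refine flow strips sensitive spans from locally extracted text before anything is sent to a cloud provider. A minimal sketch of that idea, with illustrative patterns (fialr's real rules come from sensitivity.yaml, and these function names are not fialr's actual API):

```python
import re

# Illustrative redaction patterns; the real rules live in sensitivity.yaml.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # US SSN-like numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email addresses
]

def sanitize(text: str) -> str:
    """Replace sensitive spans with placeholders before cloud refinement."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text
```

Tier 2 and 3 files would pass through a gate like this on their way to the cloud provider; Tier 1 files never reach it unless two-step confirmation is active.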

Enrichment requires:

  1. Ollama running locally at http://localhost:11434 (default provider), or a cloud provider configured via fialr config ai
  2. A pulled model for Ollama (default: llama3.2, configurable under [enrichment].model)
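Both requirements can be verified programmatically against Ollama's REST API, which lists pulled models at GET /api/tags. A small self-check sketch (the function name is illustrative, not part of fialr):

```python
import json
import urllib.request

def ollama_has_model(name: str, host: str = "http://localhost:11434") -> bool:
    """Return True if the local Ollama daemon is reachable and has pulled `name`."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=2) as resp:
            models = json.load(resp).get("models", [])
    except OSError:
        return False  # daemon not running or unreachable
    # Ollama reports names like "llama3.2:latest"; match on the base name.
    return any(m.get("name", "").split(":")[0] == name for m in models)
```

If this returns False, run `ollama pull llama3.2` (or whichever model is set under [enrichment].model) and make sure the daemon is started.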

Sensitivity tiers gate access to the enrichment pipeline:

Tier            Access
1 (RESTRICTED)  Local AI (Ollama). Results always routed to review queue. Cloud requires two-step confirmation.
2 (SENSITIVE)   Configured provider processes extracted text. Human confirmation required.
3 (INTERNAL)    Full enrichment via configured provider.

fialr extracts text from files using format-specific tools:

Format            Extraction method
Scanned PDF       ocrmypdf + Tesseract OCR
Native PDF        pypdfium2
Photos            piexif (EXIF metadata)
Audio             mutagen (ID3 tags)
Office documents  python-docx, openpyxl

Extracted text is sent to the configured provider (Ollama on localhost by default, or a cloud provider if configured). The inference layer is abstracted behind a provider interface. The model returns structured JSON:

  • Date — document subject date
  • Entity — primary subject or organization
  • Descriptor — semantic description
  • Tags — semantic tags
  • Summary — one-sentence summary
  • Confidence — 0.0 to 1.0 score
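A reply with these fields can be validated before anything is written to XATTRs or SQLite. A minimal parsing sketch, assuming the field names above in lowercase (the exact schema and class names are illustrative, not fialr's internal types):

```python
import json
from dataclasses import dataclass

@dataclass
class Enrichment:
    """The structured fields listed above; names here are illustrative."""
    date: str
    entity: str
    descriptor: str
    tags: list[str]
    summary: str
    confidence: float

def parse_result(raw: str) -> Enrichment:
    """Parse the model's JSON reply, clamping confidence into [0.0, 1.0]."""
    d = json.loads(raw)
    d["confidence"] = min(1.0, max(0.0, float(d["confidence"])))
    return Enrichment(**d)
```

Clamping matters because LLMs occasionally emit out-of-range scores, and the confidence value drives the routing decision described below.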

The confidence threshold (default: 0.7, configurable in fialr.toml under [enrichment].confidence_threshold) determines what happens with inference results:

  • Above threshold — metadata is auto-applied to XATTRs and SQLite
  • Below threshold — file is sent to the review queue with the LLM suggestion attached as a hint for manual review
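The routing rule above reduces to a single comparison. A sketch (results exactly at the threshold are treated as passing here; the actual boundary behavior is an assumption):

```python
DEFAULT_THRESHOLD = 0.7  # [enrichment].confidence_threshold in fialr.toml

def route(confidence: float, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Decide whether an inference result is applied or queued for review."""
    return "auto-apply" if confidence >= threshold else "review-queue"
```

Raising the threshold trades automation for accuracy: more files land in the review queue, but fewer low-quality suggestions are written automatically.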

When embeddings are enabled ([embeddings] enabled = true in fialr.toml), enrichment automatically computes a vector embedding for each successfully enriched file. These embeddings power semantic search, similar file discovery, and improve future enrichment quality through adaptive corpus context.
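Semantic search and similar-file discovery over these embeddings are typically ranked by cosine similarity. A self-contained sketch of that metric (fialr's actual index and similarity backend are not specified here):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors, in [-1.0, 1.0]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # guard against zero vectors
```

Identical vectors score 1.0 and orthogonal vectors 0.0, so ranking query results by this score surfaces the semantically closest files first.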


Sample dry-run output:

enrich ~/Documents
────────────────────────────────────────────────────────
enriched 623
review 89
skipped 135 (tier 1: 12, no text: 123)
errors 0
embeddings auto

# Dry-run enrichment (default provider: Ollama)
fialr enrich ~/Documents
# Apply enrichment metadata
fialr enrich ~/Documents --execute
# Cloud enrichment, skip cost confirmation
fialr enrich ~/Documents --execute --yes
# 2-step enrichment: local extraction → sanitized → cloud refinement
fialr enrich ~/Documents --cloud-refine --execute
# Compute embeddings only (no AI inference)
fialr enrich ~/Documents --embed-only
# Recompute embeddings after model change
fialr enrich ~/Documents --embed-only --force

  • Enrichment guide — walkthrough of the enrichment process
  • Sensitivity Tiers — how tiers control enrichment access
  • scan — check sensitivity tiers before enrichment
  • search — search enriched metadata with --semantic for vector similarity