Domain Validation
Before every ingest, the agent validates that the new document actually belongs to the declared project. The validation uses a blind review process: a semantic evaluation by the agent (Phase A) and a quantitative scoring script (Phase B) run independently, so neither can anchor the other.
try it
“Ingest raw/notes.md as project-beta”
Phase A — Semantic evaluation (blind)
The agent reads the new document and the existing wiki pages for the target project. It forms its own assessment of thematic relevance — without running the quantitative script and without seeing any score. This avoids anchoring bias: if the agent saw “63% PROCEED” first, it would tend to align with the number instead of reasoning independently.
Phase B — Quantitative scoring
A Python script (tools/validate_domain.py) runs independently of the agent, with zero API calls. It compares the new document against all wiki pages tagged with the target project (a sketch of the scoring mechanics follows this list):
- TF-IDF + Cosine Similarity — vectorizes both the new document and the concatenated corpus text, then computes cosine similarity. Captures vocabulary and terminology overlap.
- Entity Overlap — collects known entities from the project's wiki pages, normalizes names, and searches for them in the new document using exact matching for short names and fuzzy matching (rapidfuzz, threshold 85%) for longer names.
- Concept Overlap — same mechanism applied to concepts (ideas, frameworks, methods).
- Composite Score — weighted average of the three signals.
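A minimal sketch of these mechanics, assuming scikit-learn and rapidfuzz. The helper names and the 4-character cutoff for "short" names are illustrative, not the script's actual internals, and the overlap here is a simple containment ratio rather than the Jaccard variant the weights table mentions:

```python
# Illustrative sketch of the Phase B signals (not the actual internals
# of tools/validate_domain.py). Requires scikit-learn and rapidfuzz.
from rapidfuzz import fuzz
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def lexical_similarity(new_doc: str, corpus_text: str) -> float:
    """TF-IDF cosine similarity between the new document and the corpus."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(
        [new_doc, corpus_text]
    )
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])


def overlap_score(known_names: set[str], new_doc: str,
                  threshold: float = 85.0) -> float:
    """Fraction of known names found in the new document: exact substring
    match for short names, fuzzy partial match for longer ones."""
    if not known_names:
        return 0.0
    text = new_doc.lower()
    hits = 0
    for name in (n.strip().lower() for n in known_names):  # crude normalization
        if len(name) <= 4:  # assumed cutoff for "short" names
            hits += name in text
        else:
            hits += fuzz.partial_ratio(name, text) >= threshold
    return hits / len(known_names)
```

The same overlap_score would serve both the entity and the concept signal, since the list above describes them as one mechanism applied to two inputs.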
Signal weights

| Signal | Weight | What it captures |
|---|---|---|
| Lexical similarity (TF-IDF) | 40% | Vocabulary and terminology overlap |
| Entity overlap (Jaccard + fuzzy) | 30% | Shared people, companies, products |
| Concept overlap (Jaccard + fuzzy) | 30% | Shared ideas, frameworks, methods |
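With all three signals assumed normalized to [0, 1], the composite reduces to a plain weighted average; a minimal sketch using the weights above:

```python
# Composite score under the weights above; signals assumed in [0, 1].
WEIGHTS = {"lexical": 0.40, "entity": 0.30, "concept": 0.30}


def composite_score(lexical: float, entity: float, concept: float) -> float:
    return (WEIGHTS["lexical"] * lexical
            + WEIGHTS["entity"] * entity
            + WEIGHTS["concept"] * concept)


# e.g. composite_score(0.70, 0.50, 0.60) -> 0.61
```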
Phase C — Comparison and verdict
The agent now sees both evaluations side by side — its own blind assessment and the quantitative report. If they agree, the decision is high-confidence. If they disagree, the discrepancy is surfaced to the user with an explanation (e.g. “the document uses new terminology but is thematically aligned with the project”).
Only after producing the verdict does the agent check rejected.json for previous rejection entries matching the document. If one is found, the previous rejection reason is included in the report to the user, but never before the blind evaluation, to avoid anchoring on the old judgment.
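A sketch of that post-verdict lookup, assuming rejected.json is a JSON array of entries with document and reason fields; the actual schema is not specified in this section:

```python
import json
from pathlib import Path


def previous_rejection(doc_path: str,
                       ledger: Path = Path("rejected.json")) -> dict | None:
    """Return the earlier rejection entry for doc_path, if one exists."""
    if not ledger.exists():
        return None
    for entry in json.loads(ledger.read_text(encoding="utf-8")):
        if entry.get("document") == doc_path:
            return entry  # e.g. {"document": ..., "reason": ...}
    return None
```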
Affinity decisions

| Scenario | Behavior |
|---|---|
| Both evaluations agree | Proceed automatically or warn, depending on the affinity level |
| Evaluations disagree | Present both to the user with an explanation of the discrepancy |
| First document for this tag | Skip validation entirely (no existing corpus to compare against) |
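The table boils down to a small decision function. A sketch with illustrative names, under one plausible reading of the first row ("pass" meaning the evaluation judged the document on-topic):

```python
# Sketch of the decision table above (names and return values illustrative).
def affinity_decision(semantic_pass: bool, quantitative_pass: bool,
                      first_doc_for_tag: bool) -> str:
    if first_doc_for_tag:
        return "skip_validation"       # no existing corpus to compare against
    if semantic_pass == quantitative_pass:
        # Agreement: proceed on a clear pass, warn on a shared low verdict.
        return "proceed" if semantic_pass else "warn"
    return "present_both_to_user"      # surface the discrepancy
```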
Workflow diagram: Phase A (blind semantic evaluation) → Phase B (quantitative scoring) → Phase C (comparison and verdict).

The validation is advisory, never blocking — the user always has the final word. If the user declines the ingest, the document is tracked in rejected.json with the reason for rejection. Rejected documents are logged in a separate file from wiki/log.md to prevent semantic contamination when the agent reads the ingest history.
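And the write side of the same ledger, again under the assumed array-of-entries schema; note that it appends to rejected.json, never to wiki/log.md:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_rejection(doc_path: str, reason: str,
                     ledger: Path = Path("rejected.json")) -> None:
    """Append a rejection entry; kept apart from wiki/log.md by design."""
    entries = json.loads(ledger.read_text(encoding="utf-8")) if ledger.exists() else []
    entries.append({
        "document": doc_path,
        "reason": reason,
        "rejected_at": datetime.now(timezone.utc).isoformat(),  # field assumed
    })
    ledger.write_text(json.dumps(entries, indent=2), encoding="utf-8")
```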