Architecture
Why not RAG?
JanusLM builds on the LLM Wiki approach — a knowledge management system where the AI agent compiles, structures, and maintains knowledge at ingest time, rather than re-deriving it from raw chunks on every query.
Traditional RAG vs. LLM Wiki

| | Traditional RAG | LLM Wiki approach |
|---|---|---|
| Knowledge | Re-derived from scratch on every query | Compiled once at ingest, kept current over time |
| Retrieval unit | Raw document chunks | Structured, interlinked wiki pages |
| Cross-refs | Constructed at query time, if at all | Pre-built during ingest — entities and concepts linked |
| Contradictions | May surface at query time, often missed | Detected and flagged at ingest time |
| Accumulation | None — each query starts from zero | Compounding — every new source enriches the whole wiki |
Agent restructuring
Before: a single instruction file had all wiki workflows baked in. The agent was a dedicated “wiki maintainer” and nothing else.
After: lean instructions define a general-purpose agent with KB access. Heavy wiki workflows live in a /maintainer skill, loaded only when needed. Other skills (docx, pptx, frontend…) slot in the same way.
Architecture diagram — agent + skill + KB structure

Deterministic + semantic architecture
JanusLM draws a clear line between what the agent does and what Python scripts do. Structural operations — scaffolding pages, validating frontmatter, counting terms, scanning for wikilinks, tracking queue state — are handled by deterministic tools that produce the same output every time. Semantic operations — reading documents, understanding meaning, finding correlations, writing content — are handled by the agent.
The two layers reinforce each other. The agent can't forget to check for broken links because a script does it mechanically. The script can't understand whether a document belongs to a project because that requires reasoning. Neither layer works alone; together they cover each other's blind spots.
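As an illustration of the deterministic layer, here is a minimal sketch of a broken-link check. It assumes wikilinks use the common `[[target]]`, `[[target|label]]`, and `[[target#section]]` forms and that pages are keyed by name. The function name `find_broken_links` and the sample pages are hypothetical, not taken from JanusLM's tools.

```python
import re

# Assumed wikilink syntax: [[target]], [[target|label]], [[target#section]].
# Only the target part is captured.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_broken_links(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each page name to the sorted link targets that have no page."""
    known = set(pages)
    broken: dict[str, list[str]] = {}
    for name, text in pages.items():
        targets = {m.group(1).strip() for m in WIKILINK.finditer(text)}
        missing = sorted(targets - known)
        if missing:
            broken[name] = missing
    return broken


# Hypothetical sample pages; "vector-store" has no page of its own.
pages = {
    "rag": "Compare with [[llm-wiki]] and [[vector-store|vector stores]].",
    "llm-wiki": "Builds on ideas from [[rag]].",
}
print(find_broken_links(pages))  # {'rag': ['vector-store']}
```

The same input always yields the same report, which is the property the structural layer relies on. Deciding whether the missing page should be written, and what it should say, stays with the agent.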
Try it: “What do we know about RAG across all projects?”
Every Python tool in tools/ follows the same contract: reads files, produces JSON or a table to stdout, never calls an API, never modifies wiki content directly. The agent orchestrates the tools through skill-defined workflows.
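The contract above can be sketched with a small term counter: it reads files, writes JSON to stdout, makes no network calls, and leaves the files untouched. The function `count_terms`, the sample file names, and the JSON shape are assumptions made for illustration; the actual tools may differ.

```python
import json
import sys
import tempfile
from pathlib import Path


def count_terms(wiki_dir: Path, terms: list[str]) -> dict[str, int]:
    """Count case-insensitive substring occurrences of each term in all .md files."""
    counts = {term: 0 for term in terms}
    for path in sorted(wiki_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8").lower()  # read-only access
        for term in terms:
            counts[term] += text.count(term.lower())
    return counts


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "rag.md").write_text("RAG retrieves chunks. RAG is stateless.", encoding="utf-8")
    (root / "wiki.md").write_text("The wiki compiles knowledge once.", encoding="utf-8")
    report = count_terms(root, ["rag", "wiki"])
    # The only side effect is JSON on stdout, ready for the agent to parse.
    json.dump(report, sys.stdout, sort_keys=True)
    sys.stdout.write("\n")
```

Keeping the output machine-readable lets a skill workflow chain several such tools without the agent having to interpret free-form text.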
Mandatory tagging system
Every ingested document receives a mandatory project tag (e.g. project-alpha, ai-strategy, client-onboarding). The agent proactively asks for it before every ingest — no document enters without a domain.
How tags propagate
- The source page receives the project tag
- Every entity page touched gets the tag added (without removing existing tags)
- Every concept page touched gets the tag added
- If an entity/concept page already exists with a different project tag, new content goes under a separate heading (## In project-alpha)
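The propagation rules above can be sketched as a pure function. The page shape (`{"tags": [...], "body": "..."}`), the function name `propagate_tag`, and the assumption that every tag on a page is a project tag are all hypothetical simplifications, not JanusLM's actual data model.

```python
def propagate_tag(page: dict, project: str, new_content: str) -> dict:
    """Return an updated copy of an entity/concept page; the input is not mutated."""
    tags = list(page.get("tags", []))
    body = page.get("body", "")

    # Assumption: every tag is a project tag, so any other tag means the page
    # already holds content from a different project.
    has_other_project = any(tag != project for tag in tags)

    # Add the new tag without removing existing ones.
    if project not in tags:
        tags.append(project)

    if has_other_project:
        # Keep the new material apart under its own project heading.
        body = f"{body.rstrip()}\n\n## In {project}\n{new_content}\n"
    else:
        body = f"{body.rstrip()}\n\n{new_content}\n".lstrip()
    return {"tags": tags, "body": body}


existing = {"tags": ["ai-strategy"], "body": "Acme is a vendor."}
updated = propagate_tag(existing, "project-alpha", "Acme supplies the pilot hardware.")
print(updated["tags"])  # ['ai-strategy', 'project-alpha']
print(updated["body"])
```

This sketch does not check whether the project heading already exists on the page, so repeated ingests for the same project would need an extra merge step.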
Directory layout
raw/ # Source documents (or anonymized output from maskzone)
maskzone/ # Privacy mode entry — files here get anonymized, originals stay
processed/ # Original binaries archived after conversion
rejected.json # Documents declined during domain validation
heal_queue.json # Persistent heal state (pending/completed/skipped items)
ingest_queue.json # Persistent ingest state (pending/completed/skipped items)
wiki/ # Knowledge base (read freely, modify only via /maintainer)
index.md # Catalog of all pages
log.md # Operation history (ingest, lint, health, graph)
sources/ # One summary page per ingested document
entities/ # People, companies, projects, products
concepts/ # Ideas, frameworks, methods, theories
graph/ # Auto-generated graph data
tools/ # Deterministic Python scripts (health, ingest, graph, search, validation)
freespace/ # User's personal workspace, no effect on JanusLM

The scripts in tools/ resolve paths relative to this structure. Altering the directory layout may cause unexpected behavior.
freespace/ is your personal workspace. Store your own files, create subdirectories, generate documents with the agent; anything you put here has no effect on JanusLM's wiki, tools, or workflows.
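Because the tools depend on the layout, a pre-flight check can catch a renamed or missing directory early. The sketch below is an assumption-laden illustration: it treats `raw`, `wiki`, `tools`, and `freespace` as top-level directories (the nesting is inferred from the listing, not confirmed), and `missing_dirs` is a hypothetical helper rather than one of the shipped scripts.

```python
import tempfile
from pathlib import Path

# Assumed top-level directories, taken from the listing above.
EXPECTED_DIRS = ("raw", "wiki", "tools", "freespace")


def missing_dirs(root: Path, expected: tuple[str, ...] = EXPECTED_DIRS) -> list[str]:
    """Return the expected directory names that do not exist under root."""
    return [name for name in expected if not (root / name).is_dir()]


# Demo against a throwaway directory that only has part of the layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "wiki").mkdir()
    missing = missing_dirs(root)
    print(missing)  # ['tools', 'freespace']
```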