Architecture

Why not RAG?

JanusLM builds on the LLM Wiki approach — a knowledge management system where the AI agent compiles, structures, and maintains knowledge at ingest time, rather than re-deriving it from raw chunks on every query.

Traditional RAG vs. LLM Wiki

Aspect	Traditional RAG	LLM Wiki approach
Knowledge	Re-derived from scratch on every query	Compiled once at ingest, kept current over time
Retrieval unit	Raw document chunks	Structured, interlinked wiki pages
Cross-refs	Constructed at query time, if at all	Pre-built during ingest — entities and concepts linked
Contradictions	May surface at query time, often missed	Detected and flagged at ingest time
Accumulation	None — each query starts from zero	Compounding — every new source enriches the whole wiki

Agent restructuring

Before

A single instruction file with all wiki workflows baked in. The agent was a dedicated “wiki maintainer” and nothing else.

After

Lean instructions define a general-purpose agent with KB access. Wiki workflows are split into focused skills: /wiki-ingest for document ingestion, /wiki-query for KB search, and /maintainer for health, stats, and graph operations. Other skills (docx, pptx, frontend…) slot in the same way.

Figure: architecture diagram showing the agent, skill, and KB structure

Deterministic + semantic architecture

JanusLM draws a clear line between what the agent does and what Python scripts do. Structural operations — scaffolding pages, validating frontmatter, counting terms, scanning for wikilinks, tracking queue state — are handled by deterministic tools that produce the same output every time. Semantic operations — reading documents, understanding meaning, finding correlations, writing content — are handled by the agent.

The two layers reinforce each other. The agent can't forget to check for broken links because a script does it mechanically. The script can't understand whether a document belongs to a project because that requires reasoning. Neither layer works alone; together they cover each other's blind spots.

Try it: “What do we know about RAG across all projects?”

Every Python tool in tools/ follows the same contract: it reads files, writes JSON or a table to stdout, never calls an API, and never modifies wiki content directly. The agent orchestrates the tools through skill-defined workflows.
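As an illustration of this contract, here is a minimal sketch of a read-only broken-link checker. The function names, the [[Page Name]] wikilink syntax, and the JSON shape are assumptions made for the example; JanusLM's actual tools may look quite different.

```python
import json
import re
import sys
from pathlib import Path

# Assumed wikilink syntax: [[Target]], [[Target|label]] or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def find_broken_links(wiki_dir: Path) -> list:
    """Return one record per wikilink whose target page does not exist."""
    if not wiki_dir.is_dir():
        return []
    pages = sorted(wiki_dir.rglob("*.md"))
    known = {p.stem for p in pages}  # page names without the .md suffix
    broken = []
    for page in pages:
        text = page.read_text(encoding="utf-8")
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target not in known:
                broken.append(
                    {"page": page.relative_to(wiki_dir).as_posix(), "target": target}
                )
    return broken


def main(argv: list) -> int:
    """Hypothetical CLI entry point: optional first argument is the wiki directory."""
    wiki_dir = Path(argv[1]) if len(argv) > 1 else Path("wiki")
    # Read-only by design: the report goes to stdout and no wiki file is touched.
    json.dump({"broken_links": find_broken_links(wiki_dir)}, sys.stdout, indent=2)
    sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Because the script only reads files and prints a report, running it twice on the same wiki yields the same output, and deciding what to do about each broken link is left to the agent.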

Mandatory tagging system

Every ingested document receives a mandatory project tag (e.g. project-alpha, ai-strategy, client-onboarding). The agent proactively asks for it before every ingest — no document enters without a domain.
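The "no tag, no ingest" rule could be backed by a small structural check like the one below. This is only a sketch: the function name and the tag pattern (lowercase words joined by hyphens, inferred from the examples above) are assumptions, not documented JanusLM behavior.

```python
import re
from typing import Optional

# Assumed tag shape, inferred from the examples: lowercase words joined by hyphens.
TAG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def require_project_tag(tag: Optional[str]) -> str:
    """Return the normalized tag, or raise if the ingest request has none."""
    cleaned = (tag or "").strip().lower()
    if not TAG_PATTERN.fullmatch(cleaned):
        raise ValueError(
            "a project tag such as 'project-alpha' is required before ingest"
        )
    return cleaned
```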

How tags propagate

The source page receives the project tag
Every entity page touched gets the tag added (without removing existing tags)
Every concept page touched gets the tag added
If an entity/concept page already exists with a different project tag, new content goes under a separate heading (## In project-alpha)
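The propagation rules above can be sketched as follows. The Page class and propagate_tag function are hypothetical, in-memory stand-ins for illustration. The sketch also assumes that a page already carrying the incoming tag receives new content without an extra heading, which the rules do not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical in-memory view of an entity or concept page."""
    name: str
    tags: list = field(default_factory=list)
    body: str = ""


def propagate_tag(page: Page, project_tag: str, new_content: str) -> Page:
    """Add project_tag to a touched page and place new_content accordingly."""
    # The page belongs to another project when it has tags but not this one.
    belongs_elsewhere = bool(page.tags) and project_tag not in page.tags

    # Additive merge: existing tags are never removed and no duplicate is added.
    if project_tag not in page.tags:
        page.tags.append(project_tag)

    if belongs_elsewhere:
        # Keep the other project's material intact; new content gets its own heading.
        section = f"## In {project_tag}\n\n{new_content}"
    else:
        section = new_content
    page.body = f"{page.body.rstrip()}\n\n{section}".strip() + "\n"
    return page
```

For example, touching a page tagged ai-strategy during a project-alpha ingest leaves the tags as ai-strategy and project-alpha, and appends the new material under "## In project-alpha".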

Directory layout

raw/              # Source documents (or anonymized output from maskzone)
maskzone/         # Privacy mode entry — files here get anonymized, originals stay
processed/        # Original binaries archived after conversion
staging/          # Temporary staging for ingest pipeline — do not edit manually
rejected.json     # Documents declined during domain validation
heal_queue.json   # Persistent heal state (pending/completed/skipped items)
ingest_queue.json # Persistent ingest state (pending/completed/skipped items)
wiki/             # Knowledge base (read freely, modify only via workflows)
  index.md    # Catalog of all pages
  log.md      # Operation history (ingest, health, graph, anonymize)
  sources/    # One summary page per ingested document
  entities/   # People, companies, projects, products
  concepts/   # Ideas, frameworks, methods, theories
graph/            # Auto-generated graph data
tools/            # Deterministic Python scripts (health, ingest, graph, search, validation)
freespace/        # User's personal workspace — no effect on JanusLM

The scripts in tools/ resolve paths relative to this structure. Altering the directory layout may cause unexpected behavior.
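To see why the layout matters, consider one plausible way a tool might locate the project root. This is an assumption about the approach, not a copy of JanusLM's code, and the helper names are hypothetical; the directory names come from the layout above.

```python
from pathlib import Path

# Directory names taken from the layout above.
EXPECTED_DIRS = (
    "raw", "maskzone", "processed", "staging",
    "wiki", "graph", "tools", "freespace",
)


def project_root(tool_file: str) -> Path:
    """Assume the tool lives in <root>/tools/ and derive <root> from its own path."""
    return Path(tool_file).resolve().parent.parent


def missing_dirs(root: Path) -> list:
    """List the expected directories that are absent under root."""
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

A tool written this way would call project_root(__file__) and then address wiki/ or raw/ relative to the result, so moving or renaming a directory silently points it at the wrong place.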

freespace/ is your personal workspace. Store your own files, create subdirectories, or generate documents with the agent: nothing you put here affects JanusLM's wiki, tools, or workflows.