Architecture

Why not RAG?

JanusLM builds on the LLM Wiki approach — a knowledge management system where the AI agent compiles, structures, and maintains knowledge at ingest time, rather than re-deriving it from raw chunks on every query.

Traditional RAG vs. LLM Wiki

| | Traditional RAG | LLM Wiki approach |
|---|---|---|
| Knowledge | Re-derived from scratch on every query | Compiled once at ingest, kept current over time |
| Retrieval unit | Raw document chunks | Structured, interlinked wiki pages |
| Cross-references | Constructed at query time, if at all | Pre-built during ingest: entities and concepts linked |
| Contradictions | May surface at query time, often missed | Detected and flagged at ingest time |
| Accumulation | None: each query starts from zero | Compounding: every new source enriches the whole wiki |

Agent restructuring

Before

A single instruction file with all wiki workflows baked in. The agent was a dedicated “wiki maintainer” and nothing else.

After

Lean instructions define a general-purpose agent with KB access. Heavy wiki workflows live in a /maintainer skill, loaded only when needed. Other skills (docx, pptx, frontend…) slot in the same way.
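A minimal sketch of the "loaded only when needed" idea, written in Python because JanusLM's own tools are Python. The `SkillRegistry` class and the `skills/<name>.md` layout are hypothetical illustrations, not JanusLM's actual mechanism. Listing skill names is cheap, and a skill's instruction text is read only when a task requests it.

```python
from __future__ import annotations

from pathlib import Path


class SkillRegistry:
    """Hypothetical registry: one instruction file per skill at skills_dir/<name>.md."""

    def __init__(self, skills_dir: Path) -> None:
        self.skills_dir = skills_dir
        self._loaded: dict[str, str] = {}  # cache of skills already read from disk

    def available(self) -> list[str]:
        # Listing names is cheap: no instruction text is read here.
        return sorted(p.stem for p in self.skills_dir.glob("*.md"))

    def load(self, name: str) -> str:
        # Read a skill's instructions on first use, then serve them from the cache.
        if name not in self._loaded:
            path = self.skills_dir / f"{name}.md"
            if not path.is_file():
                raise KeyError(f"unknown skill: {name}")
            self._loaded[name] = path.read_text(encoding="utf-8")
        return self._loaded[name]

    def build_context(self, base_instructions: str, requested: list[str]) -> str:
        # Lean base instructions, plus only the skills this task actually needs.
        return "\n\n".join([base_instructions, *(self.load(n) for n in requested)])
```

In this sketch, a wiki task would call `build_context(base, ["maintainer"])` and a slide-deck task `build_context(base, ["pptx"])`, so the heavy maintainer workflows never enter the context of unrelated tasks.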

[Figure: JanusLM agent architecture, showing the agent, skill, and KB structure]

Deterministic + semantic architecture

JanusLM draws a clear line between what the agent does and what Python scripts do. Structural operations — scaffolding pages, validating frontmatter, counting terms, scanning for wikilinks, tracking queue state — are handled by deterministic tools that produce the same output every time. Semantic operations — reading documents, understanding meaning, finding correlations, writing content — are handled by the agent.

The two layers reinforce each other. The agent can't forget to check for broken links because a script does it mechanically. The script can't understand whether a document belongs to a project because that requires reasoning. Neither layer works alone; together they cover each other's blind spots.
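As an illustration of the deterministic side, here is a sketch of a frontmatter check. The required fields (`title`, `type`, `tags`) and the allowed `type` values are assumptions chosen to mirror the `wiki/` subfolders, not JanusLM's actual schema. The point is that the same page always yields the same list of problems, with no judgment involved.

```python
from __future__ import annotations

# Hypothetical schema for illustration; JanusLM's real required fields may differ.
REQUIRED_FIELDS = ("title", "type", "tags")
VALID_TYPES = {"source", "entity", "concept"}  # assumed to mirror wiki/ subfolders


def parse_frontmatter(text: str) -> dict[str, str] | None:
    """Return `key: value` pairs from a leading '---' block, or None if it is missing or unclosed."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return None  # the opening '---' was never closed


def validate_frontmatter(text: str) -> list[str]:
    """Return a list of problems; an empty list means the page passes."""
    fields = parse_frontmatter(text)
    if fields is None:
        return ["missing or unterminated frontmatter"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    page_type = fields.get("type")
    if page_type and page_type not in VALID_TYPES:
        problems.append(f"unknown type: {page_type}")
    return problems
```

This uses a deliberately simple line parser rather than a full YAML parser; a production tool would more likely rely on a YAML library.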

Try it: ask the agent "What do we know about RAG across all projects?"

Every Python tool in tools/ follows the same contract: it reads files, writes JSON or a table to stdout, never calls an API, and never modifies wiki content directly. The agent orchestrates these tools through skill-defined workflows.
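A hypothetical broken-link scanner written to that contract: it only reads `.md` files under `wiki/`, prints JSON to stdout, and makes no network calls. The `[[Target]]` / `[[Target|alias]]` link syntax and the rule that a target resolves to any page with the same file stem are assumptions for this sketch, not confirmed JanusLM behavior.

```python
from __future__ import annotations

import json
import re
import sys
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#Section]]; captures only the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def find_broken_links(wiki_dir: Path) -> list[dict[str, str]]:
    """Report links whose target page does not exist. Read-only: nothing is modified."""
    pages = sorted(wiki_dir.rglob("*.md"))
    known = {p.stem for p in pages}  # assumed resolution rule: match on file stem
    broken = []
    for page in pages:
        text = page.read_text(encoding="utf-8")
        for match in WIKILINK.finditer(text):
            target = match.group(1).strip()
            if target not in known:
                broken.append({"page": page.relative_to(wiki_dir).as_posix(), "target": target})
    return broken


def main(args: list[str]) -> None:
    wiki_dir = Path(args[0]) if args else Path("wiki")
    broken = find_broken_links(wiki_dir)
    # Structured result on stdout; deciding what to fix is left to the agent.
    print(json.dumps({"broken_links": broken, "count": len(broken)}, indent=2))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run as, for example, `python find_broken_links.py wiki` (a hypothetical filename); the agent reads the JSON and decides which links to repair through `/maintainer`.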

Mandatory tagging system

Every ingested document receives a mandatory project tag (e.g. project-alpha, ai-strategy, client-onboarding). The agent proactively asks for it before every ingest — no document enters without a domain.

How tags propagate

  • The source page receives the project tag
  • Every entity page touched gets the tag added (without removing existing tags)
  • Every concept page touched gets the tag added
  • If an entity/concept page already exists with a different project tag, new content goes under a separate heading (## In project-alpha)
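The propagation rules above can be sketched as a single merge function. `WikiPage`, `ingest_into_page`, and `_append_under_heading` are hypothetical names; pages are modeled in memory rather than as Markdown files, and the sketch also enforces the mandatory tag by refusing an empty one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WikiPage:
    """Hypothetical in-memory model of a source, entity, or concept page."""
    title: str
    tags: list[str] = field(default_factory=list)
    body: str = ""


def _append_under_heading(body: str, heading: str, content: str) -> str:
    """Add content at the end of the `heading` section, creating the section if needed."""
    lines = body.rstrip().splitlines()
    if heading not in lines:
        return "\n".join(lines + ["", heading, "", content])
    start = lines.index(heading)
    # The section runs until the next level-2 heading or the end of the page.
    end = next((i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")), len(lines))
    before, after = lines[:end], lines[end:]
    while before and before[-1] == "":
        before.pop()  # trim trailing blank lines so spacing stays consistent
    tail = ["", *after] if after else []
    return "\n".join(before + ["", content] + tail)


def ingest_into_page(page: WikiPage | None, title: str, project_tag: str, content: str) -> WikiPage:
    if not project_tag:
        raise ValueError("a project tag is required before ingest")
    if page is None:
        # New page (e.g. a source page): starts with just this project's tag.
        return WikiPage(title=title, tags=[project_tag], body=content)
    shared = any(t != project_tag for t in page.tags)  # page already has another project
    if project_tag not in page.tags:
        page.tags.append(project_tag)  # add, never remove existing tags
    if shared:
        page.body = _append_under_heading(page.body, f"## In {project_tag}", content)
    else:
        page.body = f"{page.body.rstrip()}\n\n{content}"
    return page
```

Keeping each project's material under its own `## In <tag>` heading means later content from the same project lands in the same section instead of producing duplicate headings.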

Directory layout

raw/              # Source documents (or anonymized output from maskzone)
maskzone/         # Privacy mode entry — files here get anonymized, originals stay
processed/        # Original binaries archived after conversion
rejected.json     # Documents declined during domain validation
heal_queue.json   # Persistent heal state (pending/completed/skipped items)
ingest_queue.json # Persistent ingest state (pending/completed/skipped items)
wiki/             # Knowledge base (read freely, modify only via /maintainer)
  index.md        # Catalog of all pages
  log.md          # Operation history (ingest, lint, health, graph)
  sources/        # One summary page per ingested document
  entities/       # People, companies, projects, products
  concepts/       # Ideas, frameworks, methods, theories
graph/            # Auto-generated graph data
tools/            # Deterministic Python scripts (health, ingest, graph, search, validation)
freespace/        # User's personal workspace — no effect on JanusLM

The scripts in tools/ resolve paths relative to this structure. Altering the directory layout may cause unexpected behavior.
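A sketch of how a tool might locate these paths, assuming (hypothetically) that the project root is the nearest ancestor directory containing both `wiki/` and `tools/`, and that each queue file is a JSON object with `pending`, `completed`, and `skipped` lists. None of these function names come from JanusLM.

```python
from __future__ import annotations

import json
from pathlib import Path


def find_project_root(start: Path) -> Path:
    """Walk upward from `start` until a directory containing both wiki/ and tools/ is found."""
    here = start.resolve()
    for candidate in (here, *here.parents):
        if (candidate / "wiki").is_dir() and (candidate / "tools").is_dir():
            return candidate
    raise FileNotFoundError(f"no project root found above {start}")


def project_paths(root: Path) -> dict[str, Path]:
    # Every location is derived from the root, mirroring the directory layout above.
    return {
        "raw": root / "raw",
        "wiki": root / "wiki",
        "index": root / "wiki" / "index.md",
        "ingest_queue": root / "ingest_queue.json",
        "heal_queue": root / "heal_queue.json",
    }


def queue_summary(queue_file: Path) -> dict[str, int]:
    """Count items per state; assumes {"pending": [...], "completed": [...], "skipped": [...]}."""
    states = ("pending", "completed", "skipped")
    if not queue_file.exists():
        return {state: 0 for state in states}
    data = json.loads(queue_file.read_text(encoding="utf-8"))
    return {state: len(data.get(state, [])) for state in states}
```

Inside a script in `tools/`, `find_project_root(Path(__file__).parent)` would return the project root. Renaming `wiki/` or `tools/` makes that lookup fail, and moving a queue file makes it invisible to `queue_summary`, which illustrates why the layout should stay as described.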

freespace/ is your personal workspace. Store your own files, create subdirectories, or generate documents with the agent; anything you put here has no effect on JanusLM's wiki, tools, or workflows.