Two-faced AI agent.
One side reads, catalogs, connects. The other acts: writes documents, analyzes data, answers questions. Every document you drop in makes answers sharper. Every answer reveals what's still missing.
Knowledge and action, in a loop.
Get started in seconds.
Clone the repo, open it with any coding agent, and start building your knowledge base.
$ git clone https://github.com/Lover0ne/JanusLM.git
$ cd JanusLM
$ claude # or codex, gemini, cursor
> ingest raw/my-document.md
> What are the main themes?

What is JanusLM?
JanusLM turns any AI coding agent into a general-purpose assistant backed by a structured, multi-project knowledge base.
It builds on the LLM Wiki concept by Andrej Karpathy, a knowledge management system fully maintained by an AI agent. You drop documents in, and the agent decomposes them into structured, interlinked pages: sources, entities, concepts, and syntheses. It maintains an index, a living overview, and a knowledge graph.
Where we saw room to grow.
As powerful as the LLM Wiki idea is, working with it across multiple projects revealed two natural limits.
No project separation
The wiki treats all knowledge as a single flat space. Ingesting documents from different projects means entities and concepts merge together: a page like OpenAI.md blends information from every context. Great for discovery, harder when you need a clean, scoped answer.
Single-purpose agent
The instruction file is entirely dedicated to wiki workflows. The agent excels at knowledge operations, but asking it to generate a Word report or a slide deck falls outside its scope. The agent is a librarian, not an assistant.
The idea.
Two goals, one system.
A general-purpose agent
The wiki becomes a knowledge base, not the entire identity. The agent is free to do anything else: Word documents, slides, analysis, web research, and more. Skills are modular: add a new workflow without touching the core instructions.
Project separation
Every ingested document belongs to a specific domain. Queries can filter by project without cross-contamination. Your tenth project works exactly like your first; tags scale naturally.
Three-axis search.
The knowledge base is navigable across three dimensions. The agent classifies each query and applies the right search strategy automatically.
Think of the KB as a sparse matrix: rows are pages, columns are project tags. Every query triggers a classification step that determines how to slice this matrix; a minimal sketch of the three slicing strategies follows the diagram below.
Project Search
Vertical. Filter by tag: read only pages from that project. Isolated answer, zero contamination.
Column slice: read only pages in that column
"What is BP59 in project alpha?"
Concept Search
Cross-cutting. Read the concept page that aggregates knowledge from all projects. Answer organized by project, never blended.
Row slice: read one page across all columns
"What is a PoC?" → shows how each project uses it
Cross-Project Search
Intersection. Cross-reference tags and concepts. Show shared patterns and differences across projects.
Join: intersect rows and columns, find patterns
"Which projects use RAG? What do they have in common?"
Diagram: how the three axes slice the knowledge matrix
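To make the matrix concrete, here is a minimal sketch of the three slicing strategies. The page structure and all names are illustrative, not JanusLM's actual internals; it assumes each page carries a list of project tags.

```python
# Hypothetical page records: a name, a kind, and the project tags
# (matrix columns) the page participates in. Illustrative only.
pages = [
    {"name": "PoC",  "kind": "concept", "tags": ["alpha", "beta"]},
    {"name": "BP59", "kind": "entity",  "tags": ["alpha"]},
    {"name": "RAG",  "kind": "concept", "tags": ["beta", "gamma"]},
]

def project_slice(pages, tag):
    """Vertical: only pages carrying one project tag (column slice)."""
    return [p for p in pages if tag in p["tags"]]

def concept_slice(pages, name):
    """Cross-cutting: one page read across all projects (row slice)."""
    return [p for p in pages if p["name"] == name]

def cross_project_slice(pages, name):
    """Intersection: every project column touching a concept row (join)."""
    return sorted({t for p in concept_slice(pages, name) for t in p["tags"]})

print(project_slice(pages, "alpha"))      # scoped answer for one project
print(concept_slice(pages, "PoC"))        # one concept, organized by project
print(cross_project_slice(pages, "RAG"))  # ['beta', 'gamma']
```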

Smarter search decisions
The agent classifies intent before any retrieval. If the query is operational (generate a doc, convert a file), it skips the knowledge base entirely. If uncertain, it checks the index quickly; if nothing matches, it moves on without forcing wiki content into the answer.
Before: the agent searched the wiki for every question, risking answers polluted with irrelevant content or blended contexts from different projects.
After: the agent classifies each query into one of the three axes before searching. It knows when not to search, filters by tag when needed, and separates contexts in the answer. The gate sketched below captures this routing.
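A minimal sketch of that decision gate, assuming the intent label comes from an upstream classification step (in practice, the agent's own reasoning). All labels and names are illustrative.

```python
# Hypothetical routing gate: operational queries bypass the wiki entirely,
# axis queries pick a search strategy, uncertain ones peek at the index.
OPERATIONAL = {"generate_doc", "convert_file", "make_slides"}
AXES = {"project", "concept", "cross_project"}

def route(intent: str, index_has_match: bool) -> str:
    if intent in OPERATIONAL:
        return "skip_kb"              # never force wiki content in
    if intent in AXES:
        return f"search_{intent}"     # slice the matrix on that axis
    return "search_index" if index_has_match else "skip_kb"

print(route("convert_file", index_has_match=False))  # skip_kb
print(route("concept", index_has_match=True))        # search_concept
```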
Blind domain review.
Before every ingest, the agent validates that the new document actually belongs to the declared project. The validation uses a blind review process: the two evaluations run independently to avoid anchoring bias.
Parallel evaluation
The agent reads the document and the existing wiki pages for that project, then forms its own semantic judgment, blind, without seeing any score. Independently, a deterministic script computes the quantitative analysis: TF-IDF cosine similarity, entity overlap, and concept overlap (a sketch follows the example table below).
Comparison and verdict
The agent sees both evaluations side by side, flags any discrepancy, and produces the final affinity decision. When the two disagree, the discrepancy itself becomes a valuable signal for the user.
Example output: deterministic analysis

| Metric | Score | Detail |
|---|---|---|
| Lexical similarity (TF-IDF) | 72% | cosine similarity on TF-IDF vectors |
| Entity overlap | 4/7 (57%) | OpenAI, RAG, LangChain, Anthropic |
| Concept overlap | 3/5 (60%) | PoC, Fine-tuning, Embeddings |
| Composite score | 63% | weights: lexical 0.4, entity 0.3, concept 0.3 |
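The deterministic side can be reproduced in a few lines. Here is a sketch using scikit-learn, with the weights from the table above (lexical 0.4, entity 0.3, concept 0.3); the function names and example inputs are illustrative, not the actual tools/ script.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def overlap(new_items, wiki_items):
    """Fraction of the new document's items already known to the wiki."""
    new = set(new_items)
    return len(new & set(wiki_items)) / len(new) if new else 0.0

def affinity(doc_text, wiki_text, doc_ents, wiki_ents, doc_cons, wiki_cons):
    # Lexical similarity: cosine between the two TF-IDF vectors.
    tfidf = TfidfVectorizer().fit_transform([doc_text, wiki_text])
    lexical = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    # Weights mirror the example table: lexical 0.4, entity 0.3, concept 0.3.
    return (0.4 * lexical
            + 0.3 * overlap(doc_ents, wiki_ents)
            + 0.3 * overlap(doc_cons, wiki_cons))

score = affinity(
    "A PoC comparing RAG pipelines built on LangChain and OpenAI models.",
    "Project alpha explores retrieval, embeddings, and fine-tuning PoCs.",
    doc_ents=["OpenAI", "RAG", "LangChain", "Anthropic", "FAISS", "Chroma", "Mistral"],
    wiki_ents=["OpenAI", "RAG", "LangChain", "Anthropic"],
    doc_cons=["PoC", "Fine-tuning", "Embeddings", "Chunking", "Evaluation"],
    wiki_cons=["PoC", "Fine-tuning", "Embeddings"],
)
print(f"composite affinity: {score:.0%}")
```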
The validation is advisory, never blocking: the user always has the final word. If the user declines the ingest, the document is tracked with the reason for rejection. If the same document is submitted again later, the agent completes the full blind review first and only then surfaces the previous rejection, keeping the new evaluation free from bias.
The knowledge graph.
The wiki can be visualized as an interactive knowledge graph. Running build graph extracts all connections between pages, both explicit wikilinks and implicit relationships inferred by the agent, and renders them in a self-contained HTML file you can open in any browser. No server needed.
The graph uses Louvain community detection to cluster related pages, and vis.js for interactive visualization with search, filtering, and a detail drawer for each node. A minimal sketch of the deterministic extraction step follows the cards below.
Knowledge graph: nodes and connections across the wiki

Extracted edges
Deterministic. Parses all [[wikilinks]] across wiki pages to build the explicit connection graph.
Inferred edges
Semantic. The agent reads each page and identifies implicit relationships not captured by explicit links, with confidence scores.
Health report
Orphan nodes, god nodes, fragile bridges between communities, phantom hubs, each with suggested actions to strengthen the graph.
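The deterministic pass is essentially a regex over the page files. Here is a sketch assuming pages live as .md files under a wiki/ directory and that networkx (which ships louvain_communities) handles the clustering; paths and names are illustrative, not the actual build script.

```python
import re
from pathlib import Path

import networkx as nx
from networkx.algorithms.community import louvain_communities

# Matches [[Target]], [[Target|label]], and [[Target#section]] wikilinks.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

graph = nx.Graph()
for page in Path("wiki").glob("**/*.md"):
    graph.add_node(page.stem)
    for target in WIKILINK.findall(page.read_text(encoding="utf-8")):
        graph.add_edge(page.stem, target.strip())

# Cluster related pages before handing them to the HTML renderer.
for i, nodes in enumerate(louvain_communities(graph, seed=42)):
    print(f"community {i}: {sorted(nodes)}")
```

Inferred edges come from the agent itself and cannot be reproduced deterministically, which is why the two passes are kept separate.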
Works with every agent.
Open the repo in any coding agent; each one picks up its own instruction file automatically. No configuration, no setup. The same knowledge base, the same workflows, regardless of which tool you prefer.
Architecture: how agents interact with the knowledge base

No server, no database: everything is plain markdown files. The Python scripts in tools/ can also run standalone from the terminal, without a coding agent.
Why this matters.
One entry point, no more session sprawl
You don't need endless parallel sessions with your AI assistant, each with its own fragile context. One interface, one knowledge base that grows with you. You go back to interacting with AI the way it felt at the beginning: a single, general conversation, except now it has memory.
But you're not locked in
Nothing stops you from running multiple sessions if you want to. They won't contaminate each other: project tags keep domains clean. Use one session or ten; the knowledge stays organized either way.
Maintainable and scalable
Skills are modular: add a new workflow without touching the core instructions. Tags scale naturally: your tenth project works exactly like your first. The knowledge base grows without degrading, because every piece of information has a domain and a structure.
Works with any AI assistant
The knowledge base is plain markdown with frontmatter and wikilinks, with no proprietary format and no vendor lock-in. The wiki structure works with any AI tool that can read files: Claude Code, Cursor, Windsurf, GitHub Copilot, Gemini CLI, Codex, or any future assistant. The knowledge you build is yours, portable, and readable by humans and machines alike.
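To make the format concrete, a hypothetical entity page might look like the snippet below. The frontmatter fields are invented for this illustration, not JanusLM's actual schema.

```markdown
---
type: entity
tags: [alpha, beta]
---

# OpenAI

Model provider evaluated during the [[PoC]] for project alpha and
compared against [[Anthropic]] in project beta. See also [[RAG]].
```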
Search, transform, deliver
Every task follows the same workflow: the agent searches the knowledge base first, enriches its understanding with what you've already validated, then transforms that into the output you need. The result isn't generic; it's informed, refined, and grounded in your accumulated knowledge.
The workflow
Research: search the knowledge base, gather context.
→ Transform: enrich with validated knowledge.
→ Result: grounded, informed output.
How it works.
Knowledge and action, in a loop. Drop documents in, get sharper answers out.
Drop a document
Place any source file in raw/: articles, reports, notes, papers.
Ingest
The agent reads it, extracts entities, concepts, and connections. Tags it to your project.
Ask questions
Query in natural language. The agent classifies your intent and searches the right axis.
Get grounded answers
Answers cite your validated knowledge with [[wikilinks]]. No hallucination, no guessing.
Workflow: the full ingest-to-answer pipeline

