Open Source · Agent Knowledge Architecture

Skills are how.
Brains are what they know.

Most agents drown in skill files and re-derive expertise on every turn. Crystallized Intelligence compiles a domain into five resolution layers — agents load only what the task needs.

A ~200-token seed up front. Principles, knowledge, and full sources behind it. Git-native, plaintext, zero pip dependencies, MIT-licensed. No vector DB. No infra to babysit. One git clone and you are running in 60 seconds.

5 Resolution Layers · ~200 Token Seed · 0 Pip Dependencies · 60s To Demo

Before the technical bit

How your brain already works

You already use this system every day. You just have not named it.

Think about the last great book you read. The one that actually changed how you see something.

You do not remember every page. You do not even remember most of it. What you remember is the one idea — the core principle that crystallized in your mind. The single sentence you would tell a friend at dinner.

Below that you remember the chapter structure. The big frameworks. The arguments the author was making.

Below that you remember a few vivid stories — the case studies, the examples that stuck.

The full text of the book? It is still on your shelf. You can crack it open if someone says "wait, what did they say about X?" But you do not carry the whole book in your head every day. You would be paralyzed.

That is how human expertise actually works.

Compressed at the top. Source material at the bottom. Loaded on demand. The expert and the novice have read the same book — the expert just keeps the compressed version up front and the full text in reach.

Here is the part most people miss.

When you read a great book, it does not override your lived experience. When you scroll a hot take on social media, it does not override the great book. There is a hierarchy in there, even if nobody draws it on a whiteboard.

Your own experience sits at the top. Authoritative sources (textbooks, the doctor, the engineer who has shipped the thing fifty times) sit below. A loud tweet sits near the floor. A random YouTube comment barely registers.

That is not a bug. It is how a healthy mind works. Without that ranking, every loud voice would reshape your beliefs, and you would never get anything done.

Agents have no such ranking — until you build one.

Out of the box, an agent treats your in-house playbook and a random Reddit comment as the same kind of text. That is fine for chatbots. It is dangerous when the agent is writing a proposal, qualifying a lead, or touching production data. Crystallized Intelligence is that ranking — written down, version-controlled, and pinned in front of the model on every turn.

The distinction nobody is drawing yet

Anthropic shipped Skills. Cursor shipped Rules. AGENTS.md is a spec. They all answer the same question: how should an agent behave? None of them answers the other one: what does it know?

Skill

How to do something

A procedure. Steps, tools, expected output shape. "Run the pour-over checklist, format the output as JSON."

  • Tells the agent what to do
  • Usually short and procedural
  • Bloats fast when you cram domain knowledge into it
Brain

What the agent knows

Compressed domain expertise. Heuristics, frameworks, trade-offs, source-tiered references. "Extraction theory. When grind matters more than ratio. The mistakes a novice always makes."

  • Tells the agent what is true and why
  • Pre-compiled across five layers
  • Loaded at the depth the task demands — not all at once

Stop bloating skills with domain knowledge. Keep skills thin — procedure, tools, output shape. Put mastery in the Brain. When a single agent has dozens of skills and each one is carrying its own mini-encyclopedia, you hit rate limits before the agent has even started thinking.

Five resolution layers

This mirrors how human expertise actually works. Novices read the source material. Practitioners operate from frameworks. Experts think from compressed heuristics — and only drill down when something is off.

[Figure: A Brain · Five Layers. Compressed at the top. Raw at the bottom. Loaded on demand. The seed (L4, ~200 tokens) is loaded by default; principles (L3, ~2K tokens), knowledge (L2, ~20K tokens), sources (L1, ~200K tokens), and raw (L0, unlimited) load on demand. brain expand pulls only what the query needs, and every byte the agent read is auditable with git diff.]

Layer 4 · seed · ~200 tokens · Domain DNA
Core heuristics and compressed first principles. The single paragraph an expert would tattoo on the inside of their eyelids. Loaded on every turn.

Layer 3 · principles · ~2,000 tokens · Frameworks & decision trees
The mental models, common mistakes, and "if X then Y" heuristics that turn knowledge into judgment. Loaded when a task needs reasoning, not just a fact.

Layer 2 · knowledge · ~20,000 tokens · Curated guides & references
Tier-stamped articles, runbooks, and case studies. Pulled in by query, ranked by source trust. The "go look it up" layer, but pre-filtered for quality.

Layer 1 · sources · ~200K tokens · Transcripts, books, raw evidence
The full corpus. Books, podcasts, talks, original docs. Only loaded when the agent has exhausted higher layers and needs to read from the source.

Layer 0 · raw · unlimited · Unprocessed source material
Anything you have not crystallized yet. The future content waiting to be classified, tiered, and lifted into the upper layers.

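One way to picture the layer stack is as a small lookup table that an agent walks from the most compressed layer downward. The sketch below is illustrative only: the `Layer` class, `layers_to_load`, and `nominal_cost` are hypothetical names, not part of the framework, and the token figures are the nominal sizes quoted above rather than measured values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Layer:
    level: int                      # 4 = most compressed, 0 = raw
    name: str
    nominal_tokens: Optional[int]   # None means "unlimited"


# Nominal sizes as quoted in the layer descriptions (assumed ceilings).
LAYERS = [
    Layer(4, "seed", 200),
    Layer(3, "principles", 2_000),
    Layer(2, "knowledge", 20_000),
    Layer(1, "sources", 200_000),
    Layer(0, "raw", None),
]


def layers_to_load(deepest_level: int) -> List[Layer]:
    """Return the layers from the seed down to deepest_level, most compressed first."""
    if not 0 <= deepest_level <= 4:
        raise ValueError("deepest_level must be between 0 and 4")
    return [layer for layer in LAYERS if layer.level >= deepest_level]


def nominal_cost(layers: List[Layer]) -> Optional[int]:
    """Sum the nominal sizes; return None if any layer is unbounded."""
    total = 0
    for layer in layers:
        if layer.nominal_tokens is None:
            return None
        total += layer.nominal_tokens
    return total


if __name__ == "__main__":
    for depth in (4, 3, 2):
        chosen = layers_to_load(depth)
        print([layer.name for layer in chosen], nominal_cost(chosen))
```

The cost here is a worst case for loading whole layers. Query-driven expansion, as described in the text, would normally load far less than a full layer.
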
The token budget is the point

brain bootstrap loads the seed. brain expand --query "X" pulls in just the principles and knowledge ranked highest for that query, capped by --max-tokens. You get adaptive context depth without re-architecting your agent.
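
The budgeting idea can be sketched as a greedy selection: rank candidate documents, then take them in order until the token cap is reached. This is not the framework's actual ranking code. The `Doc` fields, the `relevance` score, and the choice to multiply relevance by trust weight are all assumptions made for illustration, and the cap here applies to the expansion only, not to the seed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    path: str
    tokens: int        # token count of the document
    relevance: float   # hypothetical query-match score in [0, 1]
    weight: float      # trust weight derived from the source tier


def expand(candidates: List[Doc], max_tokens: int) -> List[Doc]:
    """Greedily pick the highest-scoring documents that fit within max_tokens."""
    ranked = sorted(candidates, key=lambda d: d.relevance * d.weight, reverse=True)
    remaining = max_tokens
    chosen: List[Doc] = []
    for doc in ranked:
        if doc.tokens <= remaining:   # skip anything that would exceed the cap
            chosen.append(doc)
            remaining -= doc.tokens
    return chosen


if __name__ == "__main__":
    docs = [
        Doc("principles/grind.md", 1500, 0.9, 1.00),
        Doc("knowledge/forum-thread.md", 1000, 0.9, 0.35),
        Doc("knowledge/brew-guide.md", 600, 0.5, 0.85),
    ]
    for doc in expand(docs, max_tokens=2200):
        print(doc.path)
```

With a 2,200-token cap, the low-trust forum thread is dropped even though it matches the query as well as the top document, because its weighted score ranks last and it no longer fits.
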

Trust is first-class

Every document carries a source.type in its frontmatter. The framework maps that to a tier (0–5), assigns a default weight for that tier, and lets author_authority: owner bump your own notes one tier higher. A YouTube transcript and your in-house playbook never compete on the same footing.

[Figure: Trust Ranking · Tier 0 → 5. Your lived experience is the anchor. A tweet is not. Tier 0 is the anchor; tiers 1 through 5 are ranked and weighted.]

Authority boost: author_authority: owner bumps the doc up one tier.

Your YouTube transcript would normally land at tier 4. But if it is your channel, marked author_authority: owner, it gets promoted one tier, to tier 3, the same trust level as a tutorial or technical blog. Provenance counts, not just format.

| Tier | Class | Includes | Weight |
|------|-------|----------|--------|
| 0 | First-party | Your playbooks, frameworks, lived experience, proprietary insight | 1.00 |
| 1 | Canonical external | Books, academic papers, official docs, specifications | 1.00 |
| 2 | Professional secondary | Conference talks, workshops, whitepapers, technical reports | 0.85 |
| 3 | Practitioner writeups | Tutorials, courses, technical blogs, case studies | 0.70 |
| 4 | Long-form social | Podcasts, interviews, YouTube videos, newsletters | 0.55 |
| 5 | Short-form social | Tweets, forum posts, Reddit comments | 0.35 |
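
The tier table and the owner boost can be expressed as two small lookups. This is a sketch under stated assumptions: the weights come from the table above, but the `source.type` strings other than `first_party` are hypothetical placeholders (the real vocabulary may differ), and the boost is read literally as "one tier up, never above tier 0".

```python
from typing import Optional

# Weights per tier, copied from the table above.
TIER_WEIGHTS = {0: 1.00, 1: 1.00, 2: 0.85, 3: 0.70, 4: 0.55, 5: 0.35}

# Hypothetical source.type values; only "first_party" appears in the
# example frontmatter, the rest are illustrative stand-ins.
SOURCE_TYPE_TIERS = {
    "first_party": 0,
    "book": 1,
    "conference_talk": 2,
    "tutorial": 3,
    "youtube": 4,
    "tweet": 5,
}


def resolve_tier(source_type: str, author_authority: Optional[str] = None) -> int:
    """Map a source type to its tier, promoting owner-authored docs by one tier."""
    tier = SOURCE_TYPE_TIERS[source_type]
    if author_authority == "owner":
        tier = max(0, tier - 1)   # lower number means higher trust
    return tier


def resolve_weight(source_type: str, author_authority: Optional[str] = None) -> float:
    """Look up the default weight for the resolved tier."""
    return TIER_WEIGHTS[resolve_tier(source_type, author_authority)]


if __name__ == "__main__":
    print(resolve_tier("youtube"), resolve_weight("youtube"))
    print(resolve_tier("youtube", "owner"), resolve_weight("youtube", "owner"))
```

Under this reading, an owner-marked YouTube transcript moves from tier 4 to tier 3, and a first-party document stays at tier 0 because there is nothing above the anchor.
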

Rule of thumb: if a document would change how an agent behaves, its provenance must be obvious from the frontmatter alone — never inferred from the prose.

The easy path

Tell your agent to install it. Watch it work.

You do not have to install anything yourself. Open Claude Code, OpenClaw, Hermes, Cursor, Codex — whatever agent you run — and paste this prompt.

paste this into your agent
Clone github.com/bcharleson/crystallized-intelligence into this folder
and run the 60-second demo:

  python tools/bin/brain.py try

Then show me what a "brain" looks like — list the files it created,
and explain the difference between the seed, principles, and knowledge
layers using the bundled specialty-coffee domain as the example.
What happens next
  1. Your agent clones the repo into the folder you are in.
  2. It runs the 60-second demo. No pip install, no API key.
  3. It bootstraps a 365-token "seed" of specialty-coffee expertise.
  4. It expands into the principles layer when you ask about "grind".
  5. You see, on your own screen, the layers we are talking about.
Why this matters

The whole framework is plaintext and git-native. There is nothing for your agent to misconfigure, nothing to break, no secrets to leak. The worst case is a folder you can delete.

This is also the honest way to evaluate it. See the layers. Read the seed. Decide if it fits your stack. Then point BRAIN_ROOT at your own brain and the same pipeline runs against your content.

See for yourself.

Or do it yourself

Three commands. No API key.

The repo ships with a sanitized demo brain (specialty coffee, B2B discovery, and a runbook domain). Python 3.9+ stdlib only — nothing to install with pip.

1. Clone & run the 60-second demo
```bash
git clone https://github.com/bcharleson/crystallized-intelligence
cd crystallized-intelligence
python tools/bin/brain.py try
```
2. Explore the bundled demo brain
```bash
export BRAIN_ROOT=examples/demo-brain

python tools/bin/brain.py bootstrap b2b-discovery
python tools/bin/brain.py expand b2b-discovery --query "qualify" --max-tier 3
python tools/bin/brain.py bootstrap specialty-coffee
python tools/bin/brain.py bootstrap runbook-basics
```
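
If you want an agent harness to call these commands instead of typing them, a thin wrapper is enough. The sketch below only assembles the same invocations shown above and uses no flags beyond `--query` and `--max-tier`. The helper names `brain_argv` and `run_brain` are hypothetical, and it assumes the CLI writes its result to stdout and is run from the repository root.

```python
import os
import subprocess
import sys
from typing import List, Optional

# Path to the CLI relative to the repository root, as used in the commands above.
BRAIN_CLI = os.path.join("tools", "bin", "brain.py")


def brain_argv(subcommand: str, domain: str,
               query: Optional[str] = None,
               max_tier: Optional[int] = None) -> List[str]:
    """Build the argument list for one brain.py invocation."""
    argv = [sys.executable, BRAIN_CLI, subcommand, domain]
    if query is not None:
        argv += ["--query", query]
    if max_tier is not None:
        argv += ["--max-tier", str(max_tier)]
    return argv


def run_brain(argv: List[str], brain_root: str = "examples/demo-brain") -> str:
    """Run the CLI with BRAIN_ROOT set and return whatever it printed (assumed stdout)."""
    env = dict(os.environ, BRAIN_ROOT=brain_root)
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Prints the command line without executing it.
    print(brain_argv("expand", "b2b-discovery", query="qualify", max_tier=3))
```

A harness could call `run_brain(brain_argv("bootstrap", "b2b-discovery"))` once per session and prepend the output to the system prompt, then call the `expand` form when a task needs more depth.
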
3. Initialize and crystallize your own brain
```bash
python tools/bin/brain.py init \
  --path ~/my-brain \
  --name "My Brain" \
  --domains "my-domain"

# Drop files into ~/my-brain/corpus/my-domain/knowledge/
# Then crystallize (no API key required):
python tools/bin/crystallize.py \
  --brain-root ~/my-brain \
  --domain my-domain \
  --local
```
Document frontmatter — provenance lives next to content
example.md
```markdown
---
title: "Your Document Title"
content_type: knowledge
crystal_layer: 2
domain: my-domain
quality:
  source_tier: 1
  verification: self_verified
  confidence: high
source:
  type: first_party
  author: "Your Name"
  author_authority: owner
freshness:
  sensitivity: medium
  valid_until: 2026-12-31
---

Your content here.
```
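
Because provenance is meant to be readable from the frontmatter alone, it is easy to lint for. The following is a minimal stdlib-only sketch, not the framework's own parser: it handles just the flat and one-level-nested keys used in the example above, and the list of required fields in `REQUIRED` is an assumption about what a team might choose to enforce.

```python
from typing import Dict, List

# Assumed set of provenance fields to enforce; adjust to your own policy.
REQUIRED = [("source", "type"), ("source", "author"), ("quality", "source_tier")]


def parse_frontmatter(text: str) -> Dict[str, object]:
    """Parse a '---' delimited header with flat keys and one level of nesting."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, object] = {}
    section = None
    for line in lines[1:]:
        if line.strip() == "---":          # end of the header
            break
        if not line.strip():
            continue
        key, _, value = line.strip().partition(":")
        value = value.strip().strip('"')
        if line.startswith(" "):           # indented: belongs to the open section
            if section is not None:
                meta[section][key] = value
        elif value == "":                  # "quality:" opens a nested section
            section = key
            meta[section] = {}
        else:                              # plain top-level key
            section = None
            meta[key] = value
    return meta


def missing_provenance(meta: Dict[str, object]) -> List[str]:
    """List the required provenance fields that are absent or empty."""
    missing = []
    for section, field in REQUIRED:
        block = meta.get(section)
        if not isinstance(block, dict) or not block.get(field):
            missing.append(f"{section}.{field}")
    return missing


if __name__ == "__main__":
    doc = '---\ntitle: "Notes"\nsource:\n  type: first_party\n---\nBody text.\n'
    print(missing_provenance(parse_frontmatter(doc)))
```

A check like this could run as a pre-commit hook so that a document without explicit provenance never reaches the layers an agent loads.
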

How this compares — honestly

Hierarchical retrieval is not a brand-new idea — RAPTOR, GraphRAG, and MemGPT all live in the same neighborhood. The wedge here is what is missing from those: a plaintext, git-native, opinionated-about-trust framework you can ship without a vector DB.

Naive RAG

Embed everything, retrieve top-k chunks each query.

The limit: Chunks lose context and hierarchy. Trust is not first-class. Every query re-interprets raw text from scratch.

Knowledge graphs (GraphRAG and similar)

Build a graph of entities and relationships, then traverse.

The limit: Great for relationships, weak for explanatory depth. Expensive to build, hard to keep fresh, hard to inspect by hand.

Hierarchical RAG (RAPTOR and similar)

Cluster and summarize recursively into a tree, retrieve at the right altitude.

The limit: The right idea, but still runtime-driven and vector-DB-shaped. Provenance and trust are not first-class. Hard to audit what the agent actually read.

Agent memory tiers (MemGPT, Letta and similar)

Tiered short / long-term memory that the agent swaps in and out.

The limit: Optimized for an agent's own evolving memory, not for compiling a domain a team already knows by heart.

Full-text search

Index every document, search by keyword.

The limit: Returns documents, not judgment. Favors exact wording over conceptual relevance.

Fine-tuning

Bake the knowledge into the model weights.

The limit: Powerful, but slow to update, expensive to inspect, and a poor fit for operational knowledge that changes weekly.

The wedge

Crystallized Intelligence is not trying to replace vector RAG for the universe of all documents. It is the right shape when (a) the domain is narrow enough that an expert exists, (b) you want token budgets to be predictable, and (c) you want every byte the agent reads to be auditable in a git diff. For everything else, keep your vector store.

Why we built it

We deploy multiple agents for every client of ours — research, outbound, CRM hygiene, agreement drafting, reporting. The pattern keeps repeating: a single agent ends up loaded with a dozen skills, each one carrying its own buried mini-encyclopedia, and within a few turns the rate limits start chirping.

We switched from OpenClaw to Hermes-style multi-agent setups. That helped, but the underlying problem stayed: skills tell agents how to do something, not what they know about a domain. Every turn was a re-derivation of expertise from raw files.

Crystallized Intelligence is the layer above skills. It is how we (as humans) actually work: we start from a source of information, compress what we learn into heuristics, and dive back into the raw context only when we need to. It is simple by design. The orchestration is what keeps agents focused.