Outbound that teaches itself.
AutoGTM takes Andrej Karpathy's autoresearch loop and points it at cold outbound. Treat every campaign like a research run. Fix one metric, vary one thing, keep what works.
An agent runs the loop, scores every reply by buying intent, and feeds the winners back into a brain that makes the next campaign sharper. Git-native, client-agnostic, MIT-licensed. One file to edit. No spreadsheet to babysit.
Most outbound optimizes the wrong number
Get the metric right and the whole loop falls into place. Get it wrong and you scale noise.
Teams A/B test subject lines all day. They chase open rates that mailbox providers no longer report honestly, and reply rates that count every "unsubscribe me" and "wrong person" as a data point. They get a number that moves, and a motion that never actually gets smarter.
The fix is one decision: optimize on positive-reply quality, not reply rate. Weight every reply by buying intent. A "send me a proposal" counts for far more than a bare "interested," and infinitely more than an open.
Quality is the signal that compounds.
When the target is buyers instead of responders, every keep/discard decision pushes the campaign toward the messages that actually move money. The brain fills with copy that earned real intent, and the next campaign starts from there, not from scratch.
Why borrow from ML research?
Andrej Karpathy's autoresearch showed that three things in a specific relationship produce compounding improvement: a fixed evaluation metric, a single modifiable artifact, and an autonomous loop. A constant ruler plus attributable changes plus many iterations equals a monotonic climb, even when any single step is a guess.
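A toy sketch of why that combination climbs. This is not Karpathy's code: a made-up scoring function stands in for the fixed metric, a single number stands in for the artifact, and each iteration makes one blind change that is kept only if the score improves. The function names, the peak at 3.0, and the step size are all illustrative assumptions.

```python
import random


def fixed_metric(artifact: float) -> float:
    """Hypothetical fixed ruler: higher is better, with a peak at artifact == 3.0."""
    return -(artifact - 3.0) ** 2


def run_loop(iterations: int, seed: int = 0) -> list[float]:
    """Run a keep/discard loop and return the kept score after every iteration."""
    rng = random.Random(seed)
    artifact = 0.0
    best = fixed_metric(artifact)
    history = []
    for _ in range(iterations):
        candidate = artifact + rng.uniform(-1.0, 1.0)  # one blind change (a guess)
        score = fixed_metric(candidate)
        if score > best:
            # Keep the winner: the candidate becomes the new baseline.
            artifact, best = candidate, score
        # Otherwise discard: the artifact stays exactly as it was.
        history.append(best)
    return history


history = run_loop(200)
```

Because a change is only kept when it beats the current baseline on the same ruler, the kept score in `history` can never go down. Real reply data is noisier than this toy metric, which is one reason a minimum sample size matters before scoring.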
That pattern is not specific to machine learning. It applies to any domain with a measurable outcome. Outbound is one of them. AutoGTM is the port.
The idea in one table
Choose the metric and the artifact. The rest of Karpathy's pattern transfers unchanged.
Run it long enough and your go-to-market motion stops being a static playbook and becomes a learning machine.
The experiment loop
One falsifiable hypothesis per run. One dimension changed. One ruler that never moves. Keep the winner, reset the loser, and the brain gets a little sharper every cycle.
The eval is the whole game
Every reply is scored into exactly one intent tier. The tiers are deliberately coarse so an agent classifies them reliably and consistently. This rubric never changes between experiments, and that fixity is what lets the agent run the loop and trust its own keep/discard decision.
positive_reply_quality = Σ(reply_weight) / emails_sent

Two campaigns can post identical reply rates while one earns five "book a call" replies and the other five "unsubscribe me." Reply rate can't tell them apart. Quality can. Optimize the score and the campaign drifts toward the messages that produce buyers, not just responders.
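The formula is small enough to sketch in a few lines. The tier weights below are placeholders chosen for illustration, not the rubric's actual values, and the mapping of "unsubscribe me" to T0 and "book a call" to T3 is an assumption about how the tiers are defined.

```python
# Hypothetical weights per intent tier. The real rubric in program.md may differ.
TIER_WEIGHTS = {"T0": 0.0, "T1": 1.0, "T2": 3.0, "T3": 10.0}


def reply_rate(reply_tiers: list[str], emails_sent: int) -> float:
    """Plain reply rate: every reply counts the same."""
    return len(reply_tiers) / emails_sent


def positive_reply_quality(reply_tiers: list[str], emails_sent: int) -> float:
    """Sum of per-reply weights divided by emails sent."""
    if emails_sent <= 0:
        raise ValueError("emails_sent must be positive")
    return sum(TIER_WEIGHTS[tier] for tier in reply_tiers) / emails_sent


# Two campaigns, 100 sends each, five replies each.
campaign_a = ["T3"] * 5  # assumed tier for "book a call"
campaign_b = ["T0"] * 5  # assumed tier for "unsubscribe me"

rate_a = reply_rate(campaign_a, 100)
rate_b = reply_rate(campaign_b, 100)
quality_a = positive_reply_quality(campaign_a, 100)
quality_b = positive_reply_quality(campaign_b, 100)
```

Under these assumed weights both campaigns show the same reply rate, while the quality score separates them completely.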
Five mechanisms
Three are the framework: the loop, the step that turns positive replies into training data, and the brain. Two are outputs the loop produces. You don't build the outputs by hand; they fall out of running the system.
The GTM experiment loop
Framework: Treat every campaign as a research run. Each run follows hypothesis → audience → message → result → keep / discard. A hypothesis must be falsifiable, for example: "Treasury leads respond to a cost-savings hook better than a compliance hook." Run it, score it, keep the winner, log both.
Positive replies become training data
Framework: Every positive reply tells you what the market actually cares about. Cluster T2/T3 replies by intent, write each cluster into the brain as first-party signal, and feed it back to the operating agent. This is the step that makes the system learn instead of merely run.
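A minimal sketch of the clustering step, assuming the scoring agent has already attached a tier and an intent label to each reply. The `Reply` type, the intent labels, and the markdown layout are hypothetical; the sample replies are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    tier: str    # "T0" through "T3"
    intent: str  # hypothetical label assigned by the scoring agent
    text: str


def cluster_positive_replies(replies: list[Reply]) -> dict[str, list[Reply]]:
    """Group T2/T3 replies by intent label; lower tiers are ignored."""
    clusters = defaultdict(list)
    for reply in replies:
        if reply.tier in {"T2", "T3"}:
            clusters[reply.intent].append(reply)
    return dict(clusters)


def render_cluster(intent: str, replies: list[Reply]) -> str:
    """Render one cluster as a markdown page the agent could store in the brain."""
    lines = [f"# {intent}", ""]
    lines += [f"- [{reply.tier}] {reply.text}" for reply in replies]
    return "\n".join(lines) + "\n"


replies = [
    Reply("T3", "pricing", "Send me a proposal."),
    Reply("T2", "pricing", "What does this cost?"),
    Reply("T3", "timing", "Can we talk next week?"),
    Reply("T0", "none", "Unsubscribe me."),
]
clusters = cluster_positive_replies(replies)
page = render_cluster("pricing", clusters["pricing"])
```

Each rendered page would be written under `positive-replies/` in the brain, one file per intent.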
Immediate sales action
Output: The loop continuously emits a prioritized action queue: who to contact and why, ranked by reply tier and recency. T3 replies go to the top with the exact ask they made. Not a report you write; a live view the agent pushes to Slack or your CRM.
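Ranking by tier and then recency is a single sort. A sketch under stated assumptions: the `Action` fields and the `min_tier` cutoff, which drops T0 replies from the queue, are hypothetical, and the contacts are invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Action:
    contact: str
    tier: int              # 0 through 3, matching T0-T3
    received_at: datetime
    ask: str               # the exact ask the prospect made


def build_action_queue(actions: list[Action], min_tier: int = 1) -> list[Action]:
    """Highest tier first; within a tier, most recent reply first."""
    actionable = [a for a in actions if a.tier >= min_tier]  # assumed cutoff
    return sorted(actionable, key=lambda a: (a.tier, a.received_at), reverse=True)


actions = [
    Action("ana@example.com", 2, datetime(2024, 5, 3, 9, 0), "asked for pricing"),
    Action("bo@example.com", 3, datetime(2024, 5, 1, 9, 0), "send me a proposal"),
    Action("cy@example.com", 3, datetime(2024, 5, 2, 9, 0), "book a call"),
    Action("di@example.com", 0, datetime(2024, 5, 4, 9, 0), "unsubscribe me"),
]
queue = build_action_queue(actions)
```

Here both T3 replies land ahead of the T2 reply even though the T2 reply is newer, and the more recent T3 reply comes first.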
Standard follow-up packet
Output: One ready-to-send packet so no hot reply waits on a human to assemble materials: one-page overview, short deck, the mechanism/pricing explainer people keep asking for, qualification questions, and a calendar CTA, all in the configured language.
The GTM brain
Framework: The durable, compounding asset. A crystallized-intelligence-compatible knowledge base the agent reads before writing any campaign and updates after scoring every run: winning messages, objections, clustered replies, ICP patterns, competitor mentions, and a running scoreboard.
The brain is the compounding asset
The agent reads it before writing any campaign and updates it after scoring every run. A plain folder of markdown is enough, and it is crystallized-intelligence-compatible if you want to keep it lean as it grows.
winning-messages/       Copy that earned T2/T3 replies, with its eval score attached
objections/             Every objection seen + the response that re-engaged
positive-replies/       Positive replies clustered by intent (the "training data")
icp-patterns/           Which segments convert; which signals predict a hot (T3) reply
competitor-mentions/    Who prospects compare you to, and what they say
campaign-performance/   Running scoreboard, findings, next recommended tests

It holds a company's actual messages, replies, and ICP intelligence. The open-source repo ships the folders empty (with .gitkeep) precisely because the framework holds no company data. Everything client-specific flows through the loop as configuration, never baked into its structure. That boundary is what makes AutoGTM a framework instead of one company's playbook.
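If you ever need to recreate that empty layout, for a second client or a test run, a few lines will do it. This is a sketch, not a script from the repo: the function name is hypothetical, and placing the folders under `brain/` is an assumption. The demo writes to a temporary directory.

```python
import tempfile
from pathlib import Path

BRAIN_FOLDERS = [
    "winning-messages",
    "objections",
    "positive-replies",
    "icp-patterns",
    "competitor-mentions",
    "campaign-performance",
]


def scaffold_brain(root: Path) -> list[Path]:
    """Create each brain folder with a .gitkeep so git tracks it while empty."""
    created = []
    for name in BRAIN_FOLDERS:
        folder = root / "brain" / name
        folder.mkdir(parents=True, exist_ok=True)
        keep = folder / ".gitkeep"
        keep.touch(exist_ok=True)
        created.append(keep)
    return created


# Demo in a throwaway directory so nothing touches a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_brain(Path(tmp))
    created_names = sorted(path.parent.name for path in created)
    all_exist = all(path.exists() for path in created)
```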
Hand it to your agent. It runs the loop.
AutoGTM is built to be run by an autonomous agent deployment: an OpenClaw / Hermes agent provisioned for a client, not a human in a spreadsheet. Open Claude Code, OpenClaw, Cursor, Codex, whatever you run, and paste this.
Clone github.com/bcharleson/autogtm into this folder.
Read program.md end to end, then explain back to me, in your own
words: what AutoGTM optimizes, how the positive-reply-quality eval
works (the T0-T3 rubric), what the single modifiable artifact is,
and the nine steps of the experiment loop.
Then draft the Configuration block at the top of program.md for my
company. Ask me only for the fields you can't infer.

- The human sets strategy and approves the sends.
- The agent runs the loop, scores replies, updates the brain.
- It surfaces the next action: who to contact and why.
When the operator is the agent, the brain the loop updates is the same brain the agent reads. So the system improves the operator, not just the output. A human operator breaks that feedback loop; an agent operator closes it.
One file to edit. Everything else is scaffolding.
program.md is the only file you touch. Fill in the Configuration block (the only client-specific values) and the generic loop runs against your company.
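Since every client-specific value lives in that one block, a pre-flight check for unfilled fields is cheap. A minimal sketch, assuming the block keeps the template's `key: "{{PLACEHOLDER}}"` shape; the function name is hypothetical and the sample values are invented.

```python
import re

# Matches any leftover template placeholder such as {{COMPANY_NAME}}.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}")


def unfilled_fields(config_text: str) -> list[str]:
    """Return the keys whose values still contain a {{...}} placeholder."""
    missing = []
    for line in config_text.splitlines():
        key, separator, value = line.partition(":")
        if separator and PLACEHOLDER.search(value):
            missing.append(key.strip())
    return missing


sample = "\n".join([
    'company: "Acme Treasury"',
    'icp: "{{WHO_WE_TARGET}}"',
    'sending_tool: "{{instantly | smartlead | apollo | ...}}"',
    "min_sends_to_score: 100",
])
missing = unfilled_fields(sample)
```

An agent could run a check like this before step 1 and ask only about the keys it returns.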
git clone https://github.com/bcharleson/autogtm
cd autogtm
# 1. Fill in the Configuration block (the only file you edit):
$EDITOR program.md
# 2. Seed brain/ with anything you already know
# (winning subject lines, known objections)
# 3. Hand program.md to your operating agent and run the loop.

company: "{{COMPANY_NAME}}"
one_line: "{{WHAT_THEY_DO_IN_ONE_SENTENCE}}"
icp: "{{WHO_WE_TARGET}}"
core_offer: "{{THE_OFFER_OR_WEDGE}}"
proof_assets: "{{NAMED CLIENTS / DATA / DIFFERENTIATORS}}"
sending_tool: "{{instantly | smartlead | apollo | ...}}"
crm: "{{hubspot | twenty | airtable | ...}}"
language: "{{en | es | pt | ...}}"
agent_operator: "{{DEPLOYED_AGENT_NAME}}"
min_sends_to_score: 100  # never evaluate below this sample size

LOOP (until manually stopped):
  1. Read state     git log --oneline -5; tail -20 campaigns/results.tsv
  2. Hypothesize    change exactly one of {audience, message, cta}
  3. Commit         git commit -m "experiment: <hypothesis>"
  4. Run            send through sending_tool (>= min_sends_to_score)
  5. Score          classify replies T0-T3; compute positive_reply_quality
  6. Log            append a row to campaigns/results.tsv
  7. Decide         improved    -> keep + promote winning copy to brain/
                    equal/worse -> git reset --hard HEAD~1 (discard)
  8. Update brain   cluster replies, append objections, refresh action queue
  9. Go to 1

Approve the sends. Let it score, log, and learn.
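Steps 6 and 7 reduce to a comparison against the current baseline plus one log line. A sketch under stated assumptions: the `RunResult` fields, the column order of the TSV row, and the `insufficient_sample` outcome are hypothetical choices, not the repo's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    hypothesis: str
    emails_sent: int
    quality: float  # positive_reply_quality for this run


def decide(result: RunResult, baseline_quality: float, min_sends_to_score: int = 100) -> str:
    """Return 'keep', 'discard', or 'insufficient_sample' for one experiment."""
    if result.emails_sent < min_sends_to_score:
        # Never evaluate below the configured sample size.
        return "insufficient_sample"
    # Equal or worse means discard; only a strict improvement is kept.
    return "keep" if result.quality > baseline_quality else "discard"


def to_tsv_row(result: RunResult, decision: str) -> str:
    """Format one row for campaigns/results.tsv (column order is an assumption)."""
    fields = [result.hypothesis, str(result.emails_sent), f"{result.quality:.4f}", decision]
    return "\t".join(fields)


run = RunResult("cost-savings hook beats compliance hook", 120, 0.5)
decision = decide(run, baseline_quality=0.3)
row = to_tsv_row(run, decision)
```

A `discard` decision is where the agent would run the `git reset --hard HEAD~1` from step 7, and a `keep` is where it would promote the winning copy into `brain/`.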
Built to pair with crystallized-intelligence
AutoGTM discovers what converts. Crystallized intelligence compiles it into agent-readable layers. One keeps the loop honest; the other keeps the brain compact.
The brain's format
crystallized-intelligence compiles expertise into agent-readable layers: seed, principles, knowledge, sources, raw. It structures what you already know.
The brain's feedstock
AutoGTM generates and scores the GTM experiments that fill those layers. It discovers what actually converts. Every positive reply is fresh first-party signal.
Every positive reply AutoGTM scores becomes first-party signal. Run it through crystallized-intelligence and it compiles into the brain's seed and principles layers, so the operating agent reads a crystallized brain, tight and load-bearing, instead of a raw pile of replies. You don't have to use both. But if you want the brain to stay lean as it grows, point one at the other.
Why we built it
We run outbound for every client of ours, and we kept watching the same thing happen across the industry: campaigns treated as static playbooks. You write the sequence, you launch it, you read a dashboard, and none of it makes the next campaign smarter. The learning evaporates the moment the campaign ends.
Meanwhile we'd been deploying agents for everything else: research, CRM hygiene, reporting. The obvious move was to let an agent run outbound as a loop, not just draft the copy. But an agent only compounds if there's a fixed ruler to judge against and a brain to write back to. That's exactly the shape of Karpathy's autoresearch.
So we ported it. AutoGTM is the loop we wished every outbound motion ran on: agent-operated, quality-weighted, and compounding. It's client-agnostic and MIT-licensed because the mechanism shouldn't be anyone's secret. The only thing private is your brain.
MIT licensed. Built by Brandon Charleson at Top of Funnel.