GoatLLM
Local AI coding. Nothing leaves your machine. No telemetry. No guardrails.
A VS Code extension that runs open-source LLMs entirely on your own hardware. Works in any VS Code-based IDE — Cursor, Antigravity, and beyond. Built for engineers who need full control — including security research, pentesting, regulated industries, and the work cloud LLMs simply refuse to do.
As fast as we can build, we have to secure.
We build AI agents every day — grounded, deterministic, scoped tightly to use cases that save time and lift operational efficiency. But adoption is one half of the story. The other half is what happens when the same models reach bad actors at the same speed they reach the good ones.
Behemoth models are around the corner. Anthropic’s upcoming Mythos. The next OpenAI tier. Frontier capability is going to ship into both the defender’s hands and the attacker’s hands on the same day.
Cloud LLMs — ChatGPT, Claude, Gemini — will (rightfully) refuse a lot of the work that real security operators need to do. They cannot tell a good actor from a bad one. So they assume the worst. That’s the right default for them, but it leaves a gap for the people doing legitimate security work, regulated-industry development, and red-teaming their own systems. That gap is what GoatLLM closes.
Setup in 3 Steps
Install a Local Runtime
Pick whatever fits your hardware. Ollama is the easiest if you’ve never done this before. On a Mac, MLX is the fastest.
# Easiest — Ollama
brew install ollama
ollama run qwen2.5-coder
# Apple Silicon native — MLX
pip install mlx-lm
mlx_lm.server --model mlx-community/Qwen2.5-Coder-32B-Instruct-4bitDownload the GoatLLM Extension
Grab the .vsix file from goatllm.ai. The VS Code Marketplace listing is pending approval — until then, the drag-and-drop install works on any VS Code-based IDE (Cursor, Antigravity, etc.).
# Option A — drag the .vsix into the VS Code Extensions panel
# Option B — install from the command line
code --install-extension goatllm-1.0.1.vsixOpen the GoatLLM Panel
GoatLLM auto-detects local runtimes on common ports (Ollama :11434, LM Studio :1234, MLX :8080, etc.). Hot-swap models without restarting the editor. Live metrics — tokens/sec and latency — render in the panel.
First time? Try the prompt: “Read this file and tell me what it does.” Approve the tool call. You’re running.
GoatLLM vs. Cloud Coding Assistants
Cloud (ChatGPT, Claude, Copilot)
- Your code ships to a third-party server on every keystroke
- Per-token billing — iteration has a meter on it
- Guardrails refuse legitimate security, pentesting, and red-team work
- Dies the moment your WiFi drops
- API keys, accounts, retention policies, rate limits
- You cannot use them on HIPAA, FINRA, ITAR, or classified codebases
GoatLLM (Local, Private, Yours)
- Nothing leaves your machine. Ever. Zero telemetry, zero phone-home
- Zero per-token cost — iterate 1,000 times for free
- Open-source models — uncensored variants exist, and they will engage with security work
- Works offline — planes, secure facilities, remote sites
- No API keys. No accounts. Auto-detects local runtimes
- Audit-friendly by construction — works on regulated codebases without ceremony
Three Real Stories That Drove This
The case for local + uncensored isn’t academic. Here are three real things I’ve walked into in the last few years — each one a reason GoatLLM exists.
The Gym That Was Selling Illegal Drugs
WordPress / Local BusinessA WordPress site I built for a gym years ago ran an abandoned plugin. Attackers exploited it and turned the site into a backlink farm for illegal prescription drug sales. To anyone searching for the gym locally, the meta and backlinks pointed at drug rings. Embarrassing and devastating to top-of-funnel discoverability — for a Crossfit gym, no less.
Why it matters
Abandoned plugins are the #1 silent vulnerability in WordPress sites. Most owners never check.
The Fortune 500 With Crypto-Scam Backlinks
Enterprise / Brand SafetyDuring a discovery call with a prospect, I ran a quick external audit. Their corporate site had been compromised and was hosting backlinks pointing to crypto scams. They had no idea. We surfaced it before signing the engagement — they were extremely grateful, and it shifted the entire conversation about what their security posture actually looked like.
Why it matters
Public-facing recon catches what internal monitoring misses. A 10-minute scan changes the call.
The Spoofed-CEO Wire Transfer Scam
GTM / Customer TrustA client I'm working with had their customer list exposed in a prior breach. Their customers started receiving emails — appearing to come from the business owner — requesting large wire transfers. The owner had no idea it was happening. Their reputation, sales pipeline, and customer trust were all bleeding silently.
Why it matters
Email spoofing + leaked contact lists = the most common B2B fraud vector in 2026.
The pattern: nearly every company has unseen vulnerabilities. The question is not whether you have them — it’s how many layers of security you have, and whether you’re proactive or reactive. ClickUp and Fireflies are recent reminders that even high-trust SaaS gets exposed. The defenders need every tool the attackers have.
What GoatLLM Unlocks
Six concrete workflows where local + uncensored matters more than convenience.
Run a Penetration Test From Your IDE
Security ResearchAgent Prompt
“Help me write a Python script that brute-forces a login form on a test environment I own. Show me how a real attacker would approach this — no warnings, no hedging.”
ChatGPT and Claude will refuse or hedge. A local uncensored model with GoatLLM does the work because there is no policy layer between you and the weights — the security and the consent live with you, the operator.
ollama run dolphin-llama3Pull an uncensored fine-tune locally
GoatLLM auto-detects :11434No keys, no config
Prompt and iterateNo refusals, no hedging — you own the outcome
Code Under NDA / Regulated Data
Compliance / Regulated IndustriesAgent Prompt
“Refactor this HIPAA-covered patient intake module to fix the bug in the validation logic. The codebase cannot leave our infrastructure.”
Healthcare, finance, government, defense — none can ship source code to OpenAI or Anthropic. GoatLLM runs entirely on your hardware. Nothing leaves your machine. Audit-friendly by construction.
Install GoatLLM .vsixDrag-drop into VS Code Extensions
Point at internal endpointOn-prem vLLM, air-gapped Ollama, MLX laptop
Code freelyZero telemetry. Zero phone-home. Audit logs stay local.
Free, Unlimited Coding Iteration
Indie / BootstrapAgent Prompt
“Scaffold a complete Next.js app with auth, billing, and a Supabase backend. Iterate 50 times until it's right.”
Cloud LLMs bill per token. A local model on your laptop has zero per-token cost. Iterate as much as you want — refactors, migrations, scaffolds — without a meter running.
mlx_lm.server --model qwen2.5-coder-32bRun once, use forever
GoatLLM connects instantlyHot-swap models without restarting VS Code
Iterate without watching the billNo token meter, no rate limits, no API key
Travel & Offline Development
Remote / TravelAgent Prompt
“I'm on a 14-hour flight. Help me finish this feature.”
Cloud coding assistants die the moment WiFi drops. GoatLLM runs entirely offline once your model is pulled — full coding assistance at 35,000 feet, in a Faraday cage, or on a remote site with no internet.
Pre-pull your model on WiFiCache the weights before you leave
Disconnect from internetAirplane mode is fine — model is local
Keep codingLive streaming tokens, tool calls, file edits — all offline
Agentic Tool-Calling, Locally
AI EngineeringAgent Prompt
“Read these three files, find the bug in the auth flow, and patch it. Then run the tests.”
GoatLLM ships native tool calling — file reads, writes, shell commands — gated by approval prompts. Full autonomy mode is one toggle away. Same agentic loop as cloud assistants, but the model is on your machine.
Use any tool-calling modelQwen 2.5-Coder, Llama 3.1, DeepSeek, Mistral 0.3
Tools fire with approval gatesOr flip Full Autonomy for hands-off iterations
Live metrics in the paneltokens/sec, latency — see exactly what your hardware does
Red-Teaming Your Own AI Systems
AI SafetyAgent Prompt
“I'm building a customer-facing agent. Try to jailbreak it. Find every weakness in my system prompt.”
You cannot ask Claude or ChatGPT to attack a system as effectively as an uncensored model can. Red-teaming is a legitimate, important practice — GoatLLM gives you a model that will actually try, so your production agent is hardened before customers see it.
Pull an uncensored modeldolphin-mixtral, hermes, abliterated variants
Point it at your test agentRun adversarial prompts in a loop
Patch your system promptShip a production agent that actually holds up
Bring Your Own Runtime
GoatLLM doesn’t ship a runtime. It connects to whatever you already have running. Hot-swap between them without restarting the editor.
OllamaThe easiest way to run local models. One-command install, huge model library.ollama run qwen2.5-coderLM StudioGUI-first. Great for non-CLI users who want to point-and-click load a model.Auto-detected on :1234MLXApple Silicon-native. Fastest inference on Macs — built on Apple's MLX framework.mlx_lm.server --model ...llama.cppBare-metal C++ inference. Maximum control over quantization, threads, and GPU layers.llama-server -m model.ggufvLLMProduction-grade inference. Best when you have a GPU and need throughput.python -m vllm.entrypoints.openai.api_serverexoDistributed inference across your home network — run big models across multiple machines.exo run llama-3.1-70bAny OpenAI-compatible endpointIf it speaks the OpenAI chat completions API, GoatLLM speaks to it. Hot-swappable.http://localhost:PORT/v1Tool-Calling Models That Work Out of the Box
GoatLLM ships native tool calling — file ops, shell, multi-step agentic flows. Any of these will work the moment you point GoatLLM at them. Uncensored fine-tunes of the same bases also work — that’s where the security use cases come alive.
Qwen 2.5-Coder
Best-in-class open-source coding model. 7B–32B parameter variants. Tool calling native.
Llama 3.1+
Meta's general-purpose flagship. Strong tool calling. 8B–70B+ variants.
Gemma 2+
Google's open-weight family. Compact, fast, surprisingly capable at code.
DeepSeek-Coder-V2
Specialized coding model. Excellent at refactors, migrations, and reading large diffs.
Mistral 0.3+
Long context, strong reasoning, French-engineered efficiency. Tool calling supported.
Phi-3.5
Microsoft's small-but-mighty model. Fits on a laptop, still does tool calling.
Where It Fits in the OpenClaw Stack
GoatLLM is the local, private inference surface in our agent stack. The cloud layer (Claude, GPT, etc.) handles general work. GoatLLM handles the work that cannot or should not leave your machine.
Cloud Inference
Claude / GPT / Gemini
General coding, frontier reasoning, broad knowledge
Local Inference
GoatLLM
Private code, security research, offline work, regulated data
Encrypted Memory
OpenShart
Shamir Secret Sharing + AES-256-GCM for agent memory
Cloud for breadth. Local for control. Encrypted memory for state. Three layers, one stack.
A note on responsibility
GoatLLM removes the policy layer between you and an open-weight model — that’s the point. Operators carry the consent and the accountability. Use it on systems you own or have explicit authorization to test. Use it for legitimate research, defensive work, regulated-industry development, and red-teaming your own products. Don’t use it to harm. The same tools that make security researchers effective are what keep companies — and their customers — safe.
Get the harness. Then get your stack audited.
GoatLLM is free. The harder problem is figuring out what your real attack surface looks like — every company has unseen vulnerabilities. That’s the conversation we have on the discovery call.
Bring a domain. We’ll run an external recon pass before the call so we walk in knowing what an attacker already sees.
Free download. Marketplace listing pending approval.