Built by BrandonZero Telemetryv1.0.1 — Marketplace PendingVS Code · Cursor · Antigravity

GoatLLM

Local AI coding. Nothing leaves your machine. No telemetry. No guardrails.

A VS Code extension that runs open-source LLMs entirely on your own hardware. Works in any VS Code-based IDE — Cursor, Antigravity, and beyond. Built for engineers who need full control — including security research, pentesting, regulated industries, and the work cloud LLMs simply refuse to do.

goatllm — offline · local · uncensored
GoatLLM logo
local · private · no guardrails
Download GoatLLM
0
Telemetry
0
API Keys
7+
Runtimes
$0
Per Token

As fast as we can build, we have to secure.

We build AI agents every day — grounded, deterministic, scoped tightly to use cases that save time and lift operational efficiency. But adoption is one half of the story. The other half is what happens when the same models reach bad actors at the same speed they reach the good ones.

Behemoth models are around the corner. Anthropic’s upcoming Mythos. The next OpenAI tier. Frontier capability is going to ship into both the defender’s hands and the attacker’s hands on the same day.

Cloud LLMs — ChatGPT, Claude, Gemini — will (rightfully) refuse a lot of the work that real security operators need to do. They cannot tell a good actor from a bad one. So they assume the worst. That’s the right default for them, but it leaves a gap for the people doing legitimate security work, regulated-industry development, and red-teaming their own systems. That gap is what GoatLLM closes.

Setup in 3 Steps

1

Install a Local Runtime

Pick whatever fits your hardware. Ollama is the easiest if you’ve never done this before. On a Mac, MLX is the fastest.

Terminal
# Easiest — Ollama
brew install ollama
ollama run qwen2.5-coder

# Apple Silicon native — MLX
pip install mlx-lm
mlx_lm.server --model mlx-community/Qwen2.5-Coder-32B-Instruct-4bit
2

Download the GoatLLM Extension

Grab the .vsix file from goatllm.ai. The VS Code Marketplace listing is pending approval — until then, the drag-and-drop install works on any VS Code-based IDE (Cursor, Antigravity, etc.).

Install
# Option A — drag the .vsix into the VS Code Extensions panel
# Option B — install from the command line
code --install-extension goatllm-1.0.1.vsix
3

Open the GoatLLM Panel

GoatLLM auto-detects local runtimes on common ports (Ollama :11434, LM Studio :1234, MLX :8080, etc.). Hot-swap models without restarting the editor. Live metrics — tokens/sec and latency — render in the panel.

First time? Try the prompt: “Read this file and tell me what it does.” Approve the tool call. You’re running.

GoatLLM vs. Cloud Coding Assistants

Cloud (ChatGPT, Claude, Copilot)

  • Your code ships to a third-party server on every keystroke
  • Per-token billing — iteration has a meter on it
  • Guardrails refuse legitimate security, pentesting, and red-team work
  • Dies the moment your WiFi drops
  • API keys, accounts, retention policies, rate limits
  • You cannot use them on HIPAA, FINRA, ITAR, or classified codebases

GoatLLM (Local, Private, Yours)

  • Nothing leaves your machine. Ever. Zero telemetry, zero phone-home
  • Zero per-token cost — iterate 1,000 times for free
  • Open-source models — uncensored variants exist, and they will engage with security work
  • Works offline — planes, secure facilities, remote sites
  • No API keys. No accounts. Auto-detects local runtimes
  • Audit-friendly by construction — works on regulated codebases without ceremony

Three Real Stories That Drove This

The case for local + uncensored isn’t academic. Here are three real things I’ve walked into in the last few years — each one a reason GoatLLM exists.

The Gym That Was Selling Illegal Drugs

WordPress / Local Business

A WordPress site I built for a gym years ago ran an abandoned plugin. Attackers exploited it and turned the site into a backlink farm for illegal prescription drug sales. To anyone searching for the gym locally, the meta and backlinks pointed at drug rings. Embarrassing and devastating to top-of-funnel discoverability — for a Crossfit gym, no less.

Why it matters

Abandoned plugins are the #1 silent vulnerability in WordPress sites. Most owners never check.

The Fortune 500 With Crypto-Scam Backlinks

Enterprise / Brand Safety

During a discovery call with a prospect, I ran a quick external audit. Their corporate site had been compromised and was hosting backlinks pointing to crypto scams. They had no idea. We surfaced it before signing the engagement — they were extremely grateful, and it shifted the entire conversation about what their security posture actually looked like.

Why it matters

Public-facing recon catches what internal monitoring misses. A 10-minute scan changes the call.

The Spoofed-CEO Wire Transfer Scam

GTM / Customer Trust

A client I'm working with had their customer list exposed in a prior breach. Their customers started receiving emails — appearing to come from the business owner — requesting large wire transfers. The owner had no idea it was happening. Their reputation, sales pipeline, and customer trust were all bleeding silently.

Why it matters

Email spoofing + leaked contact lists = the most common B2B fraud vector in 2026.

The pattern: nearly every company has unseen vulnerabilities. The question is not whether you have them — it’s how many layers of security you have, and whether you’re proactive or reactive. ClickUp and Fireflies are recent reminders that even high-trust SaaS gets exposed. The defenders need every tool the attackers have.

What GoatLLM Unlocks

Six concrete workflows where local + uncensored matters more than convenience.

Run a Penetration Test From Your IDE

Security Research

Agent Prompt

Help me write a Python script that brute-forces a login form on a test environment I own. Show me how a real attacker would approach this — no warnings, no hedging.

ChatGPT and Claude will refuse or hedge. A local uncensored model with GoatLLM does the work because there is no policy layer between you and the weights — the security and the consent live with you, the operator.

1
ollama run dolphin-llama3

Pull an uncensored fine-tune locally

2
GoatLLM auto-detects :11434

No keys, no config

3
Prompt and iterate

No refusals, no hedging — you own the outcome

Code Under NDA / Regulated Data

Compliance / Regulated Industries

Agent Prompt

Refactor this HIPAA-covered patient intake module to fix the bug in the validation logic. The codebase cannot leave our infrastructure.

Healthcare, finance, government, defense — none can ship source code to OpenAI or Anthropic. GoatLLM runs entirely on your hardware. Nothing leaves your machine. Audit-friendly by construction.

1
Install GoatLLM .vsix

Drag-drop into VS Code Extensions

2
Point at internal endpoint

On-prem vLLM, air-gapped Ollama, MLX laptop

3
Code freely

Zero telemetry. Zero phone-home. Audit logs stay local.

Free, Unlimited Coding Iteration

Indie / Bootstrap

Agent Prompt

Scaffold a complete Next.js app with auth, billing, and a Supabase backend. Iterate 50 times until it's right.

Cloud LLMs bill per token. A local model on your laptop has zero per-token cost. Iterate as much as you want — refactors, migrations, scaffolds — without a meter running.

1
mlx_lm.server --model qwen2.5-coder-32b

Run once, use forever

2
GoatLLM connects instantly

Hot-swap models without restarting VS Code

3
Iterate without watching the bill

No token meter, no rate limits, no API key

Travel & Offline Development

Remote / Travel

Agent Prompt

I'm on a 14-hour flight. Help me finish this feature.

Cloud coding assistants die the moment WiFi drops. GoatLLM runs entirely offline once your model is pulled — full coding assistance at 35,000 feet, in a Faraday cage, or on a remote site with no internet.

1
Pre-pull your model on WiFi

Cache the weights before you leave

2
Disconnect from internet

Airplane mode is fine — model is local

3
Keep coding

Live streaming tokens, tool calls, file edits — all offline

Agentic Tool-Calling, Locally

AI Engineering

Agent Prompt

Read these three files, find the bug in the auth flow, and patch it. Then run the tests.

GoatLLM ships native tool calling — file reads, writes, shell commands — gated by approval prompts. Full autonomy mode is one toggle away. Same agentic loop as cloud assistants, but the model is on your machine.

1
Use any tool-calling model

Qwen 2.5-Coder, Llama 3.1, DeepSeek, Mistral 0.3

2
Tools fire with approval gates

Or flip Full Autonomy for hands-off iterations

3
Live metrics in the panel

tokens/sec, latency — see exactly what your hardware does

Red-Teaming Your Own AI Systems

AI Safety

Agent Prompt

I'm building a customer-facing agent. Try to jailbreak it. Find every weakness in my system prompt.

You cannot ask Claude or ChatGPT to attack a system as effectively as an uncensored model can. Red-teaming is a legitimate, important practice — GoatLLM gives you a model that will actually try, so your production agent is hardened before customers see it.

1
Pull an uncensored model

dolphin-mixtral, hermes, abliterated variants

2
Point it at your test agent

Run adversarial prompts in a loop

3
Patch your system prompt

Ship a production agent that actually holds up

Bring Your Own Runtime

GoatLLM doesn’t ship a runtime. It connects to whatever you already have running. Hot-swap between them without restarting the editor.

Auto-detected on common ports
OllamaThe easiest way to run local models. One-command install, huge model library.ollama run qwen2.5-coder
LM StudioGUI-first. Great for non-CLI users who want to point-and-click load a model.Auto-detected on :1234
MLXApple Silicon-native. Fastest inference on Macs — built on Apple's MLX framework.mlx_lm.server --model ...
llama.cppBare-metal C++ inference. Maximum control over quantization, threads, and GPU layers.llama-server -m model.gguf
vLLMProduction-grade inference. Best when you have a GPU and need throughput.python -m vllm.entrypoints.openai.api_server
exoDistributed inference across your home network — run big models across multiple machines.exo run llama-3.1-70b
Any OpenAI-compatible endpointIf it speaks the OpenAI chat completions API, GoatLLM speaks to it. Hot-swappable.http://localhost:PORT/v1

Tool-Calling Models That Work Out of the Box

GoatLLM ships native tool calling — file ops, shell, multi-step agentic flows. Any of these will work the moment you point GoatLLM at them. Uncensored fine-tunes of the same bases also work — that’s where the security use cases come alive.

Qwen 2.5-Coder

Best-in-class open-source coding model. 7B–32B parameter variants. Tool calling native.

Llama 3.1+

Meta's general-purpose flagship. Strong tool calling. 8B–70B+ variants.

Gemma 2+

Google's open-weight family. Compact, fast, surprisingly capable at code.

DeepSeek-Coder-V2

Specialized coding model. Excellent at refactors, migrations, and reading large diffs.

Mistral 0.3+

Long context, strong reasoning, French-engineered efficiency. Tool calling supported.

Phi-3.5

Microsoft's small-but-mighty model. Fits on a laptop, still does tool calling.

Where It Fits in the OpenClaw Stack

GoatLLM is the local, private inference surface in our agent stack. The cloud layer (Claude, GPT, etc.) handles general work. GoatLLM handles the work that cannot or should not leave your machine.

Cloud Inference

Claude / GPT / Gemini

General coding, frontier reasoning, broad knowledge

Local Inference

GoatLLM

Private code, security research, offline work, regulated data

Encrypted Memory

OpenShart

Shamir Secret Sharing + AES-256-GCM for agent memory

Cloud for breadth. Local for control. Encrypted memory for state. Three layers, one stack.

A note on responsibility

GoatLLM removes the policy layer between you and an open-weight model — that’s the point. Operators carry the consent and the accountability. Use it on systems you own or have explicit authorization to test. Use it for legitimate research, defensive work, regulated-industry development, and red-teaming your own products. Don’t use it to harm. The same tools that make security researchers effective are what keep companies — and their customers — safe.

Get the harness. Then get your stack audited.

GoatLLM is free. The harder problem is figuring out what your real attack surface looks like — every company has unseen vulnerabilities. That’s the conversation we have on the discovery call.

Bring a domain. We’ll run an external recon pass before the call so we walk in knowing what an attacker already sees.

🐐Download GoatLLM

Free download. Marketplace listing pending approval.