Developer Guide · Web Scraping Infrastructure

Never Lose a Scraping Job Again

The Beginner's Guide to tmux for Web Scraping & AI Agents

Protect your scrapers from terminal crashes, SSH disconnects, and accidental closures. Essential for running long jobs on remote servers—start the scraper, disconnect, check back hours later.

brew install tmux

⚠️ Important: Where Your Scraper Runs Matters

tmux cannot keep your laptop awake or maintain your internet connection. Here's what actually works:

✅ Remote Server (Best)

Rent a cloud server ($5-10/month). SSH in, start tmux, run your scraper. The server stays on 24/7. Disconnect your laptop, the server keeps working.

⚠️ Local Laptop (Limited)

Laptop must stay on and connected to Wi-Fi. tmux only protects against: terminal crashes, accidentally closing the window, or losing SSH to localhost.

💡 This guide focuses on remote servers (the primary use case). For local laptops, keep your machine plugged in and prevent sleep mode.

The Problem: Your Scraper Stops When You Do

When you run a web scraper (or an AI agent doing research), it often takes hours or days to finish. Without tmux, your script is tied to your terminal window. If any of these happen, everything stops:

Your laptop goes to sleep
Your terminal app crashes
Your SSH connection drops
You accidentally close the tab
macOS force-quits Terminal during an update

You lose all progress. For a 19-hour scrape, that's devastating. You're back to square one.

✨ tmux solves this problem completely

It creates a "background session" that keeps running no matter what happens to your laptop. Start your scraper, close your laptop, come back tomorrow—it's still running.

⚡ The Magic of tmux ⚡

Watch what happens when you run a scraper on a remote server via SSH...

😱 WITHOUT TMUX

1. 🚀 You start the scraper (10:00 AM):

$ ssh myserver.com
$ python scraper.py
✓ Scraping 10,000 pages...

2. You close your laptop (11:30 AM, going to lunch). The SSH connection drops... 💀

3. 💥 PROCESS KILLED (11:30 AM, the exact moment). The SSH disconnect kills python scraper.py.
Progress lost: 1,847 / 10,000 (18%)

4. 😭 All progress lost. Start over from zero.

🎯 WITH TMUX

1. 🚀 You start the scraper (10:00 AM):

$ ssh myserver.com
$ tmux new -s scraper
$ python scraper.py
✓ Scraping 10,000 pages...
🛡️ Protected by tmux

2. You close your laptop (11:30 AM, going to lunch). The SSH connection drops... No problem!

3. 🎉 STILL RUNNING! The server keeps working 24/7:

python scraper.py
Progress: 7,843 / 10,000 (78%)
🖥️ Running on remote server...

4. 🏆 Job completes! SSH back later to grab the results.

That's the Power of tmux! 🚀

The remote server runs 24/7 with a constant internet connection, and tmux keeps your scraper protected from SSH disconnects. Close your laptop. Go to lunch. Go to bed. The server keeps working. ⚡

How tmux Works (Simple Explanation)

Think of tmux as a background worker that keeps your scripts running even when you're not looking. Your terminal window is just a "window" into what's happening—closing the window doesn't stop the work.

❌ Without tmux

Your script runs inside the window. Close the window = script dies.

💡 Like a live phone call. Hang up and it's over.

✅ With tmux

Your script runs in the background. The window is just a view into it.

💡 Like a voicemail recording. Keeps running without you.

screen — The Old Reliable

Since 1987. Simple, battle-tested, and does one thing well: keep your processes alive.

Basic Workflow

screen workflow
# 1. Create a named session
screen -S my-scrape

# 2. You're now "inside" the session — run your scraper
python scraper.py --resume

# 3. DETACH from the session (script keeps running)
#    Press: Ctrl+A, then D
#    You'll see: [detached from 12345.my-scrape]

# 4. Go to bed. Close your laptop. Whatever.

# 5. Come back later and REATTACH
screen -r my-scrape

# 6. Your script is still running, output scrolling by

Essential Commands

Action                        Command
Create a named session        screen -S session-name
Detach (leave running)        Ctrl+A, then D
Reattach to a session         screen -r session-name
List all sessions             screen -ls
Kill a session                screen -X -S session-name quit
Scroll up (view output)       Ctrl+A, then [ (Esc to exit)

Installation

macOS

Pre-installed

Just type screen

Ubuntu/Debian

sudo apt install screen

CentOS/RHEL

sudo yum install screen

tmux — The Modern Upgrade

Released in 2007 as a modern replacement for screen. Same core concept, more features.

Basic Workflow

tmux workflow
# 1. Create a named session
tmux new -s my-scrape

# 2. Run your scraper
python scraper.py --resume

# 3. DETACH from the session
#    Press: Ctrl+B, then D

# 4. Close laptop, go about your life

# 5. Reattach later
tmux attach -t my-scrape

# 6. Script is still running

Essential Commands

Action                        Command
Create a named session        tmux new -s session-name
Detach (leave running)        Ctrl+B, then D
Reattach to a session         tmux attach -t session-name
List all sessions             tmux ls
Kill a session                tmux kill-session -t session-name
Scroll up (view output)       Ctrl+B, then [ (q to exit)
Split pane horizontally       Ctrl+B, then "
Split pane vertically         Ctrl+B, then %
Switch between panes          Ctrl+B, then arrow keys

Installation

macOS

brew install tmux

Ubuntu/Debian

sudo apt install tmux

CentOS/RHEL

sudo yum install tmux

Head-to-Head Comparison

Feature                          screen               tmux
First released                   1987                 2007
Split window (side by side)      Clunky, limited      Built-in and intuitive
Multiple panes in one view       No                   Yes
Scrollback / copy-paste          Works but awkward    Much smoother
Configuration / theming          Minimal              Highly customizable
Pre-installed on macOS           Yes                  No (need Homebrew)
Pre-installed on most Linux      Often yes            Sometimes
Learning curve                   Lower                Slightly higher
Status bar                       None by default      Shows session info
Session persistence              Excellent            Excellent
Active development               Minimal              Active
Plugin ecosystem                 None                 Yes (via tpm)

Which One Should You Use?

Use tmux.

Here's why it's better for web scraping specifically:

1. Split Panes

Watch your scraper in the top half and monitor system resources (or tail a log file) in the bottom half — all in one window.

2. Better Scrollback

When you need to scroll up through thousands of lines of scraper output to find an error, tmux's scroll mode is significantly less painful than screen's.

3. Multiple Windows

Run different scraping jobs in different "windows" (tabs) within the same session. Switch between them with Ctrl+B, then 0/1/2.

4. Session Management

When you're running 3-4 different scraping jobs simultaneously, tmux's session management is cleaner and more intuitive.

tmux split pane example
# In a tmux session:
# Top pane: your scraper running
python scraper.py --resume

# Press Ctrl+B, then " to split horizontally
# Bottom pane: watch your output file grow
watch -n 5 wc -l output/results.csv
tmux session management
tmux ls
# email-waterfall: 1 windows (created Thu Feb 6 14:32:01 2026)
# registry-scrape: 1 windows (created Thu Feb 6 14:33:15 2026)
# healthengine: 1 windows (created Thu Feb 6 15:01:44 2026)

The Standard Web Scraping Workflow

Every time you start a long-running scrape, follow this pattern:

standard workflow
# 1. Create a session named after your job
tmux new -s job-name

# 2. Run your scraper
python my_scraper.py --resume

# 3. Detach
#    Ctrl+B, then D
#    Output: [detached (from session job-name)]

# 4. Check on it whenever you want
tmux attach -t job-name
#    (Ctrl+B, D to detach again)

# 5. When the scrape finishes, close the session
exit

Rule of thumb: Any scrape that takes more than 5 minutes should be in a tmux session. It costs you 5 seconds of setup and saves you from potentially losing hours of work.

Running Multiple Scrapes Simultaneously

One of the most powerful patterns — running multiple jobs in parallel, each in their own protected session:

parallel scraping sessions
# Session 1: Email enrichment (2-3 hours)
tmux new -s email-enrichment
python scripts/email_waterfall.py --phase finders
# Ctrl+B, D to detach

# Session 2: Registration scraping (19 hours)
tmux new -s registration-scrape
python scripts/registry_scraper.py --resume
# Ctrl+B, D to detach

# Session 3: Data validation (30 min)
tmux new -s validation
python scripts/validate_emails.py
# Ctrl+B, D to detach

# Check on all of them:
tmux ls

# Jump into any one:
tmux attach -t email-enrichment

Each session is fully independent. If one scraper crashes, the others keep going. If your laptop sleeps, they all keep going.

Pro Tips for Web Scrapers

1. Always Name Your Sessions

# Bad - you'll forget what's running
tmux new

# Good - instantly know what each session is doing
tmux new -s directory-scrape-batch-3

2. Combine with Logging

Don't just rely on terminal output. Log to a file too, so you have a record even if something goes wrong with the session:

python scraper.py --resume 2>&1 | tee output/scrape.log

tee writes output to both the screen AND a file simultaneously.

3. Use script for Full Session Recording

tmux new -s my-scrape
script output/session_recording.txt
python scraper.py --resume
# When done: type 'exit' to stop recording

4. Monitor from Outside

Check if your scraper is still running without attaching:

# See if the process is alive
ps aux | grep scraper.py

# Check the last few lines of output
tail -20 output/scrape.log

# Watch output in real-time without attaching
tail -f output/scrape.log

5. Graceful Shutdown

If your scraper supports it (e.g., saves progress on Ctrl+C), you can send signals from outside:

# Send Ctrl+C to a tmux session
tmux send-keys -t my-scrape C-c
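What "supports it" looks like inside the scraper: a loop that traps Ctrl+C and checkpoints before exiting. This is a sketch only; process, the checkpoint file name, and the sentinel URL are all illustrative stand-ins:

```python
import json

def process(url):
    # Stand-in for the real per-URL scraping work; pretend Ctrl+C
    # (sent via tmux send-keys ... C-c) arrives on this record
    if url == "interrupt-here":
        raise KeyboardInterrupt

def scrape_all(urls, checkpoint="shutdown_progress.json", start=0):
    """Walk the URL list; on Ctrl+C, persist the current index before exiting."""
    i = start
    try:
        for i in range(start, len(urls)):
            process(urls[i])
    except KeyboardInterrupt:
        # Save where we stopped so the next run can resume from here
        with open(checkpoint, "w") as f:
            json.dump({"next_index": i}, f)
        print(f"Interrupted at record {i}; progress saved")

scrape_all(["a.com", "b.com", "interrupt-here", "d.com"])
```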

Common Gotchas

"I can't reattach — it says already attached"

Someone (or another terminal) is already viewing the session. Force-attach:

tmux attach -t my-scrape -d    # -d detaches the other viewer first

"My session disappeared after a reboot"

screen and tmux sessions don't survive system restarts. They run in memory. This is why your scrapers should always support --resume from saved progress files.
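A --resume flag usually just means "read a checkpoint file on startup and skip what's done." A minimal sketch, assuming a JSON checkpoint (the file name and the 25-record interval are illustrative choices, not requirements):

```python
import json
import os

CHECKPOINT = "scrape_progress.json"

def load_start():
    """Return the index of the next unprocessed record (0 on a fresh start)."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["done"]
    return 0

def run(records):
    start = load_start()
    for i in range(start, len(records)):
        # ... scrape records[i] here ...
        if (i + 1) % 25 == 0:  # checkpoint every 25 records
            with open(CHECKPOINT, "w") as f:
                json.dump({"done": i + 1}, f)

run(list(range(60)))  # checkpoints after records 25 and 50
```

After a reboot kills the session, rerunning the script picks up from the last checkpoint instead of record zero.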

"I'm on a remote server via SSH and my connection drops"

This is exactly what tmux was made for. SSH into the server, start tmux, run your scraper, detach. Even if your SSH connection drops, the tmux session keeps running on the server. Just SSH back in and reattach.

ssh myserver
tmux attach -t my-scrape    # Pick up right where you left off

"How do I copy text from tmux scroll mode?"

  1. Enter scroll mode: Ctrl+B, then [
  2. Navigate to where you want to start copying
  3. Press Space to start selection
  4. Move to end of selection
  5. Press Enter to copy
  6. Paste with Ctrl+B, then ]

Quick Reference Card

tmux (Recommended)

tmux new -s NAME          Create session
tmux attach -t NAME       Reattach
tmux ls                   List sessions
tmux kill-session -t NAME Kill session

Inside tmux:
  Ctrl+B, D               Detach
  Ctrl+B, [               Scroll mode (q to exit)
  Ctrl+B, "               Split horizontal
  Ctrl+B, %               Split vertical
  Ctrl+B, arrow keys      Switch panes
  Ctrl+B, c               New window
  Ctrl+B, n               Next window
  Ctrl+B, p               Previous window

screen (Fallback)

screen -S NAME            Create session
screen -r NAME            Reattach
screen -ls                List sessions
screen -X -S NAME quit    Kill session

Inside screen:
  Ctrl+A, D               Detach
  Ctrl+A, [               Scroll mode (Esc to exit)

How tmux Makes You a Better Web Scraper

tmux isn't just insurance against crashes. Once you build it into your workflow, it fundamentally changes how you approach scraping projects.

You Stop Babysitting Scripts

Without tmux, you unconsciously limit yourself. You avoid starting a 6-hour scrape because you know you can't keep your laptop open that long. With tmux, you launch and walk away. Your scraping capacity is no longer limited by how long you can sit at your desk.

You Can Run Scraping Pipelines in Parallel

Real-world scraping is rarely one script. It's a pipeline — scrape data, enrich it, validate it, export it. Without tmux, you run these sequentially. With tmux, you run all stages simultaneously. What used to take 8 hours sequentially now takes 3-4.

You Build Confidence to Scale

When you know your scrape is protected, you start thinking bigger: "Let me scrape all 50 states, not just 5." "Let me run the full 12,000 records overnight." tmux removes the psychological barrier of "what if something interrupts it."

You Get Better at Debugging

tmux's scroll mode and logging integration mean you can review exactly what happened during a 12-hour scrape. Compare this to running without tmux — if your terminal closes, your entire output history is gone.

You Can Scrape from Anywhere

Start a scrape on your desktop at the office. Go home. SSH into your machine. Reattach. Check progress from your phone over SSH. tmux makes your scraping location-independent.

Resume-Friendly Architecture Becomes Second Nature

Once you start using tmux, you naturally build better scrapers with progress checkpoints, --resume flags, atomic writes, and file-based logging. These habits make your scrapers more robust regardless of whether you use tmux.
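"Atomic writes" deserves one concrete example: the write-then-rename pattern, sketched here for a JSON checkpoint (the file name and fields are illustrative):

```python
import json
import os
import tempfile

def atomic_write_json(path, data):
    """Write to a temp file in the same directory, then rename over the
    target: readers never see a half-written checkpoint, even if the
    process dies mid-write."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # clean up the temp file on failure
        raise

atomic_write_json("checkpoint.json", {"done": 1847, "total": 12000})
```

The rename is the whole trick: a plain open(path, "w") truncates the file first, so a crash mid-write can destroy the very checkpoint you were relying on.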

parallel pipeline example
# Session 1: Scraping raw data
tmux new -s scrape
python scrape_listings.py --resume
# Ctrl+B, D

# Session 2: Enriching scraped data (runs on already-scraped records)
tmux new -s enrich
python enrich_contacts.py --watch-input output/raw_listings.csv
# Ctrl+B, D

# Session 3: Validating enriched emails
tmux new -s validate
python validate_emails.py --watch-input output/enriched.csv
# Ctrl+B, D

# Three stages running concurrently!
scrape from anywhere
# At the office - start scrape
tmux new -s big-scrape
python scraper.py --full-run
# Ctrl+B, D

# At home - check on it
ssh office-machine
tmux attach -t big-scrape
# Everything is exactly where you left it

tmux for Lead Generation & Business Scraping

Web scraping isn't just a developer hobby. For agencies, sales teams, and growth operators, scraping is how you build lead lists, enrich contact data, and fill your pipeline. tmux turns scraping from a fragile side task into a reliable, scalable business operation.

The Lead Generation Scraping Pipeline

Pipeline Flow (4 stages, each in its own tmux session)

Stage 1: Discovery (tmux new -s discovery)
Scrape directories, registries, and listing sites to find prospects.
→ 12,000 raw records

Stage 2: Enrichment (tmux new -s enrich)
Find emails, phone numbers, and LinkedIn profiles for each contact, via 3 API providers.
→ 8,200 enriched

Stage 3: Validation (tmux new -s validate)
Verify emails are deliverable before loading into outreach tools.
→ 3,847 verified (47% deliverable)

Stage 4: Upload (tmux new -s upload)
Push validated leads into your CRM or email platform (Instantly / Smartlead / CRM).
→ Campaign ready

12,000 records in → 3,847 verified leads out

Real-World Example: Building a 10,000+ Contact Database

Copy this prompt into Claude Code, Cursor, or any AI coding agent. Replace the placeholders with your target industry and data source. The agent will build the scrapers, run each stage in tmux, and deliver a validated lead list.

lead-gen-pipeline-prompt.md (full prompt: copy and paste into Claude Code / Cursor / any AI agent)
Build me a lead generation pipeline that scrapes, enriches, and
validates 10,000+ contacts. I need campaign-ready leads exported
to CSV by the end of the run.

## Target
- Industry: [YOUR INDUSTRY, e.g. "SaaS companies", "dental practices",
  "ecommerce brands", "real estate agencies"]
- Region: [YOUR REGION, e.g. "United States", "Australia", "UK"]
- Source: [YOUR SOURCE, e.g. "Google Maps", "industry directory",
  "professional registry", "Yelp", "LinkedIn Sales Nav export"]

## Pipeline — Run Each Stage in tmux

Every script you write MUST support --resume from a checkpoint file.
Every stage runs in its own tmux session so nothing is lost if my
laptop sleeps or the terminal disconnects.

### Stage 1: Discovery
tmux new -d -s discovery 'python scrape_directory.py --resume'
- Scrape the source for raw business listings
- Extract: business name, address, phone, website URL
- Save to: data/raw_listings.csv
- Rate limit: 2-second delay between requests

### Stage 2: Domain & Website Enrichment
tmux new -d -s domains 'python find_domains.py --resume'
- For each business, find their website domain
- Crawl the website for staff pages, about pages, team pages
- Extract names, titles, and any visible email addresses
- Save to: data/with_domains.csv

### Stage 3: Email Enrichment (Waterfall)
tmux new -d -s enrich 'python email_waterfall.py --resume'
- For each contact, find their email using this waterfall:
  1. Check if the domain is catch-all (skip validation if yes)
  2. Try API providers (Hunter, Apollo, etc.) if available
  3. Generate email permutations (first@, first.last@, etc.)
  4. SMTP-validate the permutations against the mail server
- Save to: data/enriched.csv

### Stage 4: Email Validation
tmux new -d -s validate 'python validate_emails.py --resume'
- Verify every email is actually deliverable
- Remove risky, bouncy, and disposable addresses
- Save to: data/validated.csv

### Stage 5: Export
- Format final output as campaign-ready CSV
- Required columns: first_name, last_name, email, company,
  title, website, phone, city, state
- Deduplicate by email address
- Save to: output/campaign_ready.csv

## Rules
- Checkpoint progress every 25 records (atomic writes)
- Log to file AND stdout: 2>&1 | tee output/stage_name.log
- Handle Ctrl+C gracefully (save progress before exit)
- If a stage fails, log failed URLs to output/failed_urls.txt
- After launching each stage, run tmux ls and report status
- Check on running stages: tmux capture-pane -t [name] -p | tail -5

## When Complete
- Report total records at each stage (scraped → enriched → validated)
- Report deliverability rate (validated / enriched)
- Confirm output/campaign_ready.csv exists with final count
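The Stage 5 dedupe rule is worth spelling out, since "deduplicate by email address" should usually be case-insensitive. A sketch (column names follow the prompt's CSV spec; the sample rows are illustrative):

```python
def dedupe_by_email(rows):
    """Keep the first row seen for each email, case-insensitive."""
    seen, out = set(), []
    for row in rows:
        key = row["email"].strip().lower()
        if key and key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [
    {"email": "Jane@acme.com", "company": "Acme"},
    {"email": "jane@acme.com", "company": "Acme Inc"},  # duplicate, dropped
    {"email": "bob@other.io", "company": "Other"},
]
print(len(dedupe_by_email(rows)))  # 2
```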

Why Lead Gen Scraping Specifically Needs tmux

Volume is the game

You're not scraping 10 pages — you're scraping 10,000. tmux lets you run at full scale without fear.

Multi-source aggregation

Good lead lists come from combining multiple sources — directories, registries, LinkedIn, company websites. Each is its own scraping job.

Enrichment waterfalls are sequential AND long

Email enrichment tries multiple methods in order: API lookup, catch-all detection, pattern matching, permutation validation. The full waterfall can take hours.
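Structurally, a waterfall is just an ordered list of lookup functions where the first non-empty answer wins. A toy sketch (the provider functions here are stand-ins, not real API clients):

```python
def find_email(contact, providers):
    """Try each enrichment method in order; return the first hit."""
    for provider in providers:
        email = provider(contact)
        if email:
            return email
    return None

# Stub stages standing in for real API lookups and pattern matching
def api_lookup(contact):
    return None  # e.g. the paid provider had no match

def pattern_guess(contact):
    return f"{contact['first'].lower()}@{contact['domain']}"

email = find_email({"first": "Jane", "domain": "acme.com"},
                   [api_lookup, pattern_guess])
print(email)  # jane@acme.com
```

Because every stage can involve network calls and rate limits, the whole waterfall over thousands of contacts easily runs for hours, which is why it belongs in tmux.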

Rate limits force slow scrapes

Most sites have rate limiting. Government registries and directories often block above 15 requests/minute. These rate limits mean your scrapes inherently take a long time.
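One common way to respect those limits is a fixed base delay plus exponential backoff after a 429 response. A sketch (the 2-second base and 60-second cap are arbitrary choices for illustration):

```python
def backoff_delay(attempt, base=2.0, cap=60.0):
    """Delay before retry N after a 429: 2s, 4s, 8s... capped at 60s."""
    return min(base * (2 ** attempt), cap)

delays = [backoff_delay(a) for a in range(6)]
print(delays)  # [2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```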

Validation is expensive and slow

Email validation APIs charge per check and rate-limit you. Validating 10,000 emails at 20/second still takes 8+ minutes.

Client deadlines don't wait

When a client needs 12,000 enriched leads by Monday morning, you need confidence that your Friday night scrape will complete.

The “Friday Night Deploy” Pattern

friday night deploy
# Friday 6 PM — Launch everything
tmux new -s lead-gen-pipeline
python full_pipeline.py --source all --enrich --validate --export
# Ctrl+B, D — Go home

# Saturday morning (from your phone, via SSH)
ssh work-machine
tmux attach -t lead-gen-pipeline
# Check progress: "Processing record 8,432 / 12,257..."
# Ctrl+B, D — Go back to your weekend

# Monday 9 AM
tmux attach -t lead-gen-pipeline
# "Pipeline complete. 12,257 records processed. 3,847 validated emails found."
# Export sitting in output/leads_final.csv, ready for Instantly upload

Give Your Agent The Full Playbook

Combine the web-scraping skill with a system prompt built on the RTO framework: Role, Task, Output. Copy it into Claude Code, Cursor, or any AI coding agent.

The skill gives the agent knowledge (tmux patterns, scraper architecture, Crawl4AI integration). The RTO prompt gives it identity (Role), instructions (Task), and success criteria (Output).

SKILL: Web Scraping Skill
Gives the agent knowledge: tmux patterns, resume architecture, Crawl4AI, parallel pipelines.
Lives at: ~/.claude/skills/web-scraping/SKILL.md

PROMPT: RTO System Prompt
Gives the agent Role (who it is), Task (the pipeline stages), and Output (deliverables & quality criteria).
Paste into Claude Code / Cursor / any agent.

system-prompt.md (copy this RTO system prompt into your agent)
# ROLE

You are a senior lead generation engineer who specializes in building
automated data pipelines. You are methodical, infrastructure-aware,
and obsessed with data quality.

Your core competencies:
- Web scraping with resume-safe architecture (checkpoints, atomic writes)
- tmux session management for long-running, unattended processes
- Email enrichment using waterfall methodology
- Data validation and deduplication at scale

You never run long-running scripts outside of tmux. You always build
scrapers with --resume support. You log everything to files, not just
stdout. You treat rate limits as non-negotiable.

# TASK

Build a qualified prospect list of 10,000+ contacts by running a
4-stage scraping and enrichment pipeline. Each stage runs in its own
tmux session so work can continue unattended.

## Infrastructure Rules
- All scripts run inside tmux: tmux new -d -s [name] 'command'
- Check progress without attaching: tmux capture-pane -t [name] -p | tail -10
- Every scraper supports --resume from checkpoint files
- Rate limit: 2s default delay, increase to 5s on 429 responses
- Never exceed 20 requests/second to any single domain

## Stage 1: Discovery
- Scrape target directories and registries for raw business listings
- Save to data/raw_listings.csv with checkpoint at output/progress.json
- Session: tmux new -d -s discovery 'python scrape.py --resume'

## Stage 2: Enrichment
- Find email addresses using waterfall method:
  1. Check for catch-all domains first (skip these for SMTP validation)
  2. Try API providers (Apollo, Hunter, etc.)
  3. Fall back to pattern permutation + SMTP validation
- Save to data/enriched.csv
- Session: tmux new -d -s enrich 'python enrich.py --resume'

## Stage 3: Validation
- Verify all emails are deliverable (not just syntactically valid)
- Remove bouncy, risky, and catch-all addresses
- Save to data/validated.csv
- Session: tmux new -d -s validate 'python validate.py --resume'

## Stage 4: Export
- Format for the target platform (Instantly, Smartlead, CRM import)
- Include all enriched fields: name, company, title, email, domain
- Save to output/campaign_ready.csv

# OUTPUT

## Deliverables
- output/campaign_ready.csv — final deduplicated, validated prospect list
- output/failed_urls.txt — every failed URL logged for retry
- output/pipeline_report.md — summary of the full run

## Quality Criteria
- Minimum data per record: name, company, verified email
- Records without a deliverable email are excluded
- Deduplicated by email address across all sources
- Zero duplicate rows in the final export

## Pipeline Report (generate when complete)
When all stages finish, produce a summary including:
- Total records discovered vs enriched vs validated vs exported
- Deliverability rate (validated / enriched)
- Which tmux sessions ran and their durations
- Any sources that returned high error rates
- Final file locations and row counts

The RTO framework in action — each section gives the agent something different:

Role = Identity. The agent knows it's a lead gen engineer, so it defaults to tmux, checkpoints, and rate limits.

Task = Pipeline. Four clear stages (discovery, enrichment, validation, export), each in its own tmux session.

Output = Deliverables. Campaign-ready CSV, failure log, and pipeline report with quality criteria baked in.

Using Crawl4AI with tmux

Crawl4AI is an open-source LLM-friendly web crawler built for AI-powered data extraction. It uses a headless browser under the hood, handles JavaScript-rendered pages, and outputs structured data ready for AI processing.

The problem: Crawl4AI jobs on large sites can run for hours. Crawling 500+ pages with extraction, parsing, and rate limiting takes time. This is exactly what tmux was made for.

Crawl4AI + tmux
tmux new -s crawl-prospects

python -c "
import asyncio
import os
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

os.makedirs('output/pages', exist_ok=True)  # output dir must exist before writing

async def main():
    config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        page_timeout=30000,
    )
    async with AsyncWebCrawler() as crawler:
        with open('data/practice_urls.txt') as f:
            urls = [line.strip() for line in f if line.strip()]

        for i, url in enumerate(urls):
            result = await crawler.arun(url=url, config=config)
            if result.success:
                with open(f'output/pages/{i}.md', 'w') as out:
                    out.write(result.markdown)
            # single quotes only here: this whole script sits inside shell double quotes
            status = 'OK' if result.success else 'FAIL'
            print(f'[{i+1}/{len(urls)}] {url} - {status}')
            await asyncio.sleep(2)  # Rate limit

asyncio.run(main())
"

# Ctrl+B, D to detach — crawl keeps running
Crawl4AI for lead enrichment
tmux new -s crawl-enrich

python crawl_practice_sites.py \
    --input data/practices_with_domains.csv \
    --output data/staff_extracted.csv \
    --extraction-strategy llm \
    --model gpt-4o-mini \
    --resume

# Ctrl+B, D

Let AI Agents Do the Work (On Your Server)

If you use AI agents like Claude Code, Cursor, OpenClaw, or ChatGPT to help with web scraping, tmux becomes absolutely essential when running on remote servers.

🤖 How AI Agents Work with Remote Servers

When you tell an AI agent “scrape 10,000 companies from this directory,” the agent writes the code AND runs it for you on a remote server. The problem? That scrape might take 8 hours. Without tmux, if your SSH connection drops or your terminal crashes, the entire scrape stops.

✨ With tmux: The agent starts the scraper in a tmux session on the server. The server keeps running 24/7. You can disconnect SSH, close your laptop, go to bed—the server keeps scraping. Check results in the morning.

🦾 AI Agent (Like OpenClaw) Scraping The Internet 🌐

👨‍💻 YOU say:

“Hey OpenClaw, scrape 10,000 company websites and extract their contact info. Run it in tmux on the server.”

🤖 AI AGENT does:

✓ SSH into your remote server
✓ Write the scraping code
✓ Start a tmux session: tmux new -s scraper
✓ Launch the scraper: python scrape.py

🖥️ REMOTE SERVER (online 24/7), scraping the internet inside the protected tmux session "scraper":

🕷️ Scraping: company-1234.com
🕷️ Scraping: example-corp.io
🕷️ Scraping: business-xyz.net
Progress: 6,847 / 10,000 ▓▓▓▓▓▓▓░░░ 68%

🎉 NEXT MORNING, you SSH back:

$ ssh myserver.com
$ tmux attach -t scraper
✓ COMPLETE: 10,000 / 10,000 companies scraped!
📊 Results saved to: contacts.csv (2.3MB)

That's the Power of AI Agents + tmux + Remote Servers! 🚀

The AI agent sets everything up. The remote server does the heavy lifting 24/7. tmux keeps it protected. You just give instructions and collect results. No babysitting required.

How to Tell Your AI Agent to Use tmux (It's Easy)

Just add one sentence to your request. That's it. The AI agent will handle the rest.

❌ Without tmux (risky)

“SSH to my server and scrape all the companies from this website.”

If SSH disconnects, the scrape dies.

✅ With tmux (safe)

“SSH to my server and scrape all companies. Run it in tmux so it keeps going if I disconnect.”

Agent runs it in tmux on the server. Safe to disconnect.

What Happens Behind the Scenes (You Don't Need to Know This, But It's Cool)

When you tell an AI agent to use tmux on a remote server, here's what it does automatically:

What the AI agent does on your remote server
# Step 1: Agent SSHs to your server and starts tmux
$ ssh your-server.com
$ tmux new -d -s lead-scraper 'python scrape_leads.py --all'
#   Translation: "Run this on the SERVER in the background"

# Step 2: The SERVER runs the scraper (could be 8 hours)
#   Meanwhile, you can:
#   • Disconnect SSH
#   • Close your laptop
#   • Turn off your local computer
#   • Go to bed
#   • The SERVER keeps running (it has 24/7 power and internet)

# Step 3: When you come back, ask the agent:
"How's the scraper doing on the server?"

# Agent SSHs back in and checks for you:
$ ssh your-server.com
$ tmux ls                                      # Is it still running?
$ tmux capture-pane -t lead-scraper -p | tail # What's the latest output?

Commands Your AI Agent Uses (No Memorization Required)

You don't need to memorize these—your AI agent knows them. But here's what happens when you ask it to “check on the scraper” or “stop the scraper”:

Command                                   What the Agent Does
tmux new -d -s name 'cmd'                 Launch a script in a detached session
tmux ls                                   Check which sessions are still running
tmux capture-pane -t name -p              Read the current output without attaching
tmux capture-pane -t name -p | tail -5    Quick status check (last 5 lines)
tmux send-keys -t name 'q' Enter          Send input to a running process
tmux send-keys -t name C-c                Gracefully stop a process (Ctrl+C)
tmux kill-session -t name                 Force-kill a session when done
tmux has-session -t name 2>/dev/null && echo "running"    Check if a job is still alive

💡 The magic: You just talk to the agent in plain English. It runs these commands for you. No memorization needed.

Why Not Just Use Background Processes?

You might think: “Can't the agent just run python scraper.py & to background it?”

Feature                         nohup cmd &           tmux
Survives terminal close         Yes                   Yes
View live output later          No (only log file)    Yes (reattach)
Interact with process           No                    Yes
Send Ctrl+C gracefully          No (must kill)        Yes (send-keys)
Agent can check status          No                    Yes
Multiple processes organized    Messy                 Clean named sessions
Scroll through output           No                    Yes

Copy-Paste Prompt Template (For Remote Servers)

Just add this sentence to ANY request that involves running on a remote server:

✨ Magic Sentence for Remote Servers:

“Run this in tmux on the server so it keeps working if I disconnect SSH.”

That's literally all you need to say. The AI agent will handle the technical details.

Complete example request for remote server
"SSH to my server at myserver.com and scrape 5,000 companies from this
directory. Save their contact info to a CSV. Run this in tmux on the
server so it keeps working if I disconnect SSH."

Adding tmux to Your Agent's System Prompt

If you're building custom AI agents or using a framework that supports system prompts, consider adding tmux awareness:

system prompt addition
When executing scripts that may run longer than 5 minutes:
- Always use tmux to protect long-running processes
- Name sessions descriptively: tmux new -s [task-name]
- Detach after starting: Ctrl+B, D (or tmux detach)
- Check on processes: tmux attach -t [session-name]
- For overnight tasks, confirm the script supports --resume
- Never run long processes in the foreground without tmux

TL;DR

  1. Install tmux: brew install tmux
  2. Before any long scrape: tmux new -s descriptive-name
  3. Run your scraper
  4. Detach: Ctrl+B, then D
  5. Reattach anytime: tmux attach -t descriptive-name
  6. Never lose a scrape to a closed laptop again
  7. Always instruct AI agents to use tmux for long-running tasks
  8. Add tmux instructions to your agent prompts — it's the single biggest reliability improvement you can make
We do this every day

Need a scraping pipeline built for you?

We build and manage lead generation pipelines for B2B companies — the same tmux + AI agent architecture covered in this guide.

If you'd rather hand off the scraping, enrichment, and validation to a team that does this daily, we're happy to chat.

or View Our Services

No commitment — 15 minutes to see if we're a fit