USER GUIDE · v1 · LAST UPDATED 2026-05-15
browser-recon · agentic scraping reconnaissance
~/user-guide

One long page. Nine sections.

Everything a paying user needs to read at least once. If something here is wrong, reply to your welcome email — it goes to a real inbox.

01 · Install

One pipx command. The CLI is 128 KB. No analysis code, no proprietary IP runs on your machine — the agent lives server-side; the CLI is a thin orchestrator that drives Chrome and ships captures upstream.

Supported Python: 3.11+. pipx installs the CLI into its own isolated virtual environment, but a system Python in that range is recommended for compatibility with any virtualenvs you pipe scan output into.

pipx install browser-recon

Verify the install:

recon --version

What gets installed locally: the recon binary, a Chrome launcher, and a tiny capture proxy. What stays server-side: every LLM call, every validation request, every scrub rule, every blob.

02 · First scan

Three steps. The first is a command that picks a target; the second is you browsing like a human; the third is you calling it done.

recon scan https://www.target.com

The CLI:

  1. asks you to pick an intent template (e.g. product_listing, search_results, reviews) — this tells the agent what kind of data you're after,
  2. opens Chrome to the target URL with a capture proxy attached,
  3. watches every request you trigger as you browse for ~2 minutes.

When you've clicked through the data you want, hit Ctrl+C in the terminal. The CLI flushes the captured traffic, scrubs sensitive values, uploads, and prints the report URL.

what happens next

The server runs detection (anti-bot fingerprinting), endpoint analysis (which calls actually carry the data), validation (real test requests through real proxies), and synthesis (the recommendation + starter code). Roughly 5–10 minutes end to end. You'll get the URL immediately; the report fills in as each phase finishes.

03 · What's in your report

Every report has the same shape. Verdict first, evidence in the middle, runnable starter code at the bottom.

Each slot, what it is, and how you use it:

  • verdict
    What it is: A single paragraph naming the anti-bot stack + recommended library × proxy combination.
    How you use it: The headline. If this is wrong, the rest of the report won't save you.
  • library recommendation
    What it is: e.g. curl_cffi (chrome120) over residential.
    How you use it: Pin to the named library + impersonation profile. Anything else is a guess.
  • headers
    What it is: The minimum required header set the validation layer empirically confirmed.
    How you use it: Copy verbatim into your client.
  • cookies
    What it is: Names of cookies required for the captured request to authenticate.
    How you use it: Run a warmup against the target homepage; the listed names should populate.
  • cost projection
    What it is: Dollar projection per 1,000 requests, derived from measured bandwidth × your proxy rate.
    How you use it: Budget input. Compare across proxy tiers in the report's cost-band card.
  • starter code
    What it is: A runnable Python script using the recommended library + headers + cookies.
    How you use it: Drop into your stack. If the report's replay status is matched, this script reproduces the captured response shape. A hedged sketch of its shape follows this list.
  • replay results
    What it is: Per-endpoint status: matched / mismatch / untested.
    How you use it: Spot-check the mismatch rows manually before you scale up.
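
For concreteness, a minimal sketch of the shape the starter-code slot usually takes, assuming a report that recommends curl_cffi (chrome120) over residential. The URLs, headers, and proxy endpoint below are placeholders, not values from a real report; your report's generated script is the authoritative version.

from curl_cffi import requests

# Placeholders: substitute the values from your report's headers,
# cookies, and starter-code slots.
PROXY = "http://user:pass@residential-proxy.example:8000"
PROXIES = {"http": PROXY, "https": PROXY}
HEADERS = {
    "accept": "application/json",
    "accept-language": "en-US,en;q=0.9",
}

session = requests.Session()

# Cookie warmup: hit the homepage first so the cookie names listed
# in the report populate on the session's jar.
session.get("https://www.target.com/", impersonate="chrome120",
            proxies=PROXIES, headers=HEADERS)

# Then replay the data-carrying endpoint the report identified
# (hypothetical path).
resp = session.get("https://www.target.com/api/items?page=1",
                   impersonate="chrome120", proxies=PROXIES, headers=HEADERS)
print(resp.status_code)

The cost-projection slot is plain arithmetic over the same traffic. With made-up numbers: an endpoint averaging 1.2 MB per response on a residential tier billed at $8/GB projects to 1.2 MB × 1,000 × $8/GB ≈ $9.60 per 1,000 requests.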

04 · What we don't save

§4 is the load-bearing trust section of this guide. It's specific on purpose. Vague claims ("we strip sensitive data") read as marketing; named fields read as engineering.

Each field, its treatment, and why:

  • Cookie values
    Treatment: Stripped before long-term storage; cookie names kept.
    Why: Names are non-secret and useful for scraping recon. Values authenticate.
  • Authorization header
    Treatment: Removed entirely.
    Why: Always a bearer token, always sensitive.
  • X-Api-Key
    Treatment: Removed entirely.
    Why: API keys are credentials.
  • X-Auth-Token
    Treatment: Removed entirely.
    Why: Convention varies, but the name is reserved for auth.
  • X-CSRF-Token
    Treatment: Removed entirely.
    Why: Pairs with a session — both removed.
  • JWT-shaped strings in bodies
    Treatment: Replaced with <redacted-jwt>.
    Why: Regex matches eyJ… three-segment base64.
  • Bearer-token shaped strings
    Treatment: Replaced with <redacted-token>.
    Why: Heuristic match on long opaque alphanumeric blobs in known auth-y contexts.
  • Session cookies (specific names)
    Treatment: Removed entirely (names + values).
    Why: Named matches against the common list: sessionid, JSESSIONID, PHPSESSID, connect.sid, etc.

Scrubbing happens on your machine, before the capture is uploaded for long-term storage. The validation layer needs transient plaintext access — see §5 — but the persisted blob is the scrubbed version.
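
For intuition, a minimal sketch of the body and cookie rules described above, written from this section's descriptions; an illustration of the behaviour, not the shipped scrubber.

import re

# JWT-shaped: "eyJ" followed by three base64url segments (per the §4 table).
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")

# Common session-cookie names removed entirely (names + values).
SESSION_COOKIE_NAMES = {"sessionid", "jsessionid", "phpsessid", "connect.sid"}

def scrub_body(body: str) -> str:
    # Replace JWT-shaped strings wherever they appear in a captured body.
    return JWT_RE.sub("<redacted-jwt>", body)

def scrub_cookies(cookies: dict[str, str]) -> dict[str, str]:
    # Session cookies are dropped outright; every other cookie keeps
    # its name and loses its value.
    return {name: "" for name in cookies if name.lower() not in SESSION_COOKIE_NAMES}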

05 · Security model

The honest framing. Read this once; we will not claim things we cannot back up.

# /user-guide §5

Encryption in transit:  All traffic between your machine and our servers
                        uses TLS 1.3.

Encryption at rest:    Captures are stored in AWS S3, encrypted with
                        AWS KMS (AES-256).

What's scrubbed:       Cookie values, Authorization headers, X-Api-Key,
                        X-Auth-Token, X-CSRF-Token, JWT-shaped strings,
                        and named session cookies are stripped before
                        long-term storage. (Full list in §4.)

What we do see:        We need plaintext access during validation
                        — that's how the agent confirms a library/proxy
                        combination actually works against the target.
                        We do NOT claim end-to-end encryption.

Who can read it:       browser-recon staff, when a support case
                        requires it, with audit logging.

Deletion:              Captures auto-delete at report expiry. The
                        report itself is also deleted at expiry.

why we don't say "end-to-end"

End-to-end encryption is a marketing claim with a specific technical meaning: only the endpoints (you and your collaborators) can decrypt the data. Our validation layer needs to fire real requests with the captured cookies / headers, so the server has plaintext access during that window. Saying "end-to-end encrypted" would be materially false. We say TLS 1.3 + KMS instead, because that's what we do.

06 · Pricing & credits

Four tiers. One trial. The full pricing page is at /pricing; the short version:

  • Tester — $5/mo, 5 credits, reports live 2 weeks. One-time only per account, enforced server-side.
  • Beginner — $10/mo, 12 credits, reports live 1 month.
  • Pro — $20/mo, 30 credits, reports live 2 months.
  • Pro Max — $60/mo, 100 credits, reports live 3 months.

Credits expire monthly. No rollover. The next month's grant overwrites the balance; it does not accumulate. A re-scan from an expired report costs the same 1 credit as a fresh scan.
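
If the overwrite rule sounds ambiguous, this one-liner pins it down (an illustration of the behaviour, not our billing code):

def renew_balance(leftover: int, tier_grant: int) -> int:
    # The new month's grant replaces whatever was left; nothing
    # accumulates. "leftover" is deliberately ignored.
    return tier_grant

# e.g. a Pro user carrying 7 unused credits starts the new month with 30, not 37.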

2 free scans on activation. When we flip you from waitlist to active, you land with 2 free scans. They're independent of your tier purchase.

Billing in V1 is admin-granted; Stripe Checkout lands in V2 once first-cohort pricing has settled. Until then, to pay, just reply to your welcome email.

07 · Dashboard

The dashboard lives at /dashboard. What you see there:

  • Credit balance — current month + the date your tier renews.
  • Recent scans — every scan you've run, newest first. Each row links to its report.
  • Per-report expiry — the stamped expires_at date for each report; downgrading later doesn't shorten it.
  • Re-scan — one-click re-scan on expired or stale reports. Costs one credit.

You can also drop into /dashboard/data to see and delete your own scans + intent text — single source of truth for "what does browser-recon have about me."

08 · FAQ

How long do reports stay live?
Depends on your tier: 2 weeks for Tester, 1 month for Beginner, 2 months for Pro, 3 months for Pro Max. The window is stamped at scan-complete time and frozen on the row — downgrading later doesn't retroactively expire old reports.
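
If "frozen on the row" sounds abstract, here's a hedged sketch of the stamping rule, reusing the expires_at column named in §7; illustrative, not the production schema.

from datetime import datetime, timedelta

RETENTION = {
    "tester": timedelta(weeks=2),
    "beginner": timedelta(days=30),   # "1 month" approximated as 30 days here
    "pro": timedelta(days=60),        # "2 months" approximated likewise
    "pro_max": timedelta(days=90),
}

def stamp_expiry(completed_at: datetime, tier: str) -> datetime:
    # Stamped once at scan-complete time and frozen on the report row;
    # later tier changes never touch it.
    return completed_at + RETENTION[tier]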
What happens when credits expire?
The unused balance from the current month is zeroed at the start of the next billing month. The new month's grant overwrites the balance; it does not accumulate. You can't "save up" 30 credits for one big month — they reset on a fixed cadence.

Can I bring my own proxies?
Not in V1. The validation layer runs through our managed proxy tiers (datacenter / residential / mobile) because the cost projection on the report depends on a known rate card. BYO proxies land once we have a way to inject the rate card.

What if the site I want is blocked?
The agent will tell you. Detection returns the anti-bot vendor + severity tier; if validation can't get through any library × proxy combination, the report ships with a "currently un-scrapeable from our infrastructure" verdict and a refund of the credit. We'd rather be honest than waste your money.

Can I write a custom intent template?
Not in V1. The shipped templates (product_listing, search_results, reviews, login_walk, custom) cover the common cases. If you have a workflow that doesn't fit, email us and we'll add it manually — and if enough users need the same one, it lands as a built-in.

How accurate is the recommendation?
Across our verification set (Walmart, Staples, Target, Airbnb, Ticketmaster, CoinMarketCap), the recommended library × proxy combination reproduces the captured response shape on the first replay attempt about 88% of the time. The other 12% surface as mismatch rows in the replay-results table — they're still scrapeable, you just have to tune by hand.

Do you sell my scan data?
No. We use aggregate, anonymized statistics (e.g. "n% of recent scans against site X return a 403") to improve the agent. Individual scan blobs and reports never leave your account; staff only access them when a support case requires it, with audit logging.

Can I export my scans?
JSON exports of report metadata land in V2. Full blob exports are gated on a paying customer asking — the leak risk of shipping a portable report file currently outweighs the convenience.

What happens when I cancel?
Credits stop renewing at the next billing date. Existing reports stay live for the retention you were on when each was scanned. The users.tier column goes back to NULL.

Why is there a tweet jump-the-line option?
Word of mouth is our acquisition channel. A tweet is a costly signal that says "this tool is worth talking about" — costly enough that it materially helps us prioritize the waitlist queue. Optional, always.

09 · Troubleshooting

Common failure modes. If your symptom isn't here, reply to your welcome email.

Chrome won't launch on the first scan
The CLI launches its own bundled Chromium. On Linux with no system libnss3, the launch fails silently. Install it: sudo apt-get install libnss3 libnspr4 (or the equivalent for your distro). On macOS / Windows you should not hit this.

Scan stuck at "validating"
Validation fires real proxy requests; on a high-protection target, the test traffic can take 2–5 minutes per endpoint × library × proxy combination. Give it ten minutes. If it's still wedged, the dashboard's per-scan debug view (admin-only in V1) will surface which endpoint hung — drop us a line with the scan ID and we'll dig in.

Report says replay failed
The recommendation is still the best library × proxy combination we found; "replay failed" means the agent couldn't reproduce the captured response shape after applying the recommended config. Usually the cause is a missing cookie warmup or a session-bound CSRF token. The report's replay-failure card lists the failing endpoint and the captured-vs-replayed diff — start there.

Credit not deducted after a failed scan
By design — if validation can't get through any library × proxy combination, we refund the credit automatically. Check your credit balance in the dashboard. If it didn't refund, that's a bug; reply with the scan ID and we'll fix it manually.

"Token expired" on the magic-link callback
Magic links expire after 15 minutes. Request a fresh one from /dashboard/login. If you click an old link as a waitlisted user, you'll land back on the /waitlist page — that's the expected behaviour, not a bug.