
vgr_zirp is a retrieval-augmented Q&A system that answers questions in Venkatesh Rao's intellectual voice, drawing on his published writing from 2007–2023. It is not a fine-tuned model — it retrieves relevant passages from the actual corpus and constructs answers grounded in that material. This page documents how it works.

The corpus

The system indexes three bodies of work:

| Corpus | Source | Scale | Coverage |
|---|---|---|---|
| Ribbonfarm blog | ribbonfarm.com | 1,133 posts · 7,588 vectors | 2007–2023 |
| Twitter/X archive | vgr full export (curated) | ~57,700 tweets and threads · 57,715 vectors | 2007–2022 |
| Books + bibliography | EPUBs + HTML + PDFs + bibliography_raw.json | 7 books, 1,699 Quora answers, 6 guest articles, 908 bibliography items · 5,840 vectors | 2010–2022 |

The books corpus includes Tempo, Be Slightly Evil, Breaking Smart (season 1 essays + full 2015–2019 newsletter archive), Art of Gig (vols 1–3), Guerrilla Guide to Social Business (20 essays, 2011–2012), 1,699 Quora answers from 2010–2014, guest articles for Forbes/The Atlantic/Aeon, plus a 908-item bibliography of books, papers, and essays cited across the blog. Each book section and long-form Quora answer has a Haiku-generated 2–3 sentence summary; each bibliography item has a 3-sentence semantic summary — both used to improve title-query retrieval.

Embedding and retrieval

Chunking

Blog posts and book text are split into 512-token chunks with 64-token overlap. Each chunk is stored with metadata: source, date, author, title, series membership, and whether the post was collected into a book. Bibliography items are stored as single vectors (one per item, not chunked).
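The windowing scheme can be sketched as follows — a minimal illustration that treats a "token" as whatever unit the caller supplies (the real pipeline uses a proper tokenizer):

```python
def chunk_tokens(tokens, size=512, overlap=64):
    """Split a token list into fixed-size windows with overlap.

    Illustrative sketch only: consecutive chunks share `overlap`
    tokens so that sentences spanning a boundary appear in both.
    """
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```

A 1,000-token post yields three chunks, with the last 64 tokens of each chunk repeated at the head of the next.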

For both blog posts and book sections, the embedding text is prefixed with "Title: {title}\nSummary: {summary}\n\n" before the chunk body — ensuring that title and topic keywords are always present in the embedding vector even when they don't appear in the chunk body. The raw display text is unchanged; only the vector encoding receives the prefix.

In addition to body chunks, each blog post and each book section generates a dedicated summary vector (chunk_type: "post_summary" or "section_summary") that embeds only the title, summary, and tags. These give the retriever a clean, noise-free representation for title-based or high-level topic queries; a match on a summary vector then surfaces the corresponding body chunks as sources.
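The two embedding-text constructions above can be sketched as a pair of helpers — function names and the tag formatting are illustrative, not the project's actual code:

```python
def embedding_text(title, summary, body):
    """Text sent to the embedder for a body chunk.

    The stored display text stays as `body`; only the vector
    encoding receives the title/summary prefix.
    """
    return f"Title: {title}\nSummary: {summary}\n\n{body}"

def summary_vector_text(title, summary, tags):
    """Text for the dedicated post/section summary vector (no body)."""
    return f"Title: {title}\nSummary: {summary}\nTags: {', '.join(tags)}"
```

The first feeds the body-chunk vectors; the second feeds the post_summary / section_summary vectors.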

Embedding model

Voyage AI voyage-3 — 1,024-dimensional dense vectors, cosine similarity. The same model is used for both document encoding (at index time) and query encoding (at query time), which is important for retrieval quality.
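The document/query asymmetry shows up only in the request's input_type field — a sketch of the two request bodies, where the field names mirror Voyage's embeddings API as I understand it (treat them as an assumption and verify against current docs):

```python
def embed_request(texts, input_type):
    """Build a voyage-3 embedding request body (field names assumed
    from Voyage's REST embeddings API)."""
    if input_type not in ("document", "query"):
        raise ValueError(input_type)
    return {"model": "voyage-3", "input": texts, "input_type": input_type}

# Index time: encode chunks as documents.
doc_req = embed_request(["Title: Tempo\nSummary: ...\n\nchunk body"], "document")
# Query time: same model, query-side encoding.
query_req = embed_request(["what is premium mediocrity?"], "query")
```

Same model and dimensionality on both sides; only input_type changes.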

Vector indexes

Three Pinecone serverless indexes (AWS us-east-1):

| Index | Vectors | Contents |
|---|---|---|
| ribbonfarm | 7,588 | Blog posts: body chunks + one post-summary vector per post |
| vgr-twitter | 57,715 | Full Twitter archive; tweets grouped into threads |
| vgr-books | 5,840 | Book sections: body chunks + one section-summary vector per section; Quora answers (long-form summarized); guest articles; bibliography items (one vector each) |

Tier weighting

Retrieved chunks are scored by semantic similarity, then adjusted by a content-tier multiplier before merging across indexes. The tier order reflects editorial curation signal:

| Tier | Content type | Weight |
|---|---|---|
| 0 | vgr-books content (non-bibliography) | 1.15× |
| 1 | Blog post collected into a book | 1.10× |
| 2 | Blog post in a named series | 1.05× |
| 3 | Plain blog post | 1.00× |
| 4 | Bibliography item | 0.95× |
| 5 | Tweet collected into Twitter book | 0.90× |
| 6 | Thread (not in book) | 0.85× |
| 7 | Individual tweet | 0.80× |

Up to 8 sources are passed to the language model as context.
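The weight-then-merge step can be sketched as a pure function — a minimal version of the scheme described above, not the Worker's actual code:

```python
TIER_WEIGHTS = {0: 1.15, 1: 1.10, 2: 1.05, 3: 1.00,
                4: 0.95, 5: 0.90, 6: 0.85, 7: 0.80}

def merge_results(result_lists, top_k=8):
    """Apply tier multipliers, merge across indexes, dedupe by doc id.

    Each result is a dict with "id", "score" (cosine similarity),
    and "tier"; duplicates keep their best adjusted score.
    """
    best = {}
    for results in result_lists:
        for r in results:
            adjusted = r["score"] * TIER_WEIGHTS[r["tier"]]
            if r["id"] not in best or adjusted > best[r["id"]][0]:
                best[r["id"]] = (adjusted, r)
    ranked = sorted(best.values(), key=lambda t: t[0], reverse=True)
    return [r for _, r in ranked[:top_k]]
```

Note that tier weighting can reorder near-ties: a 0.75 book-section hit (0.8625 adjusted) outranks a 0.82 tweet (0.656 adjusted).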

The persona: deriving a soul document

The most distinctive part of vgr_zirp's architecture is the persona layer — a detailed structured document that captures the author's worldview, characteristic intellectual moves, voice patterns, and rhetorical style, used as the language model's system prompt.

Method

The persona was derived using a process we call the soul document approach, inspired by techniques in the AI character-building community for extracting stable personality representations from a text corpus.

Attribution: The soul document methodology used here was adapted from soul.md by Aaron J. Mars. The core idea: rather than hand-authoring persona instructions, dump your writing into a folder, let a capable language model analyze it, and synthesize a structured set of documents — SOUL.md (worldview, themes, opinions) and STYLE.md (voice, rhetorical patterns, what to avoid) — that any LLM can load to write as you. The resulting documents are more coherent and internally consistent than hand-authored persona prompts, because they are derived from actual writing rather than self-description.

Application to this project

The derivation script (derive_soul.py v2) uses AI-generated summaries of all 708 Venkat-authored posts as its corpus (vs. 90 post excerpts in v1), and uses the 15 empirically-derived topic clusters from the blog's tag co-occurrence graph as structural anchors for SOUL.md theme organization. The script calls Claude Sonnet with two prompts (~99K tokens each):

  • plans/SOUL.md — 15 core intellectual themes (up from 13 in v1), full-corpus coverage, with signature vocabulary, known contradictions, characteristic intellectual moves, and strong positions (~26,000 words)
  • plans/STYLE.md — sentence-level patterns, rhetorical structures, neologism introduction pattern, era-by-era voice evolution, a "Signature Formats" section on 2×2 matrices and aphorisms, and an explicit "what to avoid" section

A third constant, LEXICON_MD, is generated from the top 50 high-confidence terms in data/glossary_candidates.json — an AI-derived glossary of Venkat's coinages and redefinitions — providing precise, scannable definitions the model can draw on without retrieving a post.
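The lexicon compilation step can be sketched as follows — the field names ("term", "definition", "confidence") and the threshold are assumptions about the shape of glossary_candidates.json, for illustration only:

```python
def build_lexicon_md(glossary, top_n=50, min_confidence=0.8):
    """Render the top high-confidence glossary terms as a scannable list.

    Keeps entries at or above `min_confidence`, sorted by confidence,
    truncated to `top_n` — one definition per line.
    """
    kept = [g for g in glossary if g["confidence"] >= min_confidence]
    kept.sort(key=lambda g: g["confidence"], reverse=True)
    return "\n".join(f"- {g['term']}: {g['definition']}" for g in kept[:top_n])
```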

All three documents are compiled into workers/oracle/persona.js (~62KB). The system prompt is assembled in workers/oracle/build-prompt.js, which imports from persona.js and is the single source of truth used by both the oracle Worker and the MCP Worker. It includes:

  • ORACLE IDENTITY — factual answers to meta-questions the corpus can't answer: the etymology of "vgr_zirp" (Drew Austin tweet on ZIRP-era personality), full biography (born 1974 Jamshedpur; IIT Mumbai B.S. 1997; Michigan M.S./Ph.D. 1999/2004; Cornell postdoc 2004–06; Xerox Research Center Webster NY 2006–11; Sulekha.com 2000–01; Ribbonfarm founded 2007 while at Xerox), the Gervais Principle series, a technical self-description of the RAG pipeline, and a redirect rule for empty-archive queries ("ask the live vgr at venkateshrao.com").
  • CORPUS MAP — a structured inventory of what is and isn't in the index: full publication lineage (Ribbonfarm → Breaking Smart S1 → Breaking Smart Newsletter → Contraptions rebrand sequence), alias table so the model recognizes variant names ("Ribbonfarm Studio" = 2019–2021 Contraptions era; "BS Newsletter" = the same 144-issue archive), and an explicit not-indexed list (post-2019 Contraptions, Refactor Camp talks, external publications).
  • VOICE RULES — first person for Venkat's content, third person for guest contributors, bibliography items treated as recommended reading.
  • CONVERSATIONAL REGISTER — turn-by-turn pacing rules: 2–4 sentences on turn 1, hard cap 3 paragraphs, questions only when there's a genuine hook in the user's phrasing, pop culture / metaphor / memetic-phrase instruction, concrete-anchor requirement.
  • SOUL_MD + LEXICON_MD — full worldview and vocabulary.
  • STYLE GUIDE — Signature Formats and What to Avoid sections from STYLE_MD.
  • GUARDRAILS — four unconditional rules: temporal scope, professional distance, personal scope, persona integrity.

Inference

Each query follows this pipeline:

  1. Embed query via Voyage voyage-3 (input_type="query")
  2. Query all three Pinecone indexes in parallel (top-15 / top-12 / top-15 per index)
  3. Normalize, tier-weight, and merge results; deduplicate by document ID
  4. Select top 8 sources; build context block with labeled excerpts
  5. Call Claude Sonnet (claude-sonnet-4-6) with cached persona system prompt, full prior-turn history (if any), and context block
  6. Return answer + sources as JSON

A mode=sources query param stops after step 4, returning retrieval results without an LLM call. This is used for the semantic search interface.
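Steps 4–6 and the mode=sources short-circuit can be sketched as follows; the source-label format and return shapes are assumptions for illustration:

```python
def build_context_block(sources):
    """Join the selected sources into one labeled context string."""
    return "\n\n".join(
        f"[SOURCE {i}: {s['label']}] {s['title']} ({s['date']})\n{s['excerpt']}"
        for i, s in enumerate(sources, 1)
    )

def respond(sources, mode=None):
    """mode="sources" returns retrieval results without an LLM call."""
    if mode == "sources":
        return {"sources": sources}
    context = build_context_block(sources)
    # ...here the Worker calls Claude Sonnet with the cached persona
    # system prompt, prior-turn history, and `context`...
    return {"answer": "<model output>", "sources": sources}
```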

Conversations are multi-turn: the Worker accepts a history array alongside the current query and passes the full prior-turn exchange to Claude as the messages array (up to 8 turns). The persona system prompt is always at position 0 and is prompt-cached; history is appended after it without displacing the cache anchor.
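The request-assembly invariant — cached system block at position 0, history appended after — can be sketched like this. The system-block shape follows Anthropic's prompt-caching API; the trimming policy for enforcing the 8-turn cap is my assumption:

```python
def build_request(persona_prompt, history, query, max_turns=8):
    """Assemble an Anthropic-style request: cached persona system block
    first, then prior turns, then the current query.

    `history` is a list of {"role", "content"} dicts. Keeping 2*(max_turns-1)
    messages leaves room for 7 prior exchanges plus the new user turn.
    """
    trimmed = history[-(2 * (max_turns - 1)):]
    return {
        "system": [{"type": "text", "text": persona_prompt,
                    "cache_control": {"type": "ephemeral"}}],
        "messages": trimmed + [{"role": "user", "content": query}],
    }
```

Because the system block never moves, the cache anchor survives arbitrary history growth.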

Cost

A typical (cached) query costs roughly $0.017 (1.7¢): ~12,161 cached system-prompt tokens × $0.30/M + ~2,200 non-cached context tokens × $3.00/M + ~450 output tokens × $15.00/M. The persona system prompt is cached via Anthropic's prompt caching API (5-minute TTL), cutting its cost from $0.036/query to $0.004/query on cache hits. A cold call (first in a 5-minute window) costs ~$0.059. The Voyage embedding call is negligible (~$0.000001/query).
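The arithmetic above can be reproduced directly (rates in $/M tokens; the $3.75/M cache-write rate for the cold case reflects Anthropic's 1.25× write premium over the $3.00/M input rate — verify against current pricing):

```python
def query_cost(cached_tokens, context_tokens, output_tokens,
               cache_rate=0.30, input_rate=3.00, output_rate=15.00):
    """Per-query cost in dollars, given token counts and $/M rates."""
    return (cached_tokens * cache_rate
            + context_tokens * input_rate
            + output_tokens * output_rate) / 1_000_000

warm = query_cost(12_161, 2_200, 450)                    # cache read
cold = query_cost(12_161, 2_200, 450, cache_rate=3.75)   # cache write
```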

Conversation features

Session actions

A persistent action bar below the chat input provides three operations:

| Action | What it produces |
|---|---|
| Copy chat | Clipboard: sources (with live URLs) + Q&A pairs for all turns — paste into any context |
| Download .md | Markdown file: soul excerpt header + sources (URL:-prefixed) + Q&A — structured for LLM continuation via the MCP server |
| Clear chat | Resets all state, DOM, and URL — begins a fresh session |

Transcript sharing

After the turn limit (8 turns), the interface offers optional transcript submission. Submissions are stored in a Cloudflare D1 database and optionally published to a public transcript gallery.

| Field | Details |
|---|---|
| Storage | Cloudflare D1 (SQLite); one row per transcript: id, session_id, share_mode (public/private), title, messages (JSON), created_at |
| Share modes | Public — visible in the transcript gallery; Private — stored but not listed; both are content-addressed by UUID |
| Auto-filter | Submissions are screened before publication: personally identifying content, off-topic threads, or test queries are held for review |

Session tracking

On the first turn of each conversation (history.length === 0), the Worker fires a lightweight background ping that increments daily and lifetime session counters in KV. This gives a "sessions started" funnel metric independent of whether the conversation reaches the turn limit or whether a transcript is submitted.

Infrastructure

| Component | Technology |
|---|---|
| Workers | Cloudflare Workers (V8 isolates) — two deployed workers: ribbonfarm-oracle (web UI) and ribbonfarm-mcp (MCP server); both share the same system prompt and retrieval pipeline |
| Rate limiting | Cloudflare KV — web: 20 queries/IP/hour; MCP: 30 ask_vgr_zirp/IP/day |
| Circuit breaker | KV flag + hourly cron; sleeps when hourly spend exceeds $4, or all day when daily spend exceeds $30 |
| Usage stats | KV accumulators (hourly/daily/lifetime, web + MCP separately); session-start counters; visible in stats box on oracle page |
| Transcript storage | Cloudflare D1 (SQLite) — submitted transcripts with share mode, title, and full message history |
| Alerts | Telegram bot — circuit trips, daily spend summary |
| Deployment | wrangler deploy from workers/oracle/ or workers/mcp/ in the ribbonfarm-site repo |
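The circuit-breaker rules reduce to a small decision function — a sketch of the logic described above, with illustrative state names:

```python
def circuit_state(hourly_spend, daily_spend, hourly_cap=4.0, daily_cap=30.0):
    """Decide breaker state from the KV spend accumulators.

    Daily cap wins over hourly cap; "open" means queries flow normally.
    """
    if daily_spend > daily_cap:
        return "sleep_until_tomorrow"
    if hourly_spend > hourly_cap:
        return "sleep_this_hour"
    return "open"
```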

Limitations

  • Temporal boundary: The corpus ends in 2023. vgr_zirp explicitly qualifies responses about post-2023 events and does not attempt to simulate post-corpus positions.
  • Retrieval errors: Questions about niche topics with few matching chunks will produce answers that generalize from tangentially related material. The sources panel shows exactly what was retrieved.
  • Author vs. archive: vgr_zirp speaks from the written record, not from Venkat's current views. Ideas that were explored and discarded, or positions since revised, remain in the corpus as-written.
  • Hallucination risk: Claude Sonnet can generate plausible-sounding but incorrect attributions. When specific claims matter, follow the source links.

MCP access (public)

vgr_zirp is available as a public Model Context Protocol server — no account or API key required. Connect it to Claude Code or Claude Desktop and use the corpus directly from your AI client.

Endpoint: https://ribbonfarm.com/mcp

| Tool | What it does | Limit |
|---|---|---|
| ask_vgr_zirp | Full RAG + Claude Sonnet response in vgr's voice; supports multi-turn history and prior_session resumption | 30 calls/IP/day |
| search_corpus | Semantic search across all corpora, returns ranked excerpts | Unlimited |
| submit_mcp_session | Submit a completed session to the ribbonfarm archive (public or private); returns a compressed summary for local transcript resumption | 5/IP/hour |

Claude Code

Run once in your terminal:

claude mcp add vgr-zirp --transport http https://ribbonfarm.com/mcp

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "vgr-zirp": {
      "type": "http",
      "url": "https://ribbonfarm.com/mcp"
    }
  }
}

Session features

The MCP server injects session-awareness messages into answer text. On your first exchange you'll see a hello orienting you to the interface and its limits. Reminder nudges appear at exchanges 10 and 20; an alert fires at exchange 30 (the recommended coherence limit). All messages encourage saving transcripts locally to ./vgr_zirp_transcripts/ and optionally submitting to the ribbonfarm archive.

To resume a prior session: save the transcript with a ## Session Summary section at the end (the submit tool generates this automatically), then pass that section as prior_session on the first call of the new session. vgr_zirp will acknowledge the prior context in its hello. Usage is subject to the Terms of Use.
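Extracting the resumption block from a saved transcript is a one-regex job — a sketch that assumes the ## Session Summary section is the last section in the file, as the submit tool writes it:

```python
import re

def extract_session_summary(transcript_md):
    """Pull the "## Session Summary" section from a transcript,
    suitable for passing as prior_session on the next session's
    first call. Returns None if no summary section is present.
    """
    m = re.search(r"^## Session Summary\s*\n(.*)\Z",
                  transcript_md, re.M | re.S)
    return m.group(1).strip() if m else None
```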

The ask_vgr_zirp daily limit resets at midnight PT. search_corpus has no limit — use it freely for research or agentic workflows.

Changelog

Version history for vgr_zirp. Semantic versioning: MAJOR = corpus / model / retrieval changes; MINOR = new features or persona changes; PATCH = bug fixes.

v2.2.0 MCP session construct + submit tool + prior_session resumption + TOS 2026-05-13
  • MCP server now injects session-awareness messages into answer text: hello on exchange 1, nudge reminders at exchanges 10 and 20, coherence alert at exchange 30. Text is editable in workers/mcp/messages.js.
  • New prior_session parameter on ask_vgr_zirp: pass the ## Session Summary block from a local transcript to resume context across sessions. vgr_zirp acknowledges the prior topic in its hello.
  • New submit_mcp_session tool: submits a session to the ribbonfarm D1 archive (public/private). Public submissions go through a Haiku content filter; private submissions are AES-256-GCM encrypted. Returns a Haiku-compressed session summary to append to the local transcript file under ## Session Summary for future resumption.
  • Tool description instructs agents to save transcripts to ./vgr_zirp_transcripts/ for ongoing project consulting.
  • MCP-submitted transcripts stamped v2.2.0-mcp in bot_version for analytics.
  • Terms of use page live at /vgr_zirp_terms/: Ribbonfarm Consulting LLC, WA governing law.
v2.1.1 Roundup soft-tagging + Aeon PDF fix 2026-05-12
  • Year-end roundup posts and link-list issues are now tagged is_roundup: True in Pinecone metadata (204 vectors updated; no re-embedding). The oracle labels them [ROUNDUP/COMPILATION] and is instructed to treat them as title lists only — confirming existence and timing of pieces, not paraphrasing their contents.
  • Search worker extended: roundup filtering now reads Pinecone metadata (in addition to title heuristics) and applies to books/newsletter results as well as blog posts. Explicit roundup queries still surface them.
  • Fixed Aeon browser-print PDF chrome stripping: form-feed characters now normalized before regex matching, handling single-line date+title headers and single-line URL+page footers correctly.
v2.1.0 Quora corpus + guest articles + corpus map v2 2026-05-12
  • Quora archive indexed: 1,699 answers (2010–2014); 701 long-form answers summarized via Haiku; 1,776 body + 708 summary vectors added to vgr-books. Oracle labels these [QUORA ANSWER] and responds in first person ("On Quora I said…").
  • Guest articles indexed: 6 pieces for Forbes (3), The Atlantic (2), and Aeon (2). Labeled [GUEST ARTICLE for …] with the publication named.
  • Guerrilla Guide to Social Business added (20 essays, 2011–2012). vgr-books total: 5,840 vectors.
  • Corpus map v2 in system prompt: full alias table (e.g. "Ribbonfarm Studio" = 2019–2021 Contraptions era), not-indexed list, publication lineage for Breaking Smart → Contraptions rename sequence.
  • Quora archive site section live at /quora/: 1,699 answers with year-tab navigation, date/length sort, 629 recovered outbound links.
v2.0.2 Book section summaries + corpus map in system prompt 2026-05-12
  • All book sections now have AI-generated summaries (Claude Haiku); used as title-prefix on embeddings — matching the blog approach that already existed.
  • Section-summary vectors added to vgr-books index (one per section) for direct title-query retrieval. Index: 3,158 vectors.
  • Corpus map added to system prompt: full publication lineage, alias table (e.g. "Ribbonfarm Studio" = 2019–2021 Contraptions era), not-indexed list. Fixes cases where the bot didn't recognize "Breaking Smart newsletter" as part of its corpus.
v2.0.1 BS newsletter boilerplate stripping 2026-05-12
  • Diagnosed via test chat: mailchimp footer (~150 words of subscription links, copyright, "Sidebar for New Readers") was overrepresented in chunks and dominated retrieval, surfacing administrative scaffolding instead of ideas.
  • Boilerplate stripped before chunking; 839 → 804 clean chunks. vgr-books: 2,832 vectors.
v2.0.0 Title-anchored embeddings + BS newsletter full archive + bot versioning 2026-05-12
  • Blog embeddings now prefix each chunk with its title and AI summary — fixes retrieval failures where the topic keyword appeared only in the post title, not the body.
  • Post-summary vectors added (one per post, embedding title + summary + tags). Blog index: 7,588 vectors.
  • Breaking Smart Newsletter 2015–2019 full archive added (144 posts). vgr-books: 2,867 vectors.
  • Semantic versioning system established; bot_version stored on all new transcripts.
v1.5.0 Persona refinements + direct quotation rule 2026-05-11
  • Questions on turn 1 only when there's a genuine hook in the user's phrasing — no forced clarification.
  • "Honest"/"honestly" tic banned; persona states views directly.
  • New rule: reproduce verbatim phrases from retrieved text rather than always paraphrasing — the archive speaking in its own words.
  • Twitter source cards show 160-char excerpt instead of generic "Tweet" label.
v1.4.0 Per-query activity logging 2026-05-10
  • Every query writes a row to D1 query_log: timestamp, source (web/MCP), turn number, token counts, actual cost. Enables precise analytics without depending on the 30-day KV TTL window.
  • Hourly stats endpoint: GET /api/oracle/stats/hourly.
v1.3.0 Live stats box + session tracking + prompt caching + MCP on Sonnet 2026-05-10
  • Health stats box on oracle page: days live, lifetime cost, WEB/MCP query grid, sessions started, transcripts shared.
  • Session-start ping tracks conversations started, distinct from query counts or transcript submissions.
  • Prompt caching: warm queries ~$0.017 vs ~$0.059 cold. MCP worker upgraded to Sonnet 4.6.
v1.2.0 Parchment UI + transcript sharing + public chat gallery 2026-05-10
  • Warm parchment palette (#f5e8cc) distinguishes all ZIRP pages from the main archive.
  • Transcript sharing: optional submission after wrapping up a chat (private or public, with rating and review).
  • Public chat gallery at /vgr_zirp_chats/ and individual chat viewer at /vgr_zirp_chat/.
  • Skull SVG in main site nav; ZIRP subnav (Oracle · Tech · Chats) injected at build time.
v1.1.0 Persistent action bar + copy/download/clear 2026-05-10
  • Copy chat, Download .md, and Clear chat buttons persist below the input for the full session.
  • Copy = clipboard (Q&A + sources with live URLs). Download = .md with soul excerpt header, suitable for pasting into any LLM for continuation via MCP.
  • No clarifying questions after turn 3; turns 4+ engage directly without asking.
v1.0.0 Sonnet upgrade + multi-turn history + conversational persona redesign 2026-05-09
  • Model upgraded from Haiku to Claude Sonnet 4.6.
  • Multi-turn: Worker maintains full conversation history across up to 8 turns.
  • Conversational persona redesign: turn-by-turn pacing, pop culture / metaphor / memetic-hook instruction, concrete-anchor requirement.
  • 2×2 quadrant renderer: markdown tables detected and rendered as visual grid diagrams.
v0.3.0 Full-corpus SOUL/STYLE v2 + LEXICON + public MCP 2026-05-09
  • SOUL.md v2 derived from all 708 Venkat-authored post summaries (vs. 90 excerpts in v1). 15 themes, full-corpus coverage.
  • STYLE.md v2 adds Signature Formats: 2×2 matrix guidance, aphorism construction, thread-argument format.
  • LEXICON injection: top-50 Venkat coinages and redefinitions, scannable by the model.
  • MCP server public at ribbonfarm.com/mcp — no auth required.
v0.2.0 Multi-source corpus: Twitter + books added 2026-05-08–09
  • Twitter archive indexed: 57,715 vectors (curated tweets and threads, 2007–2022).
  • Books corpus indexed: Breaking Smart, Art of Gig, Tempo, Be Slightly Evil + 908-item bibliography.
  • Oracle queries all three indexes in parallel.
v0.1.0 Initial launch 2026-05-08
  • Oracle live: Ribbonfarm blog only (6,489 vectors), Claude Haiku, single-turn, bookmarkable ?q= URLs, rate limiting.
  • SOUL.md v1 and STYLE.md v1 derived from top-PageRank post excerpts.

Acknowledgments

Voyage AI — voyage-3 embedding model. voyageai.com

Pinecone — serverless vector indexes. pinecone.io

Anthropic — Claude Sonnet (persona derivation and inference). anthropic.com

Soul document methodology — adapted from soul.md by Aaron J. Mars.