Dev Log

A build log for this site — how it was made, what changed, and where things stand. Written for readers curious about the process, and as a reference for future maintenance.

Where We Started

Ribbonfarm ran as a WordPress blog from 2007 to 2024 — 17 years, 1,133 posts, 13,258 comments, 60 contributors (32 WordPress accounts, many bundled under a shared "Guest" handle).
The blog was retired in late 2024. The goal: preserve everything as a permanent static archive and redirect the domain here, with no server-side processing.
Source materials: a 527,000-line WordPress XML export (February 2026), a full media library (6,878 files, 1.1 GB), a complete MySQL database dump, and the original WordPress theme files for CSS reference.
All source materials are kept read-only. Nothing was regenerated from scratch; all work was done on editable copies.

The Approach

A custom Python static site generator (build_site.py) written from scratch. No Jekyll, Hugo, or other framework — direct control over every output file.
Content format: each post and page is a .html file with YAML frontmatter + original HTML body. Non-lossy: nothing was stripped that couldn't be reconstructed.
Images served from Cloudflare R2 (media.ribbonfarm.com), not bundled into the static build.
Hosted on Cloudflare Pages (direct upload, not git-connected CI).
Work done in iterative sessions with Claude (Sonnet): build → review → fix → deploy. Each session tackled one area at a time.
The build is fully reproducible from source. A "rebuild from scratch" procedure is documented in CLAUDE.md.
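
To make the content format above concrete, here is a minimal sketch of splitting one post file into its frontmatter and HTML body. It assumes the frontmatter is delimited by `---` lines and holds only flat `key: value` pairs. The function name `split_frontmatter` and the sample field names are hypothetical, and the real build_site.py may use a full YAML parser instead.

```python
def split_frontmatter(text):
    """Split a post file into (metadata dict, HTML body).

    Assumes '---' delimiter lines and flat 'key: value' pairs only;
    nested or quoted YAML values would need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: the whole file is body

    # Find the closing delimiter, skipping the opening one at index 0.
    end = next(
        (i for i, line in enumerate(lines[1:], 1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return {}, text  # unterminated frontmatter: treat everything as body

    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:])
    return meta, body
```

Because the body is returned untouched, the original HTML survives the round trip, which is the point of the non-lossy format.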

Session Log

February 2026 — Extraction and Foundation

Wrote extract.py to parse the WordPress XML export into per-post .html files with YAML frontmatter. Extracted 1,133 posts, 45 pages, 1,021 comment sidecar files.
Wrote the initial build_site.py: post pages at /YYYY/MM/DD/slug/, series index pages, author archive pages, full archive, homepage, CSS/JS, _redirects.
WordPress artifacts stripped at build time: Gutenberg block comments,  tags, Amazon rcm iframes, tracking pixels.
Image URLs rewritten from ribbonfarm.com/wp-content/uploads/ to R2.
Wrote upload_media_r2.py for idempotent rclone-based media sync to R2.
Built a link graph pipeline (mine_links.py, compute_graph.py): 2,773 internal link edges, PageRank scores, interactive D3.js visualization. (Later archived — the 921-node graph was too dense to be useful.)
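
The image URL rewrite in this session can be sketched roughly as follows. This is an illustration, not the code in build_site.py: it assumes each file keeps its relative path under the R2 host, and the names `UPLOADS_RE`, `R2_BASE`, and `rewrite_media_urls` are hypothetical.

```python
import re

# Matches http/https and an optional "www." prefix on the old uploads path.
UPLOADS_RE = re.compile(r"https?://(?:www\.)?ribbonfarm\.com/wp-content/uploads/")

# Assumption: the R2 bucket mirrors the year/month layout of wp-content/uploads.
R2_BASE = "https://media.ribbonfarm.com/"


def rewrite_media_urls(html):
    """Point WordPress upload URLs at the R2 media host."""
    return UPLOADS_RE.sub(R2_BASE, html)
```

Only the URL prefix changes, so running the rewrite twice produces the same output as running it once.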

April 2026, Sessions 1–2 — Content Cleanup and Hardening

Ran a full content audit (audit_content.py): 184 dead trailmeme links, 26 dead images, 8 broken PDF links, 276 broken internal page links flagged.
Applied corpus-wide formatting fixes: bold/underline pseudo-headings promoted to real <h2>/<h3> (149 fixes across 39 posts), [caption] shortcodes → <figure> (97 posts), [embed] shortcodes → iframes (8 posts), wpautop paragraph spacing, TinyMCE artifacts removed.
Wrote test_pipeline.py: 61 tests covering all content transforms.
Reviewed all 45 WordPress pages against a kept-pages list; expanded from 9 to 14 published pages.
Built external prominence pipeline: extracted pingbacks from the WordPress XML (302 posts, 1,215 pings), fetched Hacker News and Reddit mentions (41 posts each), computed composite influence scores. Used for "Most influential" archive sort and post-page coverage notes.
Moved the static site into its own private git repo (github.com/vgururao/ribbonfarm-site).
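
As an example of the shortcode fixes in this session, the [caption] to <figure> conversion might look something like the sketch below. It assumes the newer WordPress caption form, where the caption text follows the image inside the shortcode. Older captions that carry the text in a `caption="..."` attribute are not handled. The names `CAPTION_RE` and `captions_to_figures` are hypothetical.

```python
import re

CAPTION_RE = re.compile(
    r"\[caption[^\]]*\]\s*"                          # opening shortcode with attributes
    r"((?:<a\b[^>]*>\s*)?<img\b[^>]*>(?:\s*</a>)?)"  # the image, optionally wrapped in a link
    r"\s*(.*?)\s*\[/caption\]",                      # caption text up to the closing shortcode
    re.DOTALL,
)


def captions_to_figures(html):
    """Replace [caption]...[/caption] shortcodes with <figure> markup."""
    def repl(match):
        image, text = match.group(1), match.group(2)
        caption = f"<figcaption>{text}</figcaption>" if text else ""
        return f"<figure>{image}{caption}</figure>"

    return CAPTION_RE.sub(repl, html)
```

Shortcode attributes such as `align` and `width` are dropped here on the assumption that layout is handled in CSS.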

April 2026, Sessions 3–4 — Tagging, Clustering, and Navigation

Tag vocabulary design: Read a sample of posts and iterated on a controlled vocabulary, balancing coverage against overlap. Vocabulary used for the final tagging pass: 120 tags (later rolled up to 100; see Tag rollup). Tags are thematic (e.g., systems-thinking, narrative, economics) rather than keyword-based, designed to be useful for browsing rather than search.
Tagging runs: Two full tagging passes were run. The first (72-tag vocabulary) used the synchronous API at full price: ~$26. After the vocabulary was expanded to 120 tags, a second full pass was submitted via the Batch API at a 50% discount: ~$18. Total API spend to date: ~$43. A later Batch API pass synthesized the glossary definitions for a further ~$5 (see Lexicon synthesis). The Batch API accepts all requests asynchronously and returns results within 24 hours, which is practical for a one-time corpus-scale task.
Tag rollup: After tagging, manually reviewed low-frequency tags. Over two cleanup passes, reduced the vocabulary from ~160 effective tags down to 100 by merging singletons and low-count tags (minimum 15 posts per tag) into their nearest covering tag. One rename: cloudworking → free-agency.
Cluster detection: Built a tag co-occurrence graph (edges weighted by number of posts sharing two tags). Ran Louvain community detection at resolution 2.0 to find 15 thematic clusters — groups of tags that tend to appear together. Two earlier attempts with label propagation produced degenerate results (one giant cluster); Louvain at the right resolution gave clean, interpretable groupings. Clusters were given descriptive names manually.
Lexicon synthesis: A second Batch API pass asked Claude to synthesize a definition for each term that appears across Ribbonfarm posts with some consistency — terms coined by the blog or given special meaning. Prompt was structured to produce dictionary-style entries grounded in actual usage across the archive. Output: 344 definitions. Total additional cost: ~$5.
Pages built: /clusters/ (index + 15 individual cluster pages, each listing member tags and all tagged posts), /glossary/ (alphabetical, configurable frequency filter, suppress list for overly generic terms), /tags/ (wordcloud2.js canvas word cloud with log-compressed sizing + alphabetical list).
Restructured navigation: added an Explore dropdown (Series, Clusters, Glossary, Tags, Bibliography).
Merged the old About and History pages into a single /history/ page with sections: Timeline, Design History slideshow (22 Wayback Machine screenshots), Maps, The Name, Taglines.
Centralized all page blurbs in data/page_blurbs.md — a single editable file read at build time, supporting Markdown formatting and template variables.
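
The cluster-detection step in this session starts from a tag co-occurrence graph. A minimal sketch of building its weighted edges with the standard library alone is below. The function name `cooccurrence_edges` and the shape of `post_tags` (post slug mapped to a list of tags) are assumptions for illustration, not the pipeline's actual data model.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(post_tags):
    """Return {(tag_a, tag_b): weight}, where weight is the number of
    posts that carry both tags. Pairs are stored in sorted order so
    each undirected edge appears exactly once."""
    edges = Counter()
    for tags in post_tags.values():
        # set() guards against a tag listed twice on the same post.
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges
```

These edges can then be loaded into a graph library for community detection. For example, NetworkX provides `louvain_communities(G, weight="weight", resolution=2.0)`, though the log does not say which Louvain implementation was used here.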

April 2026, Sessions 5–9 — Bibliography, Homepage, and Polish

Built build_bibliography.py: scanned all 1,133 posts for Amazon affiliate links, academic URLs, and frequently-cited external links. Output: 616 books, 95 papers, 79 essays.
Wrote enrich_bibliography.py: fetched canonical titles for papers via arXiv, CrossRef, NCBI, and for books via Open Library's ISBN lookup. Fixed 94 of 116 bad-title book entries; removed 13 non-book entries (Amazon help pages, games, expired links).
Expanded homepage Essential Posts from 28 to 73 (28 Venkat + 45 guest), sorted oldest-first. Added a Highlights Tour series page with the full 73-post set in chronological order.
Archive improvements: default sort set to "Oldest first"; within-year sort fixed; "Most discussed externally" sort option added; author attribution moved inline.
Glossary: bold terms, inline post-list expansion on click.
Multi-series navigation fixed: posts in multiple series (e.g., a post in both Highlights Tour and Worlding Raga) now show separate nav blocks per series.
Tag rollup: reduced vocabulary from ~160 tags to 100 (minimum count: 15) by manually merging low-count tags into their nearest covering tag.
Dead essay links in bibliography: 9 marked with "(dead link)"; 2 essays by Gregory Rider (onthespiral.com) noted as unrecoverable pending contact with the author.
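
One plausible way to connect the Amazon links to an ISBN lookup is sketched below. It relies on the fact that the 10-character product ID in an Amazon book URL is usually the book's ISBN-10, which can be confirmed with the standard ISBN-10 checksum. Whether build_bibliography.py or enrich_bibliography.py works this way is an assumption. The names `ASIN_RE`, `is_isbn10`, and `isbn10_from_amazon_url` are hypothetical.

```python
import re

# Common Amazon product URL shapes: /dp/ID, /gp/product/ID, /exec/obidos/ASIN/ID
ASIN_RE = re.compile(
    r"amazon\.[a-z.]+/(?:.*/)?(?:dp|gp/product|ASIN)/([0-9A-Z]{10})(?![0-9A-Z])"
)


def is_isbn10(code):
    """Check the ISBN-10 checksum: weights 10..1, 'X' counts as 10 in the last position."""
    if not re.fullmatch(r"\d{9}[\dX]", code):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(code))
    return total % 11 == 0


def isbn10_from_amazon_url(url):
    """Return the ISBN-10 embedded in an Amazon URL, or None for non-book IDs."""
    match = ASIN_RE.search(url)
    if match and is_isbn10(match.group(1)):
        return match.group(1)
    return None
```

IDs that fail the checksum (Kindle editions, games, help pages) return None, which gives a cheap first filter for non-book entries before any network lookup.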

April 2026, Sessions 10–11 — Content Fixes, Redirects, and History

Fixed 54 broken post-to-post links caused by a WordPress permalink artifact (doubled date paths like /2011/12/31/2011/12/31/slug/).
Fixed 276+ broken internal page links: added contact stub page, rewrote all /now-reading/ links to /bibliography/, documented 7 missing PDFs as a manual to-do.
Added /contact/ page (stub: "contact via venkateshrao.com"), resolving 40 inbound links.
Added /nfts/ page (restored from archive).
Added Refactor Camp section to /history/: a year-by-year narrative (2012–2019) and an 8-frame photo/poster slideshow using the same component as the Design History section.
Added redirects for all Refactor Camp year variants (/refactor-camp-2012/ through /refactor-camp-2019/) → /refactor-camp/.
Added redirect: /tempo/* → https://books.venkateshrao.com/tempo/.
Added bitrot warning box to 30 posts containing dead trailmeme links.
Stripped all Skeletor (the blog's former cat mascot) links and images from 4 posts; excluded the Skeletor page from the build.
Removed dead Livestream embed from one post; replaced with a "(no longer available)" note.
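
The doubled-date permalink fix in this session comes down to a single backreference pattern. A minimal sketch follows, with the hypothetical names `DOUBLED_DATE_RE` and `fix_doubled_date`. The actual fix may have been applied differently, for instance directly in the source files.

```python
import re

# A /YYYY/MM/DD segment immediately followed by an identical copy of itself.
DOUBLED_DATE_RE = re.compile(r"(/\d{4}/\d{2}/\d{2})\1(?=/)")


def fix_doubled_date(path):
    """Collapse /YYYY/MM/DD/YYYY/MM/DD/slug/ to /YYYY/MM/DD/slug/."""
    return DOUBLED_DATE_RE.sub(r"\1", path)
```

The backreference `\1` requires the two date segments to be identical, so a path that happens to contain two different dates is left alone.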

April 2026, Sessions 12–13 — Author Attribution and Link Cleanup

Author attribution: Resolved all 31 posts previously attributed to a generic "Guest" handle — each reassigned to the actual author's named handle, with display names added. The archive author dropdown was narrowed to contributors with 3 or more posts.
Author index: Built /authors/ page listing all 60 contributors with post counts, sorted by volume.
Tempo Blog merge: The 137 posts from the Tempo book blog (originally a companion site to the 2011 book, later folded into Ribbonfarm) had been attributed to a separate "Tempo" handle. These were merged into Venkat's author page, bringing his total to 891 posts. Series identity is preserved via a static slug list so the Tempo Blog series page still works.
Trailmeme link cleanup: Trailmeme was a link-curation tool Venkat built at Xerox (2009–2011). Many early posts linked heavily into Trailmeme trails, all of which are now dead. Audited all 31 flagged posts: stripped dead links from 27 (keeping anchor text), leaving 4 with a bitrot warning where the linked content was too integral to remove cleanly. One post (Safar aur Musafir) had 13 song links replaced with verified YouTube URLs from official label channels.
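
Stripping a dead link while keeping its anchor text, as in the Trailmeme cleanup above, can be sketched with one substitution. This assumes the dead links all point at a trailmeme.com host. The names `DEAD_LINK_RE` and `strip_dead_links` are hypothetical, and since the log describes a per-post audit, the actual edits were likely reviewed by hand rather than applied blindly.

```python
import re

# An <a> tag whose href contains the dead host; group 1 is the anchor text.
DEAD_LINK_RE = re.compile(
    r"<a\b[^>]*\bhref=[\"'][^\"']*trailmeme\.com[^\"']*[\"'][^>]*>(.*?)</a>",
    re.IGNORECASE | re.DOTALL,
)


def strip_dead_links(html):
    """Unwrap links to the dead host, leaving their anchor text in place."""
    return DEAD_LINK_RE.sub(r"\1", html)
```

Links to other hosts are untouched, so the same pass is safe to run across a whole post body.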

Where We Are Now

Posts: 1,133 (2007–2024)
Static pages: 14
Series: 30 (+ index)
Authors: 60 (+ individual archive pages; Venkat: 891 posts)
Thematic clusters: 15
Glossary terms: 344
Tags: 100 (min 15 posts each)
Bibliography: 616 books, 95 papers, 79 essays