I Forked 4 cli coding agents to Run the Same Model. The scaffolding explained a 2x gap.
Deep dive into the architecture of Codex, Gemini CLI, Mistral Vibe, and OpenCode. Same model, 2x performance gap — the scaffolding is what matters.
Blog
Thoughts on AI systems, engineering culture, and building things that work.
Deep dive into the architecture of Codex, Gemini CLI, Mistral Vibe, and OpenCode. Same model, 2x performance gap — the scaffolding is what matters.
Claude Code, Codex, Gemini CLI, and Mistral tackle a fiber network optimization problem. Claude Code beat my 8-year-old C++ solution by 62 points.
Building omniagents, a ~2000-line Python framework for multi-tenant AI coding agents, only to discover OpenHands already existed. Lessons on agent architecture, isolation, persistence, and costs.
We benchmarked our production RAG system across embedding models, chunk sizes, chunking strategies, and retrieval modes. The results contradicted common wisdom.
A live benchmark that tests AI models' ability to predict real-world events through prediction markets. Every day, we let AI models bet $1 on top events from Polymarket.
We're brilliant engineers — yet we wrestle with Word docs, Excel sheets, endless email chains. Time to reclaim our craft.