Making AI work for complex engineering and technical problems.
CLIArena
Benchmarking CLI coding agents — and forking them
Read the codebases of Codex, Gemini CLI, Mistral Vibe, and OpenCode, then forked three of them to run GLM-4.7 on Terminal-Bench 2.0. Same model, 2x performance gap — the scaffolding is what matters. Also benchmarked all four agents on an unpublished NP-hard optimization problem; Claude Code beat my 8-year-old C++ solution. Forks now updated to support GLM-5.
Python, Rust, TypeScript, Docker, Harbor
OmniAgents
Unified interface for AI coding agents
A single interface for running AI coding agents across execution environments (Local, Docker, E2B) and frameworks (smolagents, Pydantic-AI, LangChain). No longer actively developed: the exploration led to rebuilding OpenHands from scratch to understand agent internals.
Predibench
Benchmark AI on real-world prediction markets
Benchmarks AI models on real-world prediction markets. A live platform that tests whether AI can beat humans at forecasting.
Python, Polymarket API, RAG
Jimmy Energy
Head of Software — Comex member (2022-2025)
Engineering-as-Code transformation
As a director and Comex member, built and led the software team that moved a traditional engineering company to a Git-based workflow. Replaced the legacy PLM with custom Python tools (PyJimmy). The entire engineering team now works from a unified codebase with version control, CI/CD, and AI integration.
Impact: Engineers spend time engineering instead of managing files. Clean, versioned data enables AI workflows.
Python, Git, AWS, GitHub Actions
Webportal
Web browsing for AI agents via VLM parsing
A web parser that uses a VLM to analyze pages and backend requests, then gives LLMs a digested representation for autonomous web browsing. Built at the HuggingFace x Anthropic hackathon (3rd place).
HuggingFace x Anthropic Hackathon — Travel Booking Agent
AI agent that books real travel through a browser (3rd place)
A travel booking agent using smolagents and browser-use. Won 3rd place at the HuggingFace x Anthropic hackathon.
Python, smolagents, browser-use
DeepDraft
Forcing AI agents to follow scientific reasoning
The objective was to force agents to follow a rigorous scientific reasoning process when answering questions. If rebuilt today, it would be based on an open-source CLI agent like Mistral Vibe or Codex.
Pyforge
Minimalist Python library for engineering-as-code
Version control for engineering artifacts — models, simulations, docs. An example of engineering-as-code: treating engineering data with the same rigor as software.
AIEngineer
AI agent for engineering project scaffolding
My first project with AI agents. Used Aider to programmatically generate code for engineering projects. With today's knowledge, would simply build a tool-using agent directly instead of relying on Aider's approach.