Portfolio

Projects

Making AI work for complex engineering and technical problems.

CLIArena

Benchmarking CLI coding agents — and forking them

Read the codebases of Codex, Gemini CLI, Mistral Vibe, and OpenCode, then forked three of them to run GLM-4.7 on Terminal-Bench 2.0. Same model, 2x performance gap between scaffolds: the scaffolding is what matters. Also benchmarked all four agents on an unpublished NP-hard optimization problem; Claude Code beat my 8-year-old C++ solution. The forks have since been updated to support GLM-5.

Python, Rust, TypeScript, Docker, Harbor

OmniAgents

Unified interface for AI coding agents

Unified interface for AI coding agents across execution environments (local, Docker, E2B) and frameworks (smolagents, Pydantic-AI, LangChain). No longer actively developed: the exploration led to rebuilding OpenHands from scratch to understand agent internals.

Python, Docker, E2B

Predibench

Benchmark AI on real-world prediction markets

Benchmark AI models on real-world prediction markets. A live platform testing whether AI can beat humans at forecasting.

Python, Polymarket API, RAG

Jimmy Energy

Head of Software — Comex member (2022-2025)

Engineering-as-Code transformation

As a director and Comex member, built and led the software team that moved a traditional engineering company to a Git-based workflow. Replaced the legacy PLM with custom Python tools (PyJimmy). The entire engineering team now works from a unified codebase with version control, CI/CD, and AI integration.

Impact: Engineers spend time engineering instead of managing files. Clean, versioned data enables AI workflows.

Python, Git, AWS, GitHub Actions

Webportal

Web browsing for AI agents via VLM parsing

A web parser that uses a VLM to analyze pages and backend requests, giving LLMs a digested format for autonomous web browsing. Built at the HuggingFace x Anthropic hackathon (3rd place).

Firebase, Web APIs, VLM

HuggingFace x Anthropic Hackathon — Travel Booking Agent

AI agent that books real travel through a browser (3rd place)

A travel booking agent using smolagents and browser-use. Won 3rd place at the HuggingFace x Anthropic hackathon.

Python, smolagents, browser-use

DeepDraft

Forcing AI agents to follow scientific reasoning

The objective was to force agents to follow a rigorous scientific reasoning process when answering questions. If rebuilt today, it would be based on an open-source CLI agent like Mistral Vibe or Codex.

Python, RAG, LLM agents

Pyforge

Minimalist Python library for engineering-as-code

Version control for engineering artifacts — models, simulations, docs. An example of engineering-as-code: treating engineering data with the same rigor as software.

Python, Git

AIEngineer

AI agent for engineering project scaffolding

My first project with AI agents. Used Aider to programmatically generate code for engineering projects. With today's knowledge, I would build a tool-using agent directly instead of relying on Aider.

Python, Aider, Pyforge