How to Build a Minimal AI Agent
A tutorial on building a simple AI agent from scratch for software engineering and terminal tasks. Demonstrates that effective agents don't require complex frameworks: just a loop of prompt, action proposal, execution, and feedback.
Core Architecture
Agents operate as loops: prompt, action proposal, execution, feedback, repeat. The basic implementation requires only three components: querying a language model, parsing model output into actionable commands, and executing those commands.
LM Integration
Implementations for multiple providers: OpenAI, Anthropic, OpenRouter, LiteLLM, and Zhipu AI. Each shows minimal setup code (typically 8-15 lines). Environment variables recommended for API keys.
Action Parsing
Two encoding methods: markdown-style triple backticks with bash-action markers, or XML-style tags. Both use regex patterns to extract commands from model output. The key insight is keeping the format simple enough that the model rarely fails to produce valid output.
Command Execution
Uses Python's subprocess.run() with specific parameters to capture output, handle timeouts (30 seconds), and manage error streams. Each command runs separately via subprocess, enabling easy sandboxing and scaling.
Robustness Improvements
Exception handling passes errors back to the model for recovery. Format validation ensures proper command structure. Environment variable configuration prevents interactive tool hangs. These small additions dramatically improve reliability without adding complexity.
Production Application
The mini-swe-agent project implements these principles, achieving approximately 74% performance on SWE-bench verified tasks while maintaining relative simplicity. The lesson: you don't need a framework to build a capable agent. A loop, a model, and bash are enough.