The place for keeping up with all things AI.
Curated by Thomas Faulds
AI Glossary →Data analytics is a uniquely cursed domain for agents. Easy questions look hard, hard questions look easy, many are impossible to answer. Hex built custom eval infrastructure called The Shoebox and an entire fake business to test their agents against realistic data.
Browser agents have an amnesia problem. They re-discover every site from scratch on every run, paying the full discovery tax forever. Autobrowse fixes that by letting an agent iterate on a real task until it converges, then graduating the winning approach into a durable, reusable skill.
A tutorial on building a simple AI agent from scratch for software engineering and terminal tasks. Demonstrates that effective agents don't require complex frameworks: just a loop of prompt, action proposal, execution, and feedback.
A practical framework for evaluating AI agents in production. Distinguishes between benchmark-maximizing and floor-raising approaches, arguing most product teams should adopt floor raising — detective work reviewing actual user interactions, identifying failure patterns, and systematically preventing regressions.
Connect AI agents to the browser. Automate web tasks with natural language through CDP. Also check out their Rust TUI for live browser agent control with task management, screenshots, and 2x cheaper/faster agent loop.
AI skills and agents that make each unit of engineering work easier than the last. 37 skills and 51 agents for brainstorming, planning, code review, and compounding learnings across Claude Code, Codex, and Cursor.
Open-source CodeRabbit alternative that runs in GitHub Actions. AI-powered bot that reviews PRs, fixes CI failures, resolves merge conflicts, and runs custom workflows. Model-agnostic and GitHub-native.
Secure, local-first trace capture and inspection for Codex and Claude agent sessions. Encrypts raw events at rest, builds searchable indexes, and serves a localhost dashboard for debugging agent runs.
AI agent skill that researches topics across Reddit, X, YouTube, TikTok, HN, Polymarket, and GitHub simultaneously, then ranks by real engagement and synthesizes into grounded briefs with citations.
Stealth headless browser server for AI agents. Wraps the Camoufox Firefox engine with C++ level anti-detection, fingerprint spoofing, and a REST API optimized for agent automation. Idles at ~40MB.
Lightweight inference engine for DeepSeek V4 on personal machines. Metal + CUDA backends, 2-bit quantization, 1M token context, compressed KV cache, and an integrated coding agent.
Local-first markdown editor for collaborating with AI agents on document review. CriticMarkup for inline comments and edits, MCP integration, no cloud dependency.
Collection of Claude Code skills for daily dev work: TDD, diagnosing bugs, codebase architecture, triaging issues, prototyping, converting specs to PRDs and GitHub issues, and more.
The 100-line AI agent that solves GitHub issues. Scores >74% on SWE-bench verified. Radically simple: bash-only tool, linear message history, no huge configs. By the Princeton/Stanford team behind SWE-bench.