The place for keeping up with all things AI.
Curated by Thomas Faulds
AI Glossary →A practical framework for evaluating AI agents in production. Distinguishes between benchmark-maximizing and floor-raising approaches, arguing most product teams should adopt floor raising — detective work reviewing actual user interactions, identifying failure patterns, and systematically preventing regressions.
A novel technique for async reinforcement learning that reduces per-step model weight transfer from terabytes to megabytes by exploiting bf16 sparsity. Routes only changed weight elements through Hugging Face Buckets to inference servers, enabling fully disaggregated training across distributed machines.
A comprehensive guide exploring Claude Code 2.0 and Opus 4.5, covering the evolution of the tool, practical workflows, and advanced features like sub-agents and context engineering. Shares strategies for maximizing productivity with AI coding agents.
Thread from the official Claude Devs account sharing practical tips, patterns, and workflows for getting the most out of Claude Code.
Data analytics is a uniquely cursed domain for agents. Easy questions look hard, hard questions look easy, many are impossible to answer. Hex built custom eval infrastructure called The Shoebox and an entire fake business to test their agents against realistic data.
Browser agents have an amnesia problem. They re-discover every site from scratch on every run, paying the full discovery tax forever. Autobrowse fixes that by letting an agent iterate on a real task until it converges, then graduating the winning approach into a durable, reusable skill.
A tutorial on building a simple AI agent from scratch for software engineering and terminal tasks. Demonstrates that effective agents don't require complex frameworks: just a loop of prompt, action proposal, execution, and feedback.
Pixel-perfect developer portfolio built with Next.js 16, Tailwind v4, and shadcn/ui. Includes a component registry, MDX content system, and clean design engineering. Great reference for modern portfolio sites.
Open protocol for AI agent registration. Standardized Markdown file hosted at an app's domain that lets agents register users without traditional sign-up forms. Supports agent-verified and user-claimed auth flows.
Extension that adds dynamic workflow tool to Pi. Model writes JavaScript scripts that fan out tasks across multiple isolated subagents for parallel execution, then consolidates results. Great for codebase audits and large refactors.
Connect AI agents to the browser. Automate web tasks with natural language through CDP. Also check out their Rust TUI for live browser agent control with task management, screenshots, and 2x cheaper/faster agent loop.
AI skills and agents that make each unit of engineering work easier than the last. 37 skills and 51 agents for brainstorming, planning, code review, and compounding learnings across Claude Code, Codex, and Cursor.
Open-source CodeRabbit alternative that runs in GitHub Actions. AI-powered bot that reviews PRs, fixes CI failures, resolves merge conflicts, and runs custom workflows. Model-agnostic and GitHub-native.
Secure, local-first trace capture and inspection for Codex and Claude agent sessions. Encrypts raw events at rest, builds searchable indexes, and serves a localhost dashboard for debugging agent runs.
AI agent skill that researches topics across Reddit, X, YouTube, TikTok, HN, Polymarket, and GitHub simultaneously, then ranks by real engagement and synthesizes into grounded briefs with citations.
Stealth headless browser server for AI agents. Wraps the Camoufox Firefox engine with C++ level anti-detection, fingerprint spoofing, and a REST API optimized for agent automation. Idles at ~40MB.
Lightweight inference engine for DeepSeek V4 on personal machines. Metal + CUDA backends, 2-bit quantization, 1M token context, compressed KV cache, and an integrated coding agent.
Local-first markdown editor for collaborating with AI agents on document review. CriticMarkup for inline comments and edits, MCP integration, no cloud dependency.
Collection of Claude Code skills for daily dev work: TDD, diagnosing bugs, codebase architecture, triaging issues, prototyping, converting specs to PRDs and GitHub issues, and more.
Build, run and scale AI agents like APIs. Open-source control plane with multi-language SDKs, cross-agent communication, human-in-the-loop, cryptographic identity, canary deployments, and built-in observability.
The 100-line AI agent that solves GitHub issues. Scores >74% on SWE-bench verified. Radically simple: bash-only tool, linear message history, no huge configs. By the Princeton/Stanford team behind SWE-bench.