Open-source agent quality index
The Codecov for markdown AI agents.
Lint, eval, score, and ship markdown-defined agents for Claude Code, Codex, GitHub Copilot, Cursor, and Windsurf. Zero config. Deterministic replay. Public leaderboard.
# Local CLI usage (npm publish pending)
node packages/cli/dist/bin/subagent-evals.js lint .
# Public submission to the leaderboard
node packages/cli/dist/bin/subagent-evals.js submit --public --owner you --repo your-repo
Static
Nine quality dimensions
Trigger clarity, tool policy, adversarial resilience, secret handling, and more.
Runtime
Replay by default
Deterministic CI. Opt-in live runs against Claude, OpenAI, and Anthropic runners.
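To illustrate the general idea behind replay-by-default, here is a minimal sketch of a record/replay cache. It is not the subagent-evals implementation: the names `replay_key`, `run_case`, and `cassette`, and the choice of a SHA-256 key over agent name and prompt, are all assumptions made for this example.

```python
import hashlib
import json
from typing import Callable, Dict, Optional


def replay_key(agent: str, prompt: str) -> str:
    """Build a stable key for one (agent, prompt) pair (hypothetical scheme)."""
    payload = json.dumps({"agent": agent, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_case(
    agent: str,
    prompt: str,
    cassette: Dict[str, str],
    live: Optional[Callable[[str, str], str]] = None,
) -> str:
    """Return the recorded response; call the live runner only if one is given."""
    key = replay_key(agent, prompt)
    if key in cassette:
        # Replay path: no network call, so the result is deterministic.
        return cassette[key]
    if live is None:
        # Replay-only mode with no recording: fail instead of going live.
        raise KeyError(f"no recorded response for agent {agent!r}")
    # Opt-in live path: call the runner once and record its response.
    cassette[key] = live(agent, prompt)
    return cassette[key]
```

In this sketch a CI job would load a previously recorded `cassette` and call `run_case` without `live`, so a missing recording fails loudly instead of silently reaching a model API.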
Distribution
PR badges + diffs
GitHub Action posts sticky PR comments with score deltas and new findings.
Top public repositories
Ranked by overall score. Updated on every submission.
No public submissions yet.
Be the first: run node packages/cli/dist/bin/subagent-evals.js submit --public.
Supported formats
Claude Code
.claude/agents/*.md
OpenAI Codex
.codex/agents · AGENTS.md
GitHub Copilot
.github/copilot-instructions.md
Cursor
.cursor/rules/*.mdc
Windsurf
.windsurf/rules/*.md
Generic
YAML frontmatter *.md
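The path conventions listed above can be sketched as a small lookup that classifies a file by its location. This is an illustrative sketch, not the tool's actual detection logic: `FORMAT_PATTERNS` and `detect_format` are hypothetical names, treating every file under `.codex/agents` as a Codex agent is an assumption, and the frontmatter check is a simplification.

```python
from pathlib import PurePosixPath
from typing import Optional

# (format name, path pattern) pairs taken from the list of supported formats.
FORMAT_PATTERNS = [
    ("Claude Code", ".claude/agents/*.md"),
    ("OpenAI Codex", ".codex/agents/*"),  # assumption: any file in this directory
    ("OpenAI Codex", "AGENTS.md"),
    ("GitHub Copilot", ".github/copilot-instructions.md"),
    ("Cursor", ".cursor/rules/*.mdc"),
    ("Windsurf", ".windsurf/rules/*.md"),
]


def detect_format(path: str, text: str = "") -> Optional[str]:
    """Return the format name for a repo-relative path, or None if unrecognized."""
    p = PurePosixPath(path)
    for name, pattern in FORMAT_PATTERNS:
        # PurePosixPath.match compares path components from the right,
        # and "*" does not cross directory separators.
        if p.match(pattern):
            return name
    # Generic fallback: a Markdown file whose content opens with YAML frontmatter.
    if p.suffix == ".md" and text.startswith("---\n"):
        return "Generic"
    return None
```

For example, `detect_format(".cursor/rules/style.mdc")` returns `"Cursor"`, while a plain `README.md` without frontmatter returns `None`.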