Open-source agent quality index

The Codecov for markdown AI agents.

Lint, eval, score, and ship markdown-defined agents for Claude Code, Codex, GitHub Copilot, Cursor, and Windsurf. Zero config. Deterministic replay. Public leaderboard.

Quickstart
# Local CLI usage (npm publish pending)
node packages/cli/dist/bin/subagent-evals.js lint .

# Public submission to the leaderboard
node packages/cli/dist/bin/subagent-evals.js submit --public --owner you --repo your-repo

Static

Nine quality dimensions

Trigger clarity, tool policy, adversarial resilience, secret handling, and more.
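The page does not say how each dimension is checked. The sketch below shows three of the named checks (trigger clarity, tool policy, secret handling) run over Claude Code-style frontmatter (`name`, `description`, `tools`). The `lintAgent` and `parseFrontmatter` helpers, the 20-character threshold, and the secret patterns are hypothetical illustrations, not the project's actual rules.

```typescript
// Hypothetical static checks in the spirit of the "nine quality dimensions".
// Rule names, thresholds, and patterns are assumptions, not subagent-evals' rules.

interface Finding {
  dimension: string;
  message: string;
}

// Split "---\n<frontmatter>\n---\n<body>" into flat key/value pairs and a body.
// Only handles simple `key: value` lines; a real linter would use a YAML parser.
function parseFrontmatter(source: string): { meta: Record<string, string>; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { meta: {}, body: source };
  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2] };
}

// Assumed heuristic patterns for committed credentials.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/, // OpenAI-style secret key prefix
  /ghp_[A-Za-z0-9]{36}/, // GitHub classic personal access token
  /AKIA[0-9A-Z]{16}/, // AWS access key ID
];

function lintAgent(source: string): Finding[] {
  const { meta } = parseFrontmatter(source);
  const findings: Finding[] = [];

  // Trigger clarity: the description should tell the host when to delegate.
  const description = meta["description"] ?? "";
  if (description.length < 20) {
    findings.push({ dimension: "trigger-clarity", message: "description is missing or too short" });
  }

  // Tool policy: in Claude Code, omitting `tools` means the subagent inherits every tool.
  if (!meta["tools"]) {
    findings.push({ dimension: "tool-policy", message: "no explicit tools allowlist" });
  }

  // Secret handling: scan the whole file, frontmatter included.
  if (SECRET_PATTERNS.some((re) => re.test(source))) {
    findings.push({ dimension: "secret-handling", message: "possible hard-coded secret" });
  }
  return findings;
}
```

For a file with a clear description but no `tools:` line, `lintAgent` returns a single `tool-policy` finding.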

Runtime

Replay by default

Deterministic CI. Opt-in live runs against Claude, OpenAI, and Anthropic runners.
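A common way to make model-backed evals deterministic in CI is record/replay: responses are stored under a hash of the request and served back on later runs, and the network is only used when live runs are explicitly enabled. This is a minimal sketch assuming that design; `ReplayRunner`, `RunRequest`, `LiveCall`, and the hashing scheme are hypothetical and may not match how subagent-evals stores its recordings.

```typescript
import { createHash } from "node:crypto";

// Hypothetical request shape; the real runner interface is not documented here.
interface RunRequest {
  runner: string; // e.g. "anthropic" or "openai"
  model: string;
  prompt: string;
}

type LiveCall = (req: RunRequest) => Promise<string>;

// Stable key: SHA-256 of the request fields in a fixed order.
function cacheKey(req: RunRequest): string {
  const canonical = JSON.stringify([req.runner, req.model, req.prompt]);
  return createHash("sha256").update(canonical).digest("hex");
}

class ReplayRunner {
  private readonly recordings: Map<string, string>;
  private readonly live?: LiveCall; // set only when live runs are opted into

  constructor(recordings: Map<string, string>, live?: LiveCall) {
    this.recordings = recordings;
    this.live = live;
  }

  async run(req: RunRequest): Promise<string> {
    const key = cacheKey(req);
    const recorded = this.recordings.get(key);
    // Replay: no network access, identical output on every run.
    if (recorded !== undefined) return recorded;
    if (!this.live) {
      // In replay mode a missing recording fails loudly instead of calling a model.
      throw new Error(`no recording for ${req.runner}/${req.model} (key ${key.slice(0, 12)})`);
    }
    const output = await this.live(req);
    this.recordings.set(key, output); // record for future replays
    return output;
  }
}
```

Failing on a missing recording, rather than silently going live, is what keeps a CI run from depending on network access or model nondeterminism.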

Distribution

PR badges + diffs

GitHub Action posts sticky PR comments with score deltas and new findings.
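Score deltas and new findings come from diffing two reports: one for the base branch and one for the PR head. This sketch assumes a hypothetical report shape (per-dimension scores plus string finding IDs); `diffReports` and `renderComment` are illustrative names, and the Action's real comment format may differ. A "sticky" comment is usually one comment per PR that each run edits in place rather than reposting.

```typescript
// Hypothetical report shape: per-dimension scores plus stable finding IDs.
interface Report {
  scores: Record<string, number>;
  findings: string[]; // e.g. "tool-policy:.claude/agents/reviewer.md"
}

interface Diff {
  deltas: Record<string, number>; // head minus base, per dimension
  newFindings: string[]; // on head, not on base
  resolvedFindings: string[]; // on base, gone on head
}

function diffReports(base: Report, head: Report): Diff {
  const deltas: Record<string, number> = {};
  const dims = Array.from(new Set([...Object.keys(base.scores), ...Object.keys(head.scores)]));
  for (const dim of dims) {
    // Assumption: a dimension missing on one side counts as 0.
    deltas[dim] = (head.scores[dim] ?? 0) - (base.scores[dim] ?? 0);
  }
  const baseSet = new Set(base.findings);
  const headSet = new Set(head.findings);
  return {
    deltas,
    newFindings: head.findings.filter((f) => !baseSet.has(f)),
    resolvedFindings: base.findings.filter((f) => !headSet.has(f)),
  };
}

// Render a Markdown comment body with signed deltas and any new findings.
function renderComment(diff: Diff): string {
  const fmt = (n: number) => (n > 0 ? `+${n}` : `${n}`);
  const rows = Object.entries(diff.deltas).map(([dim, d]) => `| ${dim} | ${fmt(d)} |`);
  const lines = ["| Dimension | Score delta |", "| --- | --- |", ...rows];
  if (diff.newFindings.length > 0) {
    lines.push("", "**New findings**", ...diff.newFindings.map((f) => `- ${f}`));
  }
  return lines.join("\n");
}
```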

Top public repositories

Ranked by overall score. Updated on every submission.
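Ranking by overall score is a sort plus a rule for ties. This sketch assumes a hypothetical `Submission` record with a precomputed `overallScore` (how that score is aggregated from the dimensions is not stated here). Tied scores share a rank (standard competition ranking, "1224"), and repository name is used as a stable display tiebreak; both choices are assumptions.

```typescript
// Hypothetical leaderboard row; field names are assumptions.
interface Submission {
  repo: string; // "owner/repo"
  overallScore: number; // higher is better
}

interface RankedRow extends Submission {
  rank: number;
}

function rankSubmissions(subs: Submission[]): RankedRow[] {
  // Highest score first; equal scores ordered by repo name for a stable display.
  const sorted = [...subs].sort(
    (a, b) => b.overallScore - a.overallScore || a.repo.localeCompare(b.repo),
  );
  const rows: RankedRow[] = [];
  sorted.forEach((s, i) => {
    const prev = rows[i - 1];
    // Tied with the previous row: share its rank; otherwise rank = position + 1.
    const rank = prev && prev.overallScore === s.overallScore ? prev.rank : i + 1;
    rows.push({ ...s, rank });
  });
  return rows;
}
```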

No public submissions yet.

Be the first: run node packages/cli/dist/bin/subagent-evals.js submit --public.


Supported formats

Claude Code

.claude/agents/*.md

OpenAI Codex

.codex/agents · AGENTS.md

GitHub Copilot

.github/copilot-instructions.md

Cursor

.cursor/rules/*.mdc

Windsurf

.windsurf/rules/*.md

Generic

YAML frontmatter *.md
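The path patterns above are enough to route each file to the right parser. This minimal detector assumes first-match-wins precedence, allows nested `AGENTS.md` files, and accepts any file directly under `.codex/agents` (the page gives no extension for that directory). `detectFormat` and `AgentFormat` are hypothetical names, not the CLI's API.

```typescript
// Hypothetical format IDs for the supported layouts listed above.
type AgentFormat = "claude-code" | "codex" | "copilot" | "cursor" | "windsurf" | "generic";

// First matching rule wins; patterns mirror the documented paths.
const RULES: Array<[AgentFormat, RegExp]> = [
  ["claude-code", /^\.claude\/agents\/[^/]+\.md$/],
  ["codex", /^\.codex\/agents\/[^/]+$/], // assumption: any file directly in the directory
  ["codex", /(^|\/)AGENTS\.md$/], // assumption: root or nested AGENTS.md
  ["copilot", /^\.github\/copilot-instructions\.md$/],
  ["cursor", /^\.cursor\/rules\/[^/]+\.mdc$/],
  ["windsurf", /^\.windsurf\/rules\/[^/]+\.md$/],
];

function detectFormat(path: string, content: string): AgentFormat | null {
  // Normalize Windows separators and a leading "./".
  const normalized = path.replace(/\\/g, "/").replace(/^\.\//, "");
  for (const [format, re] of RULES) {
    if (re.test(normalized)) return format;
  }
  // Generic: any other .md file that opens with a YAML frontmatter block.
  if (normalized.endsWith(".md") && /^---\r?\n[\s\S]*?\r?\n---(\r?\n|$)/.test(content)) {
    return "generic";
  }
  return null; // not an agent definition
}
```

Checking the path rules before the frontmatter test keeps, for example, a Windsurf rule with frontmatter from being classified as generic.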