The missing code judgment layer for the AI era.
Understand structure. Extract design taste. Hunt security demons. Stop token waste.
Before AI, writing code and understanding code were the same act. Now AI writes the code. Writing and understanding have decoupled.
AI-generated code can introduce security holes, break your design philosophy, plant performance bombs, and silently burn your LLM budget. Catching these requires understanding the codebase's DNA.
That understanding can't come from linters. It can't come from tests. It can only come from human judgment.
Each skill builds on the last. Together they form a complete code judgment pipeline.
Build global understanding of a codebase — architecture, tech stack, entry points, dependency graph. Powered by scc + syft for deterministic analysis.
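A minimal sketch of what that deterministic pass could look like. The commands and flags are real scc and syft options, but the orchestration and output filenames here are illustrative assumptions, not the skill's actual implementation:

```shell
# Illustrative deterministic analysis pass (assumes scc and syft are on PATH).
# Each tool degrades gracefully if missing so the sketch stays runnable.
if command -v scc >/dev/null 2>&1; then
  scc --format json . > scc-report.json    # per-language line counts and complexity
else
  echo "scc not installed; skipping code metrics"
fi
if command -v syft >/dev/null 2>&1; then
  syft dir:. -o json > sbom.json           # SBOM: full dependency inventory
else
  echo "syft not installed; skipping SBOM"
fi
PASS_DONE=1
```

Both tools emit machine-readable JSON, which is what makes this layer deterministic: the same tree always yields the same metrics and the same dependency inventory.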
Extract design philosophy and key decisions. Find what's brilliant, what's a reasonable trade-off, and what deserves questioning. Rate decisions as 🔮 Elegant / ✅ Sound / ⚠️ Suspect / ❌ Anti-pattern.
Find real security vulnerabilities, CVEs, leaked secrets, performance traps. Hybrid architecture: bearer + trivy + gitleaks (deterministic) + Claude (semantic interpretation).
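The deterministic half of that hybrid might be wired up roughly like this. bearer, trivy, and gitleaks are real CLIs and these flags exist, but how the skill actually invokes and combines them is an assumption for illustration:

```shell
# Illustrative hybrid security sweep: three deterministic scanners, each skipped
# with a note if not installed. Output filenames are hypothetical.
for tool in bearer trivy gitleaks; do
  command -v "$tool" >/dev/null 2>&1 || echo "$tool not installed; skipping"
done
command -v bearer   >/dev/null 2>&1 && bearer scan . --format json > bearer.json        # code security findings
command -v trivy    >/dev/null 2>&1 && trivy fs --scanners vuln,secret . > trivy.txt    # CVEs + secrets in deps
command -v gitleaks >/dev/null 2>&1 && gitleaks detect --source . --report-path gitleaks.json  # leaked credentials
SCAN_DONE=1
```

The scanners produce the raw findings; the semantic layer's job is to decide which of them actually matter for this codebase.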
Discover token waste in LLM integrations — wallet black holes, attention pollution from bloated context, unnecessary input that degrades output quality.
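As a sketch of the kind of check involved: the rough ~4-characters-per-token heuristic for English text can flag a bloated prompt against a budget. The prompt, budget, and ratio below are all hypothetical; real tokenizers vary by model:

```shell
# Rough token estimate via the chars/4 heuristic (an approximation, not a tokenizer).
estimate_tokens() {
  local text="$1"
  echo $(( ${#text} / 4 ))
}

SYSTEM_PROMPT="You are a helpful assistant. Always answer concisely."
BUDGET=10   # hypothetical per-call token budget for this prompt
EST=$(estimate_tokens "$SYSTEM_PROMPT")
if [ "$EST" -gt "$BUDGET" ]; then
  echo "context bloat: ~${EST} tokens (budget ${BUDGET})"
fi
```

Counting is the easy part; the judgment call is whether those tokens earn their keep or merely pollute the model's attention.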
Tools find problems deterministically. Claude explains why they matter with project context.
From library evaluation to AI code review, judgment is the gap tools don't fill.
Not just what it does, but what traps it hides. Know the design decisions, hidden CVEs, and dependency risks before you adopt.
Verify the AI didn't break your design philosophy, introduce security holes, or bury performance landmines that explode in production.
Build real structural understanding in minutes — architecture, entry points, design intent, known hazards — not just a surface-level tour.
Find where tokens are wasted, contexts bloated, and money burned. Get actionable recommendations for model routing and context optimization.
Are instructions ambiguous? Would a weaker model misinterpret them?
Are parallel agents truly independent? Hidden serial dependencies?
Prompt injection risks? Overly broad file system access?
What happens when an agent fails? Is there a fallback path?
Are phases well-structured? Any dead ends or information gaps?
Does it over-rely on one model's quirks?
One copy. One setup. Start judging.
cp -r skills/judge-the-code ~/.agents/skills/
~/.agents/skills/judge-the-code/setup