The missing code judgment layer for the AI era.
Understand structure. Extract design taste. Hunt security demons. Stop token waste.
Before AI, writing code and understanding code were the same act. Now AI writes the code. Writing and understanding have decoupled.
AI-generated code can introduce security holes, break your design philosophy, plant performance bombs, and silently burn your LLM budget. Catching these requires understanding the codebase's DNA.
That understanding can't come from linters. It can't come from tests. It can only come from human judgment.
Each skill builds on the last. Together they form a complete code judgment pipeline.
Build global understanding of a codebase — architecture, tech stack, entry points, dependency graph. Powered by scc + syft for deterministic analysis.
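A minimal sketch of what that deterministic pass could look like. The commands and flags are real scc and syft options, but the orchestration and output filenames here are illustrative assumptions, not the skill's actual implementation:

```shell
# Illustrative deterministic analysis pass (assumes scc and syft are on PATH).
# Each tool degrades gracefully if missing so the sketch stays runnable.
if command -v scc >/dev/null 2>&1; then
  scc --format json . > scc-report.json    # per-language line counts and complexity
else
  echo "scc not installed; skipping code metrics"
fi
if command -v syft >/dev/null 2>&1; then
  syft dir:. -o json > sbom.json           # SBOM: full dependency inventory
else
  echo "syft not installed; skipping SBOM"
fi
PASS_DONE=1
```

Both tools emit machine-readable JSON, which is what makes this layer deterministic: the same tree always yields the same metrics and the same dependency inventory.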
Extract design philosophy and key decisions. Find what's brilliant, what's a reasonable trade-off, and what deserves questioning. Rate decisions as 🔮 Elegant / ✅ Sound / ⚠️ Suspect / ❌ Anti-pattern.
Find real security vulnerabilities, CVEs, leaked secrets, performance traps. Hybrid architecture: bearer + trivy + gitleaks (deterministic) + Claude (semantic interpretation).
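The deterministic half of that hybrid might be wired up roughly like this. bearer, trivy, and gitleaks are real CLIs and these flags exist, but how the skill actually invokes and combines them is an assumption for illustration:

```shell
# Illustrative hybrid security sweep: three deterministic scanners, each skipped
# with a note if not installed. Output filenames are hypothetical.
for tool in bearer trivy gitleaks; do
  command -v "$tool" >/dev/null 2>&1 || echo "$tool not installed; skipping"
done
command -v bearer   >/dev/null 2>&1 && bearer scan . --format json > bearer.json        # code security findings
command -v trivy    >/dev/null 2>&1 && trivy fs --scanners vuln,secret . > trivy.txt    # CVEs + secrets in deps
command -v gitleaks >/dev/null 2>&1 && gitleaks detect --source . --report-path gitleaks.json  # leaked credentials
SCAN_DONE=1
```

The scanners produce the raw findings; the semantic layer's job is to decide which of them actually matter for this codebase.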
Discover token waste in LLM integrations — wallet black holes, attention pollution from bloated context, unnecessary input that degrades output quality.
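As a sketch of the kind of check involved: the rough ~4-characters-per-token heuristic for English text can flag a bloated prompt against a budget. The prompt, budget, and ratio below are all hypothetical; real tokenizers vary by model:

```shell
# Rough token estimate via the chars/4 heuristic (an approximation, not a tokenizer).
estimate_tokens() {
  local text="$1"
  echo $(( ${#text} / 4 ))
}

SYSTEM_PROMPT="You are a helpful assistant. Always answer concisely."
BUDGET=10   # hypothetical per-call token budget for this prompt
EST=$(estimate_tokens "$SYSTEM_PROMPT")
if [ "$EST" -gt "$BUDGET" ]; then
  echo "context bloat: ~${EST} tokens (budget ${BUDGET})"
fi
```

Counting is the easy part; the judgment call is whether those tokens earn their keep or merely pollute the model's attention.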
Tools find problems deterministically. Claude explains why they matter with project context.
From library evaluation to AI code review, judgment is the gap tools don't fill.
Not just what it does, but what traps it hides. Know the design decisions, hidden CVEs, and dependency risks before you adopt.
Verify the AI didn't break your design philosophy, introduce security holes, or bury performance landmines that explode in production.
Build real structural understanding in minutes — architecture, entry points, design intent, known hazards — not just a surface-level tour.
Find where tokens are wasted, contexts bloated, and money burned. Get actionable recommendations for model routing and context optimization.
Are instructions ambiguous? Would a weaker model misinterpret them?
Are parallel agents truly independent? Hidden serial dependencies?
Prompt injection risks? Overly broad file system access?
What happens when an agent fails? Is there a fallback path?
Are phases well-structured? Any dead ends or information gaps?
Does it over-rely on one model's quirks?
One copy. One setup. Start judging.
cp -r skills/judge-the-code ~/.agents/skills/
~/.agents/skills/judge-the-code/setup