Agent run rubrics

Agent run rubrics is a small local review layer for Hermes and Mads: it checks AI agent runs against their artifacts, test results, and stated scope so a finished run is not mistaken for verified work.

The useful question is not “did the agent finish?” It is “what did we actually verify?”

What it reviews

The first version covers three recurring workflows:

Transcript learning — video or article extractions that might feed Hermes knowledge.
Code review — repo inspections where findings need evidence, scope, and verification notes.
OpenClaw worker runs — Mads/Claude artifacts where top-level success can still hide failed tests, plan-only output, or stale state.

What it checks

The helper records a rubric verdict, hard gates, input metadata, and optional improvement proposals. For worker runs, it forces checks such as whether result.json was read, whether tests failed, whether changed files match the task, and whether the run was review-only or implementation-ready.

That distinction matters. A review can complete cleanly without producing a patch. A patch can look useful while tests fail. A plan can be valuable while still not being code.
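
As a rough illustration of how hard gates and run mode can be kept separate from the score, here is a minimal Python sketch. It is not the helper's actual code: the class name WorkerRunReview, its fields, the gate messages, and the verdict strings are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerRunReview:
    """Hypothetical record of one worker-run review (all names are illustrative)."""

    result_json_read: bool          # did the reviewer actually open result.json?
    tests_failed: bool              # did any test fail during the run?
    changed_files: list[str]        # files the run touched
    expected_paths: list[str]       # path prefixes the task was scoped to
    mode: str                       # "review-only", "plan-only", or "implementation"
    score: float = 0.0              # secondary; never overrides a failed gate
    proposals: list[str] = field(default_factory=list)  # optional improvements

    def out_of_scope_files(self) -> list[str]:
        """Changed files that do not sit under any expected path prefix."""
        return [
            path
            for path in self.changed_files
            if not any(path.startswith(prefix) for prefix in self.expected_paths)
        ]

    def failed_gates(self) -> list[str]:
        """Every hard gate that failed, in a fixed order."""
        failed = []
        if not self.result_json_read:
            failed.append("result.json was not read")
        if self.tests_failed:
            failed.append("tests failed")
        if self.out_of_scope_files():
            failed.append("changed files outside task scope")
        return failed

    def verdict(self) -> str:
        """Gates decide first; the run mode is only reported once they pass."""
        if self.failed_gates():
            return "blocked"
        if self.mode != "implementation":
            return self.mode        # a clean review or plan, but not code
        return "implementation-ready"
```

In this sketch a run with tests_failed=True gets the verdict "blocked" no matter how high its score is, and a clean review-only run is reported as "review-only" rather than being promoted to implementation-ready.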

What it does not do

It is deliberately mechanical. No model calls. No automatic memory writes. No social posting. No deployment decisions. No merge approval.

External-facing autonomous agents still default to draft and review. Fully autonomous ("full-yolo") sending or posting requires an explicit allowlist of low-risk actions.
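
The draft-by-default rule can be pictured as a plain lookup. The Python sketch below is only an assumption about how such a policy might be expressed: the function name delivery_mode, the (channel, action) pair shape, and the sample allowlist entry are hypothetical and not taken from Hermes or Mads.

```python
# Hypothetical allowlist of (channel, action) pairs judged low-risk.
# The single entry here is a placeholder, not a real configuration.
LOW_RISK_ALLOWLIST = frozenset({("internal-notes", "post")})


def delivery_mode(channel: str, action: str, allowlist=LOW_RISK_ALLOWLIST) -> str:
    """Return "send" only for allowlisted pairs; everything else stays a draft."""
    return "send" if (channel, action) in allowlist else "draft"
```

Anything not listed falls through to "draft", so a new channel is reviewed by a person until someone deliberately adds it to the allowlist.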

Why it exists

Hermes and Mads are becoming useful because they are getting more artifact-based, not more theatrical. The rubric layer is a small piece of that: a repeatable way to ask whether a run is grounded, verified, and safe to act on.

The score is secondary. The hard gates are the point.