
Post–week-1 instruments (M2–M5)

Landing page for CLAUDE_MEASURE.md §3 metrics M2–M5. The metric definitions stay in that repo root file; this page only lists how to run rollups from trace and agent-project data under DATA_DIR. M1 (tool diversity) is covered in docs/hermes/post_cutover_week1_2026-04-23.md and by the scripts listed in the next section.

M1 (related) — tool diversity & week-1 gate

docs/hermes/post_cutover_week1_2026-04-23.md — H7.5 / H7.6 calendar window
scripts/ops/m1_rollup_from_timeline_json.py — roll up M1 from saved GET /trace/timeline JSON
scripts/ops/fetch_m1_timeline_range.py — fetch merged daily API slices (canonical agent ids)
scripts/clawd/filter_trace_jsonl_window.py + scripts/command_usage.ps1 -TracePath — JSONL windows on trace_events.jsonl / trace_archive/

M2 — Homepage content quality

Composite from homepage diffs (novel_words_ratio, section drift, link density). Planned instrument: backend/app/scripts/measure_homepage_quality.py (per CLAUDE_MEASURE — add when T3 ships). Data: agent_project_revisions/ under DATA_DIR.

M3 — Self-judgment calibration

Correlation between judge_self and the external score. Instrument: judge_self_calibration.jsonl (T4.12). Compute a weekly Pearson correlation, and only for weeks with N ≥ 10 pairs.
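
A minimal sketch of the weekly Pearson rollup, for orientation only. The record fields ts, judge_self and external_score are assumptions; the actual schema of judge_self_calibration.jsonl is defined by T4.12 and may differ. Weeks with fewer than 10 pairs are reported as None rather than given an unstable coefficient.

```python
import json
import math
from collections import defaultdict
from datetime import datetime

MIN_PAIRS = 10  # the N >= 10 threshold stated for M3


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def weekly_calibration(lines):
    """Map (ISO year, ISO week) -> Pearson r, or None when too few pairs.

    `lines` is an iterable of JSONL strings. The field names used here
    ("ts", "judge_self", "external_score") are hypothetical.
    """
    by_week = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        iso = datetime.fromisoformat(rec["ts"]).isocalendar()
        by_week[(iso[0], iso[1])].append(
            (rec["judge_self"], rec["external_score"])
        )

    result = {}
    for week, pairs in sorted(by_week.items()):
        if len(pairs) < MIN_PAIRS:
            result[week] = None
            continue
        xs, ys = zip(*pairs)
        result[week] = pearson(xs, ys)
    return result
```

To run it against a file, pass the open file handle directly, for example weekly_calibration(open(path, encoding="utf-8")). Timestamps are assumed to be ISO 8601 strings without a trailing "Z", which datetime.fromisoformat rejects on Python versions before 3.11.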

M4 — Budget meter behavioral effect

Join agent_budget.jsonl with trace_events.jsonl by agent/timestamp; compare tool-cost distribution in low- vs high-budget windows.
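
One way to sketch this join is an as-of lookup: each trace event is matched to the most recent budget reading for the same agent at or before the event time, then bucketed as low or high budget. The field names (agent_id, ts, budget_remaining, tool_cost) and the low_threshold parameter are illustrative assumptions, not the documented schema of either file.

```python
import bisect
import statistics
from collections import defaultdict


def split_costs_by_budget(budget_rows, trace_rows, low_threshold):
    """Bucket tool costs by the budget level in effect when the event ran.

    All field names are hypothetical. `ts` is assumed to be a sortable
    number such as epoch seconds. Events with no earlier budget reading
    for their agent are skipped.
    """
    readings = defaultdict(list)
    for row in budget_rows:
        readings[row["agent_id"]].append((row["ts"], row["budget_remaining"]))

    # Sort each agent's readings once and keep the timestamps for bisect.
    times = {}
    for agent, rows in readings.items():
        rows.sort()
        times[agent] = [ts for ts, _ in rows]

    buckets = {"low": [], "high": []}
    for event in trace_rows:
        agent = event["agent_id"]
        if agent not in readings:
            continue
        idx = bisect.bisect_right(times[agent], event["ts"]) - 1
        if idx < 0:
            continue  # event predates the first budget reading
        budget = readings[agent][idx][1]
        key = "low" if budget < low_threshold else "high"
        buckets[key].append(event["tool_cost"])
    return buckets


def summarize(costs):
    """Small distribution summary for one bucket; None when it is empty."""
    if not costs:
        return None
    return {
        "n": len(costs),
        "median": statistics.median(costs),
        "mean": statistics.fmean(costs),
    }
```

Comparing summarize(buckets["low"]) with summarize(buckets["high"]) gives a first look at whether tool spending shifts when the budget runs low. A fuller comparison would look at the whole distribution rather than two summary numbers.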

M5 — Cost per approved deliverable

Planned script: backend/app/scripts/measure_cost_per_approval.py (per CLAUDE_MEASURE). Roll up weekly medians; attribute LLM/GPU/ai$ costs across job lifetimes.
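
Until the planned script exists, the weekly-median rollup could look roughly like the sketch below. It sums every cost entry over a job's lifetime and groups approved jobs by the ISO week of approval. The input shapes (job_id, cost, approved_at) are assumptions, and this is not the contents of measure_cost_per_approval.py.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def weekly_median_cost_per_approval(cost_rows, approval_rows):
    """Map (ISO year, ISO week) -> median lifetime cost of approved jobs.

    Hypothetical inputs:
      cost_rows:     dicts with "job_id" and "cost" (one per LLM/GPU/ai$ charge)
      approval_rows: dicts with "job_id" and "approved_at" (ISO 8601 string)
    """
    # Attribute every charge to its job, regardless of when it was incurred.
    total_by_job = defaultdict(float)
    for row in cost_rows:
        total_by_job[row["job_id"]] += row["cost"]

    costs_by_week = defaultdict(list)
    for row in approval_rows:
        iso = datetime.fromisoformat(row["approved_at"]).isocalendar()
        week = (iso[0], iso[1])
        # Approved jobs with no recorded charges count as zero cost.
        costs_by_week[week].append(total_by_job.get(row["job_id"], 0.0))

    return {
        week: statistics.median(costs)
        for week, costs in sorted(costs_by_week.items())
    }
```

In this sketch, costs of jobs that were never approved are dropped. Whether those costs should instead be charged against approved deliverables is a definitional question for CLAUDE_MEASURE.md, not something this example decides.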

Hermes retrospective / trace tooling

scripts/clawd/hermes_retrospective_compare.py
docs/hermes/pre_cutover_retrospective_M1-M5_2026-04-14_2026-04-22.md (proxy baseline notes)

Live numbers are not rendered here yet. To get current values, run the scripts against the prod DATA_DIR on the host, or pull the traces and run them locally.