Run a live eval.
See what breaks before merge.

Pick a scenario, run the comparison, share the result.

Scenario

Rate limit: 5 runs per minute per IP address, enforced server-side.
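
For readers wondering what a per-IP limit like this looks like in practice, here is a minimal sketch. It assumes a sliding 60-second window kept in memory; the page does not say which windowing scheme or storage the server actually uses, and the names `SlidingWindowLimiter` and `allow` are hypothetical.

```python
import time
from collections import defaultdict, deque

MAX_RUNS = 5            # limit stated on this page
WINDOW_SECONDS = 60.0   # "per minute"; the sliding window is an assumption


class SlidingWindowLimiter:
    """Hypothetical in-memory limiter: at most max_runs per window per IP."""

    def __init__(self, max_runs=MAX_RUNS, window=WINDOW_SECONDS, clock=time.monotonic):
        self.max_runs = max_runs
        self.window = window
        self.clock = clock                  # injectable so the logic can be tested
        self._hits = defaultdict(deque)     # ip -> timestamps of accepted runs

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_runs:
            return False                    # over the limit: reject this run
        hits.append(now)
        return True
```

A server would call `allow(client_ip)` before starting a run and reject the request when it returns `False`. A real deployment would likely keep the counters in a shared store rather than in process memory.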

Run the eval to see baseline vs branch scores, gate decisions, and a shareable result URL.
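
As a rough illustration of a gate decision, the sketch below compares a baseline score with a branch score and passes the gate unless the branch regresses by more than a tolerance. Everything here is an assumption for illustration: that scores are floats where higher is better, the `max_regression` default of 0.02, and the names `GateResult` and `gate`. The page does not describe the actual gate rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    baseline: float
    branch: float
    delta: float    # branch minus baseline; negative means a regression
    passed: bool


def gate(baseline: float, branch: float, max_regression: float = 0.02) -> GateResult:
    """Hypothetical rule: pass unless the branch drops more than max_regression."""
    delta = branch - baseline
    return GateResult(baseline, branch, delta, delta >= -max_regression)
```

For example, `gate(0.80, 0.70).passed` is `False` under these assumptions, because the branch score drops by more than the allowed tolerance.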