Output review
Output review scores the result of a deliberation session against a rubric and returns a structured report: a single float score, a list of annotated findings, and the full deliberation trail. It tells you not just how well the session performed, but which persona's reasoning drove the result and where the panel disagreed.
ADVISORY
Output review runs outside the deliberation loop by default — it scores a completed result and does not block or retry sessions. Configure it as a gate in your workflow recipe if you need automatic re-run on low scores.
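Until a gate is wired into your recipe, the retry idea can be approximated in plain Python. The sketch below is illustrative only: run_deliberation and session are placeholders for your own session-producing code, not documented Verdaca API, and the 0.75 acceptance bar is an example rather than a library default.

from verdaca import OutputReviewer

THRESHOLD = 0.75  # example acceptance bar, not a library default
MAX_RERUNS = 2

for attempt in range(1 + MAX_RERUNS):
    result = run_deliberation()  # placeholder for your session-producing call
    report = OutputReviewer(session).evaluate(result, rubric="standard-deliberation")
    if report.score >= THRESHOLD:
        break  # accept the result; otherwise re-run the session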
What output review measures
Output review evaluates three dimensions of deliberation quality:
- Reasoning coverage — whether all relevant constraints were engaged by at least one persona
- Synthesis coherence — whether the final output is consistent with the deliberation trail and does not contradict positions the panel held
- Dissent preservation — whether minority positions were retained in the output rather than silently overwritten by the synthesis step
Running output review
Construct an OutputReviewer from the session, then pass the completed result and a rubric name to evaluate(). The default rubric covers the three standard dimensions; swap it for a custom rubric if your workflow requires different weighting.
from verdaca import OutputReviewer

reviewer = OutputReviewer(session)  # session: the completed deliberation session
report = reviewer.evaluate(
    result,  # the session's completed result
    rubric="standard-deliberation",  # default rubric: the three standard dimensions
)

print(report.score)        # float 0.0–1.0
print(report.notes)        # list[str] of annotated findings
print(report.audit_trail)  # full deliberation log
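Continuing from the example above, a common follow-up is to surface the annotated findings whenever the score falls below your acceptance bar. The 0.75 threshold here is an example, not a library default.

if report.score < 0.75:  # example threshold
    for note in report.notes:
        print(f"finding: {note}")  # each note is one annotated finding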
Interpreting scores
Scores reflect the weighted proportion of rubric criteria the session satisfied at or above their defined thresholds. A low score arrives with annotated findings that identify which criteria failed, not just a signal that something went wrong.
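As a concrete illustration of that arithmetic, the sketch below computes a weighted pass/fail score in plain Python. It illustrates the scoring idea only; it is not Verdaca's internal implementation.

def weighted_score(passed: dict[str, bool], weights: dict[str, float]) -> float:
    # Sum the weights of the criteria that met their thresholds.
    return sum(weights[name] for name, ok in passed.items() if ok)

# Two of three criteria pass, so the score is the passing weight mass: 0.4 + 0.3 = 0.7
weighted_score(
    {"evidence-quality": True, "assumption-explicitness": True, "risk-identification": False},
    {"evidence-quality": 0.4, "assumption-explicitness": 0.3, "risk-identification": 0.3},
)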
| Score range | Interpretation |
|---|---|
| 0.90–1.00 | All major criteria met; output is ready for practitioner review without qualification |
| 0.75–0.89 | Minor gaps in coverage or dissent preservation; review the annotated findings before use |
| 0.60–0.74 | Partial coverage; at least one dimension scored below threshold — re-run with additional rounds or review the persona constraints |
| < 0.60 | Output should not be used without a manual review of the full deliberation trail; consider revising the workflow or persona configuration |
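To act on these bands programmatically, a small triage helper can mirror the table. The mapping below restates the documentation guidance; it is not a Verdaca API.

def triage(score: float) -> str:
    # Map a report score to the guidance in the table above.
    if score >= 0.90:
        return "ready for practitioner review"
    if score >= 0.75:
        return "review the annotated findings before use"
    if score >= 0.60:
        return "re-run with additional rounds or review persona constraints"
    return "manual review of the full deliberation trail required"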
CAUTION
All published Verdaca score benchmarks, including the ~$2.50 / ~12 min per-session figure from internal scoring, carry A4-deferred status until Stage 7 ratification; do not cite them as externally validated metrics.
Custom rubrics
Define a custom rubric by specifying named criteria and their relative weights, then pass the rubric object to evaluate in place of a rubric name.
from verdaca.review import Rubric, Criterion

rubric = Rubric(
    name="investment-memo-review",
    criteria=[
        Criterion(name="evidence-quality", weight=0.4),
        Criterion(name="assumption-explicitness", weight=0.3),
        Criterion(name="risk-identification", weight=0.3),
    ],
)
report = reviewer.evaluate(result, rubric=rubric)
IMPORTANT
Criterion weights must sum exactly to 1.0; the rubric constructor validates this at creation time and will not produce a rubric object if the sum is off.
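If you generate weights programmatically, it can help to sanity-check the sum before constructing the rubric. The helper below is plain Python, not a Verdaca API, and uses a tolerance rather than strict equality because floating-point sums can land a hair off 1.0 (for example, sum([0.1] * 10) evaluates to 0.9999999999999999). Whether Verdaca's own validator tolerates that error is not specified here.

import math

def check_weights(weights: list[float]) -> None:
    # Raise early if the weights do not sum to 1.0 within a small tolerance.
    total = sum(weights)
    if not math.isclose(total, 1.0, rel_tol=0.0, abs_tol=1e-9):
        raise ValueError(f"criterion weights sum to {total}, expected 1.0")

check_weights([0.4, 0.3, 0.3])  # passes
check_weights([0.1] * 10)       # passes within tolerance despite float drift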