Output review
Output review scores the result of a deliberation session against a rubric and returns a structured report: a single float score, a list of annotated findings, and the full deliberation trail. It tells you not just how well the session performed, but which persona's reasoning drove the result and where the panel disagreed.
ADVISORY
Output review runs outside the deliberation loop by default — it scores a completed result and does not block or retry sessions. Configure it as a gate in your workflow recipe if you need automatic re-run on low scores.
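Until a gate is wired into your recipe, the retry idea can be approximated in plain Python. The sketch below is illustrative only: run_deliberation and session are placeholders for your own session-producing code, not documented Verdaca API, and the 0.75 acceptance bar is an example rather than a library default.

from verdaca import OutputReviewer

THRESHOLD = 0.75  # example acceptance bar, not a library default
MAX_RERUNS = 2

for attempt in range(1 + MAX_RERUNS):
    result = run_deliberation()  # placeholder for your session-producing call
    report = OutputReviewer(session).evaluate(result, rubric="standard-deliberation")
    if report.score >= THRESHOLD:
        break  # accept the result; otherwise re-run the session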
What output review measures
Output review evaluates three dimensions of deliberation quality:
- Reasoning coverage — whether all relevant constraints were engaged by at least one persona
- Synthesis coherence — whether the final output is consistent with the deliberation trail and does not contradict positions the panel held
- Dissent preservation — whether minority positions were retained in the output rather than silently overwritten by the synthesis step
Running output review
Construct an OutputReviewer from the session, then pass the completed result and a rubric name to evaluate(). The default rubric covers the three standard dimensions; swap it for a custom rubric if your workflow requires different weighting.
from verdaca import OutputReviewer

reviewer = OutputReviewer(session)  # session: the completed deliberation session
report = reviewer.evaluate(
    result,  # the session's completed result
    rubric="standard-deliberation",  # default rubric: the three standard dimensions
)

print(report.score)        # float 0.0–1.0
print(report.notes)        # list[str] of annotated findings
print(report.audit_trail)  # full deliberation log
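Continuing from the example above, a common follow-up is to surface the annotated findings whenever the score falls below your acceptance bar. The 0.75 threshold here is an example, not a library default.

if report.score < 0.75:  # example threshold
    for note in report.notes:
        print(f"finding: {note}")  # each note is one annotated finding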
Interpreting scores
Scores reflect the weighted proportion of rubric criteria the session satisfied at or above their defined thresholds. A low score arrives with annotated findings that identify which criteria failed, not just a signal that something went wrong.
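As a concrete illustration of that arithmetic, the sketch below computes a weighted pass/fail score in plain Python. It illustrates the scoring idea only; it is not Verdaca's internal implementation.

def weighted_score(passed: dict[str, bool], weights: dict[str, float]) -> float:
    # Sum the weights of the criteria that met their thresholds.
    return sum(weights[name] for name, ok in passed.items() if ok)

# Two of three criteria pass, so the score is the passing weight mass: 0.4 + 0.3 = 0.7
weighted_score(
    {"evidence-quality": True, "assumption-explicitness": True, "risk-identification": False},
    {"evidence-quality": 0.4, "assumption-explicitness": 0.3, "risk-identification": 0.3},
)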
| Score range | Interpretation |
|---|---|
| 0.90–1.00 | All major criteria met; output is ready for practitioner review without qualification |
| 0.75–0.89 | Minor gaps in coverage or dissent preservation; review the annotated findings before use |
| 0.60–0.74 | Partial coverage; at least one dimension scored below threshold — re-run with additional rounds or review the persona constraints |
| < 0.60 | Output should not be used without a manual review of the full deliberation trail; consider revising the workflow or persona configuration |
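To act on these bands programmatically, a small triage helper can mirror the table. The mapping below restates the documentation guidance; it is not a Verdaca API.

def triage(score: float) -> str:
    # Map a report score to the guidance in the table above.
    if score >= 0.90:
        return "ready for practitioner review"
    if score >= 0.75:
        return "review the annotated findings before use"
    if score >= 0.60:
        return "re-run with additional rounds or review persona constraints"
    return "manual review of the full deliberation trail required"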
CAUTION
All published Verdaca score benchmarks, including the ~$2.50 / ~12 min per-session figure from internal scoring, carry A4-deferred status until Stage 7 ratification; do not cite them as externally validated metrics.
Custom rubrics
Define a custom rubric by specifying named criteria and their relative weights, then pass the rubric object to evaluate in place of a rubric name.
from verdaca.review import Rubric, Criterion

rubric = Rubric(
    name="investment-memo-review",
    criteria=[
        Criterion(name="evidence-quality", weight=0.4),
        Criterion(name="assumption-explicitness", weight=0.3),
        Criterion(name="risk-identification", weight=0.3),
    ],
)
report = reviewer.evaluate(result, rubric=rubric)
IMPORTANT
Criterion weights must sum exactly to 1.0; the rubric constructor validates this at creation time and will not produce a rubric object if the sum is off.
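If you generate weights programmatically, it can help to sanity-check the sum before constructing the rubric. The helper below is plain Python, not a Verdaca API, and uses a tolerance rather than strict equality because floating-point sums can land a hair off 1.0 (for example, sum([0.1] * 10) evaluates to 0.9999999999999999). Whether Verdaca's own validator tolerates that error is not specified here.

import math

def check_weights(weights: list[float]) -> None:
    # Raise early if the weights do not sum to 1.0 within a small tolerance.
    total = sum(weights)
    if not math.isclose(total, 1.0, rel_tol=0.0, abs_tol=1e-9):
        raise ValueError(f"criterion weights sum to {total}, expected 1.0")

check_weights([0.4, 0.3, 0.3])  # passes
check_weights([0.1] * 10)       # passes within tolerance despite float drift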