Judges (AI)

AI Judges let LLMs grade snapshots automatically using the same evaluation framework as human evaluators.

Supported Providers

Provider notes:

Keep grading criteria explicit and stable.
Require concise reason text.
Keep outputs machine-parseable and deterministic.
Enforce output language policy (for example, English-only reasons if required by your workflow).
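
As an illustration of the guidelines above, here is a minimal sketch of a validator that rejects judge responses that are not machine-parseable. The JSON contract (`score` and `reason` fields), the 0-3 score range, the 200-character limit, and the ASCII check standing in for an English-only policy are all assumptions made for this example, not the product's actual output format.

```python
import json

# Hypothetical contract: the judge must return a JSON object such as
# {"score": 2, "reason": "Relevant but outdated."}. The field names, the
# score range, and the length limit are assumptions for this sketch.
ALLOWED_SCORES = {0, 1, 2, 3}
MAX_REASON_CHARS = 200


def parse_judge_output(raw: str) -> dict:
    """Parse and validate one judge response; raise ValueError on any violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    score = data.get("score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or score not in ALLOWED_SCORES:
        raise ValueError(f"score must be one of {sorted(ALLOWED_SCORES)}")

    reason = data.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        raise ValueError("reason must be a non-empty string")
    reason = reason.strip()
    if len(reason) > MAX_REASON_CHARS:
        raise ValueError("reason is too long")
    # Crude stand-in for an English-only policy: allow ASCII text only.
    if not reason.isascii():
        raise ValueError("reason violates the output language policy")

    return {"score": score, "reason": reason}
```

Rejecting malformed responses outright, rather than trying to repair them, keeps grading deterministic: a response either satisfies the contract or is treated as a failure.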

Active judges can show:

Inactive judges do not show runtime status badges.

The status filter on the Judges page uses a segmented control:

API Key Storage: All provider API keys are securely encrypted at rest in the database and are never sent to the client side.
Costs (BYOK): You bring your own API keys and pay the LLM provider directly for the tokens you use, so costs are fully transparent. Token usage is logged per evaluation to help you forecast expenses.
Data Privacy (Local LLMs): If your search data is highly confidential, you can use local, self-hosted LLMs (via ollama or custom_openai) so that your evaluation data never leaves your internal network.
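
Because token usage is logged per evaluation, a rough expense forecast can be derived from a sample of past evaluations. The sketch below assumes each logged evaluation exposes prompt and completion token counts and that the provider bills per million tokens. The `Usage` record, the function names, and the prices are hypothetical placeholders, so substitute your provider's actual rates.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one logged evaluation (hypothetical record shape)."""
    prompt_tokens: int
    completion_tokens: int


def estimate_cost(usages, input_price_per_m, output_price_per_m):
    """Total cost of the given evaluations, in the currency of the prices."""
    prompt = sum(u.prompt_tokens for u in usages)
    completion = sum(u.completion_tokens for u in usages)
    return (prompt * input_price_per_m + completion * output_price_per_m) / 1_000_000


def forecast_cost(usages, planned_evaluations, input_price_per_m, output_price_per_m):
    """Extrapolate the average cost per logged evaluation to a planned run."""
    if not usages:
        raise ValueError("need at least one logged evaluation")
    per_evaluation = estimate_cost(usages, input_price_per_m, output_price_per_m) / len(usages)
    return per_evaluation * planned_evaluations


# Example with placeholder prices (per million tokens); not real provider rates.
sample = [Usage(1200, 80), Usage(900, 60)]
projected = forecast_cost(sample, 10_000, input_price_per_m=0.5, output_price_per_m=1.5)
print(f"Projected cost for 10,000 evaluations: {projected:.2f}")
```

The forecast is only as good as the sample: prompts with longer documents or longer reasons shift the average, so it helps to sample evaluations that resemble the planned run.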

Use judge logs for observability:

The Judge Logs page supports:

Global mode (/judges/logs) and per-judge mode (/judges/{judge}/logs)
Filters by status, judge, evaluation, and date
JSONL export of the currently filtered dataset (includes request/response bodies)

This is the primary source for debugging provider, prompt, and parsing issues.
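
A JSONL export can be analyzed offline with a few lines of code. The sketch below counts log records per judge and status, which is often a useful first step when tracking down failures. The `judge` and `status` field names and the sample values are assumptions, so check them against a real export before relying on this.

```python
import json
from collections import Counter


def summarize_export(lines):
    """Count log records per (judge, status) pair from an iterable of JSONL lines."""
    counts = Counter()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        # "judge" and "status" are assumed field names, not a documented schema.
        key = (record.get("judge", "unknown"), record.get("status", "unknown"))
        counts[key] += 1
    return counts


# In-memory sample standing in for an exported file.
sample_lines = [
    '{"judge": "relevance-v1", "status": "success"}',
    '{"judge": "relevance-v1", "status": "failed"}',
    '{"judge": "relevance-v1", "status": "failed"}',
]
for (judge, status), n in sorted(summarize_export(sample_lines).items()):
    print(judge, status, n)
```

To run it against a real export, pass an open file handle instead of the sample list, for example `summarize_export(open("export.jsonl", encoding="utf-8"))`, where the filename is a placeholder.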