oumi.judges
This module provides access to various judge configurations for the Oumi project.
The judges are used to evaluate the quality of AI-generated responses based on different criteria such as helpfulness, honesty, and safety.
- class oumi.judges.BaseJudge(prompt_template: str, prompt_template_placeholders: set[str] | None, system_instruction: str | None, example_field_values: list[dict[str, str]], response_format: JudgeResponseFormat, output_fields: list[JudgeOutputField], inference_engine: BaseInferenceEngine)
Bases: object
Base class for implementing judges that evaluate model outputs.
A judge takes structured inputs, formats them using a prompt template, runs inference to get judgments, and parses the results into structured outputs.
- judge(inputs: list[dict[str, str]]) -> list[JudgeOutput]
Evaluate a batch of inputs and return structured judgments.
- Parameters:
inputs – List of dictionaries containing input data for evaluation. Each dict must contain values for all prompt_template placeholders.
- Returns:
List of structured judge outputs with parsed results
- Raises:
ValueError – If inference returns an unexpected number of conversations.
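The flow described for judge() (format each input with the prompt template, run inference, check the number of results) can be sketched as follows. This is a minimal illustration and not the Oumi implementation: run_judge and fake_engine are hypothetical names, the str.format-style {placeholder} syntax is an assumption, and the inference engine is replaced by a plain function that returns canned strings.

```python
from typing import Callable


def run_judge(
    prompt_template: str,
    inputs: list[dict[str, str]],
    infer: Callable[[list[str]], list[str]],
) -> list[str]:
    """Format each input into a prompt, run inference, and return raw outputs."""
    prompts = []
    for example in inputs:
        # Assumption: placeholders use str.format-style braces, e.g. {question}.
        # A missing placeholder value surfaces here as a KeyError.
        prompts.append(prompt_template.format(**example))

    raw_outputs = infer(prompts)

    # Mirrors the documented ValueError: one result is expected per input.
    if len(raw_outputs) != len(inputs):
        raise ValueError(
            f"Expected {len(inputs)} outputs, got {len(raw_outputs)}"
        )
    return raw_outputs


def fake_engine(prompts: list[str]) -> list[str]:
    """Hypothetical stand-in for an inference engine; always answers 'Yes'."""
    return ['{"judgment": "Yes"}' for _ in prompts]


results = run_judge(
    "Is this answer helpful?\nQuestion: {question}\nAnswer: {answer}",
    [{"question": "What is 2 + 2?", "answer": "4"}],
    fake_engine,
)
```

In this sketch the raw strings are returned unparsed; in the documented API, each result would instead be wrapped in a JudgeOutput.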
- class oumi.judges.JudgeOutput(*, raw_output: str, parsed_output: dict[str, str] = {}, output_fields: list[JudgeOutputField] | None = None, field_values: dict[str, float | int | str | bool | None] = {}, field_scores: dict[str, float | None] = {}, response_format: JudgeResponseFormat | None = None)
Bases: BaseModel
Represents the output from a judge evaluation.
- Variables:
raw_output (str) – The original unprocessed output from the judge
parsed_output (dict[str, str]) – Structured data (fields & their values) extracted from raw output
output_fields (list[oumi.judges.base_judge.JudgeOutputField] | None) – List of expected output fields for this judge
field_values (dict[str, float | int | str | bool | None]) – Typed values for each expected output field
field_scores (dict[str, float | None]) – Numeric scores for each expected output field (if applicable)
response_format (oumi.core.configs.params.judge_params.JudgeResponseFormat | None) – Format used for generating output (XML, JSON, or RAW)
- field_scores: dict[str, float | None]
- field_values: dict[str, float | int | str | bool | None]
- classmethod from_raw_output(raw_output: str, response_format: JudgeResponseFormat, output_fields: list[JudgeOutputField]) -> Self
Generate a structured judge output from a raw model output.
- generate_raw_output(field_values: dict[str, str]) -> str
Generate raw output string from field values in the specified format.
- Parameters:
field_values – Dictionary mapping field keys to their string values. Must contain values for all required output fields.
- Returns:
Formatted raw output string ready for use as assistant response.
- Raises:
ValueError – If required output fields are missing from field_values, if response_format/output_fields are not set, or if response_format is not supported.
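The relationship between generate_raw_output and from_raw_output can be illustrated with a JSON round trip. This is a sketch under stated assumptions, not the library's code: to_json_raw_output and from_json_raw_output are hypothetical stand-ins, only the JSON format is covered, and returning an empty dict for unparseable output is an assumed behavior.

```python
import json


def to_json_raw_output(field_values: dict[str, str], required: list[str]) -> str:
    """Serialize field values as JSON after checking every required key is present."""
    missing = [key for key in required if key not in field_values]
    if missing:
        # Mirrors the documented ValueError for missing required output fields.
        raise ValueError(f"Missing required output fields: {missing}")
    return json.dumps({key: field_values[key] for key in required})


def from_json_raw_output(raw_output: str) -> dict[str, str]:
    """Parse a JSON raw output back into a dict of string values."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return {}  # Assumption: unparseable output yields no fields.
    if not isinstance(parsed, dict):
        return {}
    return {str(key): str(value) for key, value in parsed.items()}


raw = to_json_raw_output(
    {"judgment": "Yes", "explanation": "Correct."},
    required=["judgment", "explanation"],
)
parsed = from_json_raw_output(raw)
```

Here parsed holds the same keys and string values that were passed in, which is the property that makes a generated raw output usable as an example assistant response.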
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- output_fields: list[JudgeOutputField] | None
- parsed_output: dict[str, str]
- raw_output: str
- response_format: JudgeResponseFormat | None
- class oumi.judges.JudgeOutputField(*, field_key: str, field_type: JudgeOutputType, field_scores: dict[str, float] | None)
Bases: BaseModel
Represents a single output field that a judge can produce.
- Variables:
field_key (str) – The key/name for this field in the judge’s output
field_type (oumi.core.configs.params.judge_params.JudgeOutputType) – The data type expected for this field’s value
field_scores (dict[str, float] | None) – Optional mapping from categorical values to numeric scores
- field_key: str
- field_scores: dict[str, float] | None
- field_type: JudgeOutputType
- get_typed_value(raw_value: str) -> float | int | str | bool | None
Convert the field’s raw string value to the appropriate type.
- Parameters:
raw_value – The raw string value from the judge’s output
- Returns:
The typed value, or None if conversion fails
- Raises:
ValueError – If the field_type is not supported
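The contract described for get_typed_value (return a typed value, return None when conversion fails, raise ValueError for an unsupported type) can be sketched as below, together with a lookup that maps a categorical value to a numeric score. All names here are hypothetical: OutputType stands in for JudgeOutputType and its member names are assumptions, as is the set of strings accepted as booleans.

```python
from enum import Enum
from typing import Optional, Union


class OutputType(Enum):
    """Hypothetical stand-in for JudgeOutputType; member names are assumptions."""

    BOOL = "bool"
    INT = "int"
    FLOAT = "float"
    TEXT = "text"


def to_typed_value(
    raw_value: str, field_type: OutputType
) -> Union[float, int, str, bool, None]:
    """Convert a raw string to the requested type, or None if conversion fails."""
    text = raw_value.strip()
    if field_type is OutputType.BOOL:
        lowered = text.lower()
        # Assumption: only these spellings count as booleans.
        if lowered in ("yes", "true"):
            return True
        if lowered in ("no", "false"):
            return False
        return None
    if field_type is OutputType.INT:
        try:
            return int(text)
        except ValueError:
            return None
    if field_type is OutputType.FLOAT:
        try:
            return float(text)
        except ValueError:
            return None
    if field_type is OutputType.TEXT:
        return text
    raise ValueError(f"Unsupported field type: {field_type!r}")


def to_score(raw_value: str, field_scores: Optional[dict[str, float]]) -> Optional[float]:
    """Look up the numeric score for a categorical value, if a mapping exists."""
    if field_scores is None:
        return None
    return field_scores.get(raw_value.strip())


is_helpful = to_typed_value("Yes", OutputType.BOOL)
score = to_score("Yes", {"Yes": 1.0, "No": 0.0})
```

Keeping conversion failures as None rather than exceptions lets a batch of judgments be processed even when a single model response is malformed.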
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class oumi.judges.SimpleJudge(judge_config: JudgeConfig | str)
Bases: BaseJudge
Judge class for evaluating outputs based on a given configuration.