oumi.datasets.grpo

GRPO datasets module.

class oumi.datasets.grpo.BerryBenchGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)

Bases: BaseExperimentalGrpoDataset

Dataset class for the oumi-ai/berrybench-v0.1.1 dataset.

A sample from the dataset:

{
    "messages": [
        {
            "content": "Return a JSON object showing the frequency of each character in the word '黒い'. Only include characters that appear in the word.",
            "role": "user",
        }
    ],
    "metadata": {
        "character_count": 2,
        "difficulty": 3,
        "expected_response": '{"\\u9ed2": 1, "\\u3044": 1}',
        "language": "japanese",
        "word": "黒い",
    },
}
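The `expected_response` in the sample above is a JSON object of per-character counts. The following is a minimal sketch of how such a string could be produced and compared, using only the standard library. The helper names `char_frequency_json` and `matches_expected` are hypothetical and are not part of oumi; how the dataset was actually generated or scored is not stated here.

```python
import json
from collections import Counter


def char_frequency_json(word: str) -> str:
    """Return a JSON object string mapping each character of `word` to its count."""
    # Counter keeps first-seen order, so keys appear in the order of the word.
    counts = Counter(word)
    # json.dumps escapes non-ASCII characters by default, which produces the
    # \uXXXX-style keys seen in the sample's expected_response.
    return json.dumps(dict(counts))


def matches_expected(response: str, expected_response: str) -> bool:
    """Hypothetical check: compare two JSON strings by their parsed values."""
    try:
        return json.loads(response) == json.loads(expected_response)
    except json.JSONDecodeError:
        return False


expected = '{"\\u9ed2": 1, "\\u3044": 1}'  # copied from the sample above
generated = char_frequency_json("黒い")
print(generated == expected)
print(matches_expected('{"黒": 1, "い": 1}', expected))
```

Comparing parsed values rather than raw strings means a response with unescaped characters or different key order would still count as a match under this sketch.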
dataset_name: str
default_dataset: str | None = 'oumi-ai/berrybench-v0.1.1'
transform(sample: Series) -> dict

Transform the sample into a Python dict.

transform_conversation(sample: Series) -> Conversation

Converts the input sample to a Conversation.

Parameters:

sample (Series) – The input example.

Returns:

The resulting conversation.

Return type:

Conversation

trust_remote_code: bool

class oumi.datasets.grpo.CountdownGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)

Bases: BaseExperimentalGrpoDataset

Dataset class for the d1shs0ap/countdown dataset.

A sample from the dataset: {"target": 87, "nums": [79, 8]}
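The sample pairs a `target` with a list of `nums`. Assuming the task is the usual Countdown arithmetic puzzle (combine every number exactly once with +, -, *, / to reach the target), which this page does not state, a candidate equation could be checked along these lines. The function `solves_countdown` is a hypothetical helper, not an oumi API.

```python
import ast
import operator

# Only the four basic arithmetic operators are accepted (an assumption).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a restricted arithmetic AST, recording each integer literal."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, used)
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def solves_countdown(equation: str, nums: list, target: int) -> bool:
    """Return True if `equation` uses each number once and equals `target`."""
    used = []
    try:
        value = _evaluate(ast.parse(equation, mode="eval"), used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return sorted(used) == sorted(nums) and abs(value - target) < 1e-9


sample = {"target": 87, "nums": [79, 8]}
print(solves_countdown("79 + 8", sample["nums"], sample["target"]))
```

Parsing with `ast` instead of calling `eval` keeps the check limited to plain arithmetic, so arbitrary model output cannot execute code.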

dataset_name: str
default_dataset: str | None = 'd1shs0ap/countdown'
transform(sample: Series) -> dict

Validate and transform the sample into a Python dict.

transform_conversation(sample: Series) -> Conversation

Validate and convert the sample into a Conversation.

trust_remote_code: bool

class oumi.datasets.grpo.Gsm8kGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)

Bases: BaseExperimentalGrpoDataset

Dataset class for the openai/gsm8k dataset.

A sample from the dataset:

{
    "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
    "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
}
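In the sample above, the final numeric answer follows the `####` marker on the last line of `answer`. A minimal sketch of pulling that value out is shown below. The helper `extract_final_answer` is hypothetical, and whether `Gsm8kGrpoDataset` performs this step itself is not stated here.

```python
import re
from typing import Optional

# Matches the number after "####", allowing a sign, thousands separators,
# and a decimal part.
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_final_answer(answer: str) -> Optional[str]:
    """Return the number after '####' with commas removed, or None if absent."""
    match = _FINAL_ANSWER.search(answer)
    if match is None:
        return None
    return match.group(1).replace(",", "")


sample_answer = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)
print(extract_final_answer(sample_answer))  # prints 72
```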
dataset_name: str
default_dataset: str | None = 'openai/gsm8k'
transform(sample: Series) -> dict

Validate and transform the sample into a Python dict.

transform_conversation(sample: Series) -> Conversation

Validate and convert the sample into a Conversation.

trust_remote_code: bool

class oumi.datasets.grpo.LetterCountGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)

Bases: BaseExperimentalGrpoDataset

Dataset class for the oumi-ai/oumi-letter-count dataset.

A sample from the dataset:

{
    "conversation_id": "oumi_letter_count_0",
    "messages": [
        {
            "content": "Can you let me know how many 'r's are in 'pandered'?",
            "role": "user",
        }
    ],
    "metadata": {
        "letter": "r",
        "letter_count_integer": 1,
        "letter_count_string": "one",
        "unformatted_prompt": "Can you let me know how many {letter}s are in {word}?",
        "word": "pandered",
    },
}
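The `metadata` fields in the sample above carry the ground truth for the prompt. As a rough illustration, the sketch below recomputes `letter_count_integer` from `word` and `letter`, and scores a completion by its last integer. The reward function `letter_count_reward` is a hypothetical example and is not the reward oumi uses.

```python
import re

# Metadata copied from the sample above (unformatted_prompt omitted).
sample_metadata = {
    "letter": "r",
    "letter_count_integer": 1,
    "letter_count_string": "one",
    "word": "pandered",
}


def count_letter(word: str, letter: str) -> int:
    """Count exact (case-sensitive) occurrences of `letter` in `word`."""
    return word.count(letter)


def letter_count_reward(completion: str, expected: int) -> float:
    """Hypothetical reward: 1.0 if the last integer in the text is correct."""
    numbers = re.findall(r"-?\d+", completion)
    if not numbers:
        return 0.0
    return 1.0 if int(numbers[-1]) == expected else 0.0


recomputed = count_letter(sample_metadata["word"], sample_metadata["letter"])
print(recomputed == sample_metadata["letter_count_integer"])
print(letter_count_reward("There is 1 'r' in 'pandered'.", recomputed))
```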
dataset_name: str
default_dataset: str | None = 'oumi-ai/oumi-letter-count'
transform(sample: Series) -> dict

Validate and transform the sample into a Python dict.

transform_conversation(sample: Series) -> Conversation

Converts the input sample to a Conversation.

Parameters:

sample (Series) – The input example.

Returns:

The resulting conversation.

Return type:

Conversation

trust_remote_code: bool

class oumi.datasets.grpo.RaRMedicineDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)

Bases: BaseRubricDataset

Dataset for RaR-Medicine from the Rubrics as Rewards paper.

This dataset contains 22.4k medical prompts with structured rubric annotations for training with GRPO. The prompts focus on complex medical reasoning tasks like diagnosis (50.3%) and treatment (16.0%).

HuggingFace: https://huggingface.co/datasets/anisha2102/RaR-Medicine

Example

>>> dataset = RaRMedicineDataset(split="train")
>>> sample = dataset.raw(0)
>>> print(sample["prompt"])
>>> print(sample["rubrics"])  # List of weighted rubric dicts

The rubrics follow this structure:

{
    "name": "Identify Most Sensitive Modality",
    "description": "Essential Criteria: Identifies non-contrast helical CT…",
    "weight": 5,
    "evaluation_type": "binary",
}
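Each rubric carries a `weight` and a binary `evaluation_type`. One plausible way to turn per-rubric pass/fail judgments into a single reward is a normalized weighted sum, sketched below. This aggregation is an assumption made for illustration, since the page does not say how rubric scores are combined, and `weighted_rubric_score` is a hypothetical helper.

```python
from typing import Any


def weighted_rubric_score(
    rubrics: list[dict[str, Any]], passed: dict[str, bool]
) -> float:
    """Return the weight of satisfied rubrics divided by the total weight."""
    total = sum(r["weight"] for r in rubrics)
    if total <= 0:
        return 0.0
    # Rubrics missing from `passed` are treated as not satisfied.
    earned = sum(r["weight"] for r in rubrics if passed.get(r["name"], False))
    return earned / total


rubrics = [
    {
        "name": "Identify Most Sensitive Modality",
        "description": "Essential Criteria: Identifies non-contrast helical CT…",
        "weight": 5,
        "evaluation_type": "binary",
    },
    # Hypothetical second rubric, added only to make the arithmetic visible.
    {
        "name": "Hypothetical Extra Criterion",
        "description": "...",
        "weight": 3,
        "evaluation_type": "binary",
    },
]
judgments = {
    "Identify Most Sensitive Modality": True,
    "Hypothetical Extra Criterion": False,
}
print(weighted_rubric_score(rubrics, judgments))  # 5 / (5 + 3) = 0.625
```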

dataset_name: str
default_dataset: str | None = 'anisha2102/RaR-Medicine'
transform(sample: Series) -> dict[str, Any]

Transform a sample into the format expected by the GRPO trainer.

trust_remote_code: bool

class oumi.datasets.grpo.RaRScienceDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)

Bases: RaRMedicineDataset

Dataset for RaR-Science from the Rubrics as Rewards paper.

This dataset contains 22.9k expert-level science prompts with structured rubric annotations for training with GRPO. The prompts are aligned with the GPQA Diamond benchmark, covering topics from quantum mechanics to molecular biology.

HuggingFace: https://huggingface.co/datasets/anisha2102/RaR-Science

Example

>>> dataset = RaRScienceDataset(split="train")
>>> sample = dataset.raw(0)
>>> print(sample["prompt"])
>>> print(sample["rubrics"])  # List of weighted rubric dicts

The rubrics follow this structure:

{
    "name": "Temperature Conversion",
    "description": "Essential Criteria: The response must mention…",
    "weight": 5,
    "evaluation_type": "binary",
}

dataset_name: str
default_dataset: str | None = 'anisha2102/RaR-Science'
trust_remote_code: bool

class oumi.datasets.grpo.RlvrRubricDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)

Bases: BaseRubricDataset

Dataset for RLVR training with rubric-based rewards.

Expects input in rubric format:
  • prompt: str
  • rubrics: list of {name, description, weight}
  • system_prompt: str (optional)
  • metadata: dict (optional)
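The field list above can be read as a small schema. The sketch below checks a record against it before the record is written to a dataset file. The function `validate_rubric_sample` and its error messages are hypothetical, and the actual validation done by `RlvrRubricDataset` may differ.

```python
from typing import Any

REQUIRED_RUBRIC_KEYS = ("name", "description", "weight")


def validate_rubric_sample(sample: dict[str, Any]) -> list[str]:
    """Return a list of problems found in `sample`; empty means it looks valid."""
    problems = []
    if not isinstance(sample.get("prompt"), str):
        problems.append("prompt must be a string")
    rubrics = sample.get("rubrics")
    if not isinstance(rubrics, list):
        problems.append("rubrics must be a list")
    else:
        for i, rubric in enumerate(rubrics):
            if not isinstance(rubric, dict):
                problems.append(f"rubrics[{i}] must be a dict")
                continue
            missing = [k for k in REQUIRED_RUBRIC_KEYS if k not in rubric]
            if missing:
                problems.append(f"rubrics[{i}] is missing: {', '.join(missing)}")
    # Optional fields are only type-checked when present.
    if "system_prompt" in sample and not isinstance(sample["system_prompt"], str):
        problems.append("system_prompt must be a string")
    if "metadata" in sample and not isinstance(sample["metadata"], dict):
        problems.append("metadata must be a dict")
    return problems


good = {
    "prompt": "Explain why the sky is blue.",
    "rubrics": [
        {"name": "Mentions scattering", "description": "Names Rayleigh scattering.", "weight": 1.0}
    ],
}
bad = {"prompt": 3, "rubrics": [{"name": "Incomplete"}]}
print(validate_rubric_sample(good))
print(validate_rubric_sample(bad))
```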

dataset_name: str
default_dataset: str | None = 'oumi-rlvr-rubric'
transform(sample: Series) -> dict[str, Any]

Transform the sample into the format expected by the GRPO trainer.

trust_remote_code: bool

class oumi.datasets.grpo.TldrGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)

Bases: BaseExperimentalGrpoDataset

Dataset class for the trl-lib/tldr dataset.

dataset_name: str
default_dataset: str | None = 'trl-lib/tldr'
trust_remote_code: bool

Subpackages