oumi.datasets.grpo
GRPO datasets module.
- class oumi.datasets.grpo.BerryBenchGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)
Bases: BaseExperimentalGrpoDataset
Dataset class for the oumi-ai/berrybench-v0.1.1 dataset.
A sample from the dataset:
{
    "messages": [
        {
            "content": "Return a JSON object showing the frequency of each character in the word '黒い'. Only include characters that appear in the word.",
            "role": "user",
        }
    ],
    "metadata": {
        "character_count": 2,
        "difficulty": 3,
        "expected_response": '{"\\u9ed2": 1, "\\u3044": 1}',
        "language": "japanese",
        "word": "黒い",
    },
}
- dataset_name: str
- default_dataset: str | None = 'oumi-ai/berrybench-v0.1.1'
- transform_conversation(sample: Series) -> Conversation
Converts the input sample to a Conversation.
- Parameters:
sample (dict) – The input example.
- Returns:
The resulting conversation.
- Return type:
Conversation
- trust_remote_code: bool
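The `expected_response` field in the sample above is a JSON string, so one plausible way to check a model reply is to compare the parsed objects rather than the raw text. The following is a minimal sketch under that assumption; `matches_expected` is a hypothetical helper and not part of `oumi`.

```python
import json
from collections import Counter


def matches_expected(response: str, expected_response: str) -> bool:
    """Return True if both strings parse to the same JSON value.

    Hypothetical helper for illustration; not an oumi API.
    """
    try:
        return json.loads(response) == json.loads(expected_response)
    except json.JSONDecodeError:
        # A reply that is not valid JSON cannot match.
        return False


# Values copied from the sample above.
word = "黒い"
expected_response = '{"\\u9ed2": 1, "\\u3044": 1}'

# Character frequencies computed directly from the word.
frequencies = dict(Counter(word))

# json.dumps escapes non-ASCII characters by default, like the sample does.
candidate = json.dumps(frequencies)

print(matches_expected(candidate, expected_response))
```

Comparing parsed objects keeps the check insensitive to key order, whitespace, and whether non-ASCII characters are escaped.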
- class oumi.datasets.grpo.CountdownGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)
Bases: BaseExperimentalGrpoDataset
Dataset class for the d1shs0ap/countdown dataset.
A sample from the dataset: {"target": 87, "nums": [79, 8]}
- dataset_name: str
- default_dataset: str | None = 'd1shs0ap/countdown'
- transform_conversation(sample: Series) -> Conversation
Validate and transform the sample into a Python dict.
- trust_remote_code: bool
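To make the `target` and `nums` fields concrete, the sketch below checks whether an arithmetic expression solves a Countdown sample. It assumes the usual Countdown rule that every number is used exactly once with `+`, `-`, `*`, and `/`; that rule and the `solves_countdown` helper are assumptions for illustration, not something this page specifies.

```python
import ast
import operator

# Binary operators allowed in a candidate expression (assumed rule set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a +, -, *, / expression tree, recording each number used."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def solves_countdown(expression, nums, target):
    """Return True if `expression` uses exactly `nums` and equals `target`."""
    used = []
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return sorted(used) == sorted(nums) and abs(value - target) < 1e-9


# Values copied from the sample above.
sample = {"target": 87, "nums": [79, 8]}
print(solves_countdown("79 + 8", sample["nums"], sample["target"]))
```

Walking the parsed syntax tree instead of calling `eval` means that names, function calls, and other non-arithmetic input are rejected rather than executed.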
- class oumi.datasets.grpo.Gsm8kGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)
Bases: BaseExperimentalGrpoDataset
Dataset class for the openai/gsm8k dataset.
A sample from the dataset:
{
    "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
    "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May. Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May. #### 72"
}
- dataset_name: str
- default_dataset: str | None = 'openai/gsm8k'
- transform_conversation(sample: Series) -> Conversation
Validate and transform the sample into a Python dict.
- trust_remote_code: bool
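In the sample above, the final numeric answer follows the `####` marker at the end of the `answer` field. The sketch below pulls that value out with a regular expression; `extract_final_answer` is a hypothetical helper, and dropping thousands separators is an assumption about how answers would be compared.

```python
import re
from typing import Optional


def extract_final_answer(answer: str) -> Optional[str]:
    """Return the number that follows '####', or None if there is none.

    Hypothetical helper for illustration; not an oumi API.
    """
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", answer)
    if match is None:
        return None
    # Drop thousands separators so "1,234" and "1234" compare equal.
    return match.group(1).replace(",", "")


# Answer text copied from the sample above.
answer = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May. "
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May. "
    "#### 72"
)
print(extract_final_answer(answer))
```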
- class oumi.datasets.grpo.LetterCountGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)
Bases: BaseExperimentalGrpoDataset
Dataset class for the oumi-ai/oumi-letter-count dataset.
A sample from the dataset:
{
    "conversation_id": "oumi_letter_count_0",
    "messages": [
        {
            "content": "Can you let me know how many 'r's are in 'pandered'?",
            "role": "user",
        }
    ],
    "metadata": {
        "letter": "r",
        "letter_count_integer": 1,
        "letter_count_string": "one",
        "unformatted_prompt": "Can you let me know how many {letter}s are in {word}?",
        "word": "pandered",
    },
}
- dataset_name: str
- default_dataset: str | None = 'oumi-ai/oumi-letter-count'
- transform_conversation(sample: Series) -> Conversation
Converts the input sample to a Conversation.
- Parameters:
sample (dict) – The input example.
- Returns:
The resulting conversation.
- Return type:
Conversation
- trust_remote_code: bool
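The `metadata` block in the sample above carries enough information to recompute the expected count and compare it with a model reply. The sketch below does both. The helpers `count_letter` and `last_integer` are hypothetical, and the case-insensitive comparison and "last integer in the reply" parsing are assumptions rather than documented behavior.

```python
import re
from typing import Optional


def count_letter(word: str, letter: str) -> int:
    """Count occurrences of `letter` in `word`, ignoring case (an assumption)."""
    return word.lower().count(letter.lower())


def last_integer(text: str) -> Optional[int]:
    """Return the last integer that appears in `text`, or None if there is none."""
    numbers = re.findall(r"-?\d+", text)
    return int(numbers[-1]) if numbers else None


# Metadata fields copied from the sample above.
metadata = {"letter": "r", "letter_count_integer": 1, "word": "pandered"}

recomputed = count_letter(metadata["word"], metadata["letter"])
reply = "There is 1 'r' in 'pandered'."  # made-up model reply for illustration
print(recomputed == metadata["letter_count_integer"])
print(last_integer(reply) == metadata["letter_count_integer"])
```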
- class oumi.datasets.grpo.RaRMedicineDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)
Bases: BaseRubricDataset
Dataset for RaR-Medicine from the Rubrics as Rewards paper.
This dataset contains 22.4k medical prompts with structured rubric annotations for training with GRPO. The prompts focus on complex medical reasoning tasks like diagnosis (50.3%) and treatment (16.0%).
HuggingFace: https://huggingface.co/datasets/anisha2102/RaR-Medicine
Example
>>> dataset = RaRMedicineDataset(split="train")
>>> sample = dataset.raw(0)
>>> print(sample["prompt"])
>>> print(sample["rubrics"])  # List of weighted rubric dicts
- The rubrics follow this structure:
{
    "name": "Identify Most Sensitive Modality",
    "description": "Essential Criteria: Identifies non-contrast helical CT…",
    "weight": 5,
    "evaluation_type": "binary"
}
- dataset_name: str
- default_dataset: str | None = 'anisha2102/RaR-Medicine'
- transform(sample: Series) -> dict[str, Any]
Transform a sample into the format expected by GRPO trainer.
- trust_remote_code: bool
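Since each rubric carries a `weight` and a binary `evaluation_type`, one simple way to turn per-rubric verdicts into a single reward is a normalized weighted sum. The sketch below assumes that aggregation and positive weights; this page does not say how the GRPO trainer or the paper combines rubrics, and `weighted_rubric_score` is a hypothetical helper.

```python
def weighted_rubric_score(rubrics, verdicts):
    """Return the weight of satisfied rubrics divided by the total weight.

    `verdicts` maps a rubric name to True/False; missing names count as False.
    Hypothetical helper for illustration; not an oumi API.
    """
    total = sum(rubric["weight"] for rubric in rubrics)
    if total <= 0:
        return 0.0
    earned = sum(
        rubric["weight"]
        for rubric in rubrics
        if verdicts.get(rubric["name"], False)
    )
    return earned / total


rubrics = [
    # Copied from the rubric structure shown above (description left truncated).
    {
        "name": "Identify Most Sensitive Modality",
        "description": "Essential Criteria: Identifies non-contrast helical CT…",
        "weight": 5,
        "evaluation_type": "binary",
    },
    # Made-up second rubric, added only to make the arithmetic visible.
    {
        "name": "Hypothetical Extra Criterion",
        "description": "Made-up rubric for illustration.",
        "weight": 3,
        "evaluation_type": "binary",
    },
]

verdicts = {"Identify Most Sensitive Modality": True}
print(weighted_rubric_score(rubrics, verdicts))
```

With the first rubric satisfied and the second not, the score is 5 / (5 + 3).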
- class oumi.datasets.grpo.RaRScienceDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)
Bases: RaRMedicineDataset
Dataset for RaR-Science from the Rubrics as Rewards paper.
This dataset contains 22.9k expert-level science prompts with structured rubric annotations for training with GRPO. The prompts are aligned with the GPQA Diamond benchmark, covering topics from quantum mechanics to molecular biology.
HuggingFace: https://huggingface.co/datasets/anisha2102/RaR-Science
Example
>>> dataset = RaRScienceDataset(split="train")
>>> sample = dataset.raw(0)
>>> print(sample["prompt"])
>>> print(sample["rubrics"])  # List of weighted rubric dicts
- The rubrics follow this structure:
{
    "name": "Temperature Conversion",
    "description": "Essential Criteria: The response must mention…",
    "weight": 5,
    "evaluation_type": "binary"
}
- dataset_name: str
- default_dataset: str | None = 'anisha2102/RaR-Science'
- trust_remote_code: bool
- class oumi.datasets.grpo.RlvrRubricDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)
Bases: BaseRubricDataset
Dataset for RLVR training with rubric-based rewards.
- Expects input in rubric format:
prompt: str
rubrics: list of {name, description, weight}
system_prompt: str (optional)
metadata: dict (optional)
- dataset_name: str
- default_dataset: str | None = 'oumi-rlvr-rubric'
- transform(sample: Series) -> dict[str, Any]
Transform the sample into the format expected by GRPO trainer.
- trust_remote_code: bool
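The rubric format listed above (`prompt`, `rubrics`, and the optional `system_prompt` and `metadata`) can be checked before records are handed to the dataset. The sketch below is one way to do that; `rubric_record_problems` is a hypothetical helper, and the exact validation that `RlvrRubricDataset` performs is not described on this page.

```python
REQUIRED_RUBRIC_KEYS = ("name", "description", "weight")


def rubric_record_problems(record):
    """Return a list of human-readable problems; an empty list means it looks valid.

    Hypothetical helper for illustration; not an oumi API.
    """
    problems = []
    if not isinstance(record.get("prompt"), str):
        problems.append("prompt must be a str")

    rubrics = record.get("rubrics")
    if not isinstance(rubrics, list):
        problems.append("rubrics must be a list")
    else:
        for index, rubric in enumerate(rubrics):
            if not isinstance(rubric, dict):
                problems.append(f"rubrics[{index}] must be a dict")
                continue
            for key in REQUIRED_RUBRIC_KEYS:
                if key not in rubric:
                    problems.append(f"rubrics[{index}] is missing '{key}'")

    # Optional fields are only checked when they are present.
    if "system_prompt" in record and not isinstance(record["system_prompt"], str):
        problems.append("system_prompt must be a str when present")
    if "metadata" in record and not isinstance(record["metadata"], dict):
        problems.append("metadata must be a dict when present")
    return problems


# Made-up record for illustration.
record = {
    "prompt": "Explain why the sky is blue.",
    "rubrics": [
        {"name": "Mentions scattering", "description": "Names Rayleigh scattering.", "weight": 5},
    ],
}
print(rubric_record_problems(record))
```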
- class oumi.datasets.grpo.TldrGrpoDataset(*, dataset_name: str | None = None, dataset_path: str | None = None, split: str | None = None, **kwargs)
Bases: BaseExperimentalGrpoDataset
Dataset class for the trl-lib/tldr dataset.
- dataset_name: str
- default_dataset: str | None = 'trl-lib/tldr'
- trust_remote_code: bool