oumi.analyze#
Analyzer framework for dataset analysis.
- class oumi.analyze.AnalysisPipeline(analyzers: list[MessageAnalyzer[Any] | ConversationAnalyzer[Any] | DatasetAnalyzer[Any] | PreferenceAnalyzer[Any]], cache_dir: str | Path | None = None)[source]#
Bases: object
Pipeline for orchestrating multiple analyzers on a dataset.
The AnalysisPipeline manages running multiple analyzers on conversations, handling different analyzer scopes appropriately, and providing unified access to results.
Note
PreferenceAnalyzers are not run by run(). Use run_preference() separately to analyze preference pairs (chosen/rejected conversations).
Example
>>> from oumi.analyze import AnalysisPipeline, LengthAnalyzer
>>>
>>> pipeline = AnalysisPipeline(
...     analyzers=[
...         LengthAnalyzer.from_config({"tokenizer_name": "cl100k_base"})
...     ],
...     cache_dir="./analysis_cache",
... )
>>> results = pipeline.run(conversations)
- Parameters:
analyzers – List of analyzer instances to run.
cache_dir – Optional directory for caching results.
- property conversations: list[Conversation]#
Get the analyzed conversations.
- get_analyzer(name: str) MessageAnalyzer[Any] | ConversationAnalyzer[Any] | DatasetAnalyzer[Any] | PreferenceAnalyzer[Any] | None[source]#
Get an analyzer by name, or None if not found.
- load_cache() bool[source]#
Load results from cache directory.
Note
Loaded results are raw dictionaries, not Pydantic model instances. Use get_cached_result() to reconstruct typed results if needed, or access raw data directly via self.results.
- Returns:
True if cache was loaded successfully, False otherwise.
- property message_to_conversation_idx: list[int]#
Get the mapping from message index to conversation index.
- property results: dict[str, list[BaseModel] | BaseModel]#
Get the cached analysis results.
- run(conversations: list[Conversation]) dict[str, list[BaseModel] | BaseModel][source]#
Run all analyzers on the provided conversations.
Note
PreferenceAnalyzers are not run by this method. Use run_preference() separately to analyze preference pairs.
- Parameters:
conversations – List of conversations to analyze.
- Returns:
Dictionary mapping analyzer names to their results.
- For ConversationAnalyzer: list of results (one per conversation)
- For MessageAnalyzer: list of results (one per message)
- For DatasetAnalyzer: single result for entire dataset
- run_preference(pairs: list[tuple[Conversation, Conversation]]) dict[str, list[BaseModel] | BaseModel][source]#
Run preference analyzers on conversation pairs.
- Parameters:
pairs – List of (chosen, rejected) conversation tuples.
- Returns:
Dictionary mapping analyzer names to their results.
- class oumi.analyze.AnalyzerConfig(id: str, instance_id: str, params: dict[str, Any] = <factory>)[source]#
Bases: object
Configuration for a single analyzer instance.
Each analyzer has a type (id) and a unique instance name (instance_id). Multiple instances of the same type are supported (e.g. two length analyzers with different tokenizers).
- Variables:
id (str) – Analyzer type (registry id, e.g. “length”, “difficulty_judge”).
instance_id (str) – Unique instance name (always required). Used as the results key and in test metric paths.
params (dict[str, Any]) – Analyzer-specific parameters.
- id: str#
- instance_id: str#
- params: dict[str, Any]#
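The multiple-instance pattern can be sketched with a plain dataclass that mirrors the shape of AnalyzerConfig (the class and field names below stand in for the real config; only the `id`/`instance_id`/`params` structure is taken from the documentation above):

```python
from dataclasses import dataclass, field
from typing import Any

# Stand-in mirroring the shape of oumi.analyze.AnalyzerConfig,
# so the multiple-instance pattern can be shown without the library.
@dataclass
class AnalyzerConfigSketch:
    id: str                      # analyzer type (registry id)
    instance_id: str             # unique per-instance name
    params: dict[str, Any] = field(default_factory=dict)

# Two instances of the same analyzer type, distinguished by instance_id.
configs = [
    AnalyzerConfigSketch("length", "length_cl100k",
                         {"tokenizer_name": "cl100k_base"}),
    AnalyzerConfigSketch("length", "length_llama",
                         {"tokenizer_name": "hf-internal-testing/llama-tokenizer"}),
]

# instance_id becomes the results key, so both instances can coexist.
results_keys = [c.instance_id for c in configs]
```

Because `instance_id` keys the results, two length analyzers with different tokenizers never overwrite each other.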
- class oumi.analyze.BaseAnalyzer[source]#
Bases: ABC, Generic[TResult]
Base class for all analyzer types.
Subclasses must implement metadata methods to describe their result schema. The generic type parameter TResult provides type safety for the analyze() method.
All concrete analyzer types (MessageAnalyzer, ConversationAnalyzer, etc.) inherit from this class. Set _result_model in subclasses to get automatic implementations of get_result_schema, get_metric_names, and get_metric_descriptions.
- Variables:
analyzer_id (str | None) – Optional custom identifier for this analyzer instance. If not set, the class name is used as the identifier.
- analyzer_id: str | None = None#
- get_available_metric_names() list[str][source]#
Get metric names this instance will actually produce.
Subclasses can override to exclude metrics that depend on instance config (e.g., rendered_tokens requires a HuggingFace tokenizer).
- abstractmethod classmethod get_config_schema() dict[str, Any][source]#
Get JSON schema for this analyzer’s configuration.
- classmethod get_metric_descriptions() dict[str, str][source]#
Get descriptions for each metric field.
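The `_result_model` hook described above can be sketched as follows. This is a simplified stand-in, not oumi's implementation: it uses a dataclass instead of a Pydantic model, and the class names are invented for illustration.

```python
from dataclasses import dataclass, fields

# Hypothetical result model; field names become metric names.
@dataclass
class LengthResult:
    total_tokens: int
    num_messages: int

# Sketch of the base-class behavior: a declared _result_model
# drives get_metric_names automatically, so subclasses only
# need to point at their result type.
class AnalyzerBase:
    _result_model = None

    @classmethod
    def get_metric_names(cls):
        return [f.name for f in fields(cls._result_model)]

class LengthAnalyzerSketch(AnalyzerBase):
    _result_model = LengthResult

metrics = LengthAnalyzerSketch.get_metric_names()  # ['total_tokens', 'num_messages']
```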
- class oumi.analyze.ConversationAnalyzer[source]#
Bases: BaseAnalyzer[TResult]
Base class for analyzers that operate on complete conversations.
- __call__(conversation: Conversation) TResult[source]#
Call analyze() directly.
- abstractmethod analyze(conversation: Conversation) TResult[source]#
Analyze a complete conversation and return typed results.
- Parameters:
conversation – The conversation to analyze.
- Returns:
Typed result model containing analysis metrics.
- analyze_batch(conversations: list[Conversation]) list[TResult][source]#
Analyze multiple conversations and return results for each.
Override this method to implement batched processing for better performance, especially for analyzers that benefit from batching (e.g., those using ML models).
- Parameters:
conversations – List of conversations to analyze.
- Returns:
List of typed results, one per conversation.
- static get_conversation_text(conversation: Conversation, tokenizer: PreTrainedTokenizerBase) str[source]#
Get the full text of a conversation using a tokenizer’s chat template.
- Parameters:
conversation – The conversation to extract text from.
tokenizer – Tokenizer with a chat template for formatting.
- Returns:
Full conversation text as a single string.
- Raises:
ValueError – If the tokenizer doesn’t have a chat template.
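A minimal conversation-scoped analyzer might look like the sketch below. The `Message`/`Conversation` stand-ins and the analyzer itself are illustrative; they mimic the analyze/analyze_batch contract documented above without importing oumi.

```python
from dataclasses import dataclass

# Minimal stand-ins for oumi's Message/Conversation types.
@dataclass
class Message:
    role: str
    content: str

@dataclass
class Conversation:
    messages: list

@dataclass
class TurnCountResult:
    num_turns: int

# Conversation-scoped analyzer: one typed result per conversation,
# with analyze_batch defaulting to a per-item loop.
class TurnCountAnalyzer:
    def analyze(self, conversation: Conversation) -> TurnCountResult:
        return TurnCountResult(num_turns=len(conversation.messages))

    def analyze_batch(self, conversations: list) -> list:
        return [self.analyze(c) for c in conversations]

conv = Conversation(messages=[Message("user", "Hi"), Message("assistant", "Hello")])
result = TurnCountAnalyzer().analyze(conv)
```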
- class oumi.analyze.DataQualityAnalyzer[source]#
Bases: ConversationAnalyzer[DataQualityMetrics]
Analyzer for basic data quality checks on conversations.
Checks for three common data quality issues without requiring an LLM:
- Non-alternating user/assistant message patterns
- Empty or whitespace-only turns
- Values serialized as strings (NaN, null, None, undefined)
Example
>>> from oumi.analyze.analyzers.quality import DataQualityAnalyzer
>>> from oumi.core.types.conversation import Conversation, Message, Role
>>>
>>> analyzer = DataQualityAnalyzer()
>>> conversation = Conversation(messages=[
...     Message(role=Role.USER, content="Hello"),
...     Message(role=Role.ASSISTANT, content="Hi there!"),
... ])
>>> result = analyzer.analyze(conversation)
>>> print(result.has_non_alternating_turns)
False
- analyze(conversation: Conversation) DataQualityMetrics[source]#
Analyze data quality for a conversation.
- Parameters:
conversation – The conversation to analyze.
- Returns:
DataQualityMetrics with the quality check results.
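The three checks can be sketched as plain functions. This illustrates the logic described above, not DataQualityAnalyzer's actual code; the exact role filtering and pattern set are assumptions.

```python
# Illustrative sketches of the three quality checks.
INVALID_PATTERNS = {"nan", "null", "none", "undefined"}

def has_non_alternating_turns(roles: list) -> bool:
    # True if any two consecutive non-system turns share a role.
    turns = [r for r in roles if r != "system"]
    return any(a == b for a, b in zip(turns, turns[1:]))

def count_empty_turns(contents: list) -> int:
    # Empty or whitespace-only message bodies.
    return sum(1 for c in contents if not c.strip())

def invalid_value_patterns(contents: list) -> list:
    # Messages whose entire content is a serialized missing-value token.
    return sorted({c.strip().lower() for c in contents
                   if c.strip().lower() in INVALID_PATTERNS})
```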
- class oumi.analyze.DataQualityMetrics(*, has_non_alternating_turns: bool, has_empty_turns: bool, empty_turn_count: int, has_invalid_values: bool, invalid_value_patterns: list[str])[source]#
Bases: BaseModel
Result model for data quality checks on a conversation.
Example
>>> result = DataQualityMetrics(
...     has_non_alternating_turns=False,
...     has_empty_turns=False,
...     empty_turn_count=0,
...     has_invalid_values=False,
...     invalid_value_patterns=[],
... )
>>> print(result.has_non_alternating_turns)
False
- empty_turn_count: int#
- has_empty_turns: bool#
- has_invalid_values: bool#
- has_non_alternating_turns: bool#
- invalid_value_patterns: list[str]#
- model_config = {}#
Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.
- class oumi.analyze.DatasetAnalyzer[source]#
Bases: BaseAnalyzer[TResult]
Base class for analyzers that operate on entire datasets.
- __call__(conversations: list[Conversation]) TResult[source]#
Call analyze() directly.
- abstractmethod analyze(conversations: list[Conversation]) TResult[source]#
Analyze an entire dataset and return typed results.
This method receives all conversations at once, enabling cross-sample operations that require global context.
- Parameters:
conversations – All conversations in the dataset.
- Returns:
Typed result model containing dataset-level analysis.
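A cross-sample operation that needs global context is exact-duplicate counting, sketched below with toy stand-ins (the class and field names are invented for illustration):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class DuplicateStats:
    num_conversations: int
    num_duplicates: int

# Dataset-scoped sketch: duplicate counting needs every sample at once,
# which is exactly what a DatasetAnalyzer receives.
class DuplicateAnalyzerSketch:
    def analyze(self, texts: list) -> DuplicateStats:
        counts = Counter(texts)
        # Each value beyond the first occurrence is a duplicate.
        dupes = sum(n - 1 for n in counts.values() if n > 1)
        return DuplicateStats(num_conversations=len(texts), num_duplicates=dupes)

stats = DuplicateAnalyzerSketch().analyze(["hi", "hi", "bye"])
```

A per-conversation analyzer could not compute this, since no single sample knows about the others.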
- class oumi.analyze.LengthAnalyzer(tokenizer: Tokenizer | None = None)[source]#
Bases: ConversationAnalyzer[LengthMetrics]
Analyzer for computing token length metrics of conversations.
Computes token counts for conversations using a provided tokenizer. Provides both conversation-level totals and per-message breakdowns.
Example
>>> from oumi.analyze.analyzers.length import LengthAnalyzer
>>> from oumi.core.types.conversation import Conversation, Message, Role
>>>
>>> analyzer = LengthAnalyzer.from_config({"tokenizer_name": "cl100k_base"})
>>> conversation = Conversation(messages=[
...     Message(role=Role.USER, content="Hello, how are you?"),
...     Message(role=Role.ASSISTANT, content="I'm doing well, thanks!"),
... ])
>>> result = analyzer.analyze(conversation)
>>> print(f"Total tokens: {result.total_tokens}")
Total tokens: 12
- Parameters:
tokenizer – Tokenizer instance for token counting. Must have an encode(text) -> list method. Use from_config() to construct from a tokenizer name, or pass any compatible tokenizer directly.
- analyze(conversation: Conversation) LengthMetrics[source]#
Analyze token length metrics for a conversation.
- Parameters:
conversation – The conversation to analyze.
- Returns:
LengthMetrics containing token counts.
- analyze_text(text: str) LengthMetrics[source]#
Analyze token length metrics for a single text string.
Convenience method for analyzing text without creating a Conversation.
- Parameters:
text – The text to analyze.
- Returns:
LengthMetrics for the text (treated as a single message).
- classmethod from_config(config: dict[str, Any]) LengthAnalyzer[source]#
Create a LengthAnalyzer from a config dictionary.
- Parameters:
config – See LengthAnalyzerConfig for supported keys.
- Returns:
LengthAnalyzer instance with configured tokenizer.
- class oumi.analyze.LengthAnalyzerConfig(*, tokenizer_name: str = 'cl100k_base', trust_remote_code: bool = False)[source]#
Bases: BaseModel
Configuration for LengthAnalyzer.
- model_config = {}#
Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.
- tokenizer_name: str#
- trust_remote_code: bool#
- class oumi.analyze.LengthMetrics(*, total_tokens: int, rendered_tokens: int | None = None, avg_tokens_per_message: float, message_token_counts: list[int], num_messages: int, user_total_tokens: int = 0, assistant_total_tokens: int = 0, system_total_tokens: int = 0, tool_total_tokens: int = 0)[source]#
Bases: BaseModel
Result model for length analysis of conversations.
Example
>>> result = LengthMetrics(
...     total_tokens=25,
...     avg_tokens_per_message=12.5,
...     message_token_counts=[10, 15],
...     num_messages=2,
... )
>>> print(result.total_tokens)
25
- assistant_total_tokens: int#
- avg_tokens_per_message: float#
- message_token_counts: list[int]#
- model_config = {}#
Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.
- num_messages: int#
- rendered_tokens: int | None#
- system_total_tokens: int#
- tool_total_tokens: int#
- total_tokens: int#
- user_total_tokens: int#
- class oumi.analyze.MessageAnalyzer[source]#
Bases: BaseAnalyzer[TResult]
Base class for analyzers that operate on individual messages.
- abstractmethod analyze(message: Message) TResult[source]#
Analyze a single message and return typed results.
- Parameters:
message – The message to analyze.
- Returns:
Typed result model containing analysis metrics.
- analyze_batch(messages: list[Message]) list[TResult][source]#
Analyze multiple messages and return results for each.
Override this method to implement vectorized/batched processing for better performance with large datasets.
- Parameters:
messages – List of messages to analyze.
- Returns:
List of typed results, one per message.
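The batch-override pattern can be sketched like this; the analyzer below is a toy, and a real override would replace the loop with one batched call (e.g. a single model inference):

```python
# Sketch of the analyze_batch override pattern: the base behavior
# loops per message, while a subclass can process the whole batch
# at once for better throughput.
class WordCountAnalyzer:
    def analyze(self, text: str) -> int:
        return len(text.split())

    def analyze_batch(self, texts: list) -> list:
        # A batched override might issue one vectorized call here
        # instead of iterating.
        return [self.analyze(t) for t in texts]

counts = WordCountAnalyzer().analyze_batch(["hello world", "one two three"])
```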
- class oumi.analyze.PreferenceAnalyzer[source]#
Bases: BaseAnalyzer[TResult]
Base class for analyzers that operate on preference pairs.
- __call__(chosen: Conversation, rejected: Conversation) TResult[source]#
Call analyze() directly.
- abstractmethod analyze(chosen: Conversation, rejected: Conversation) TResult[source]#
Analyze a preference pair and return typed results.
- Parameters:
chosen – The preferred/chosen conversation.
rejected – The rejected/dispreferred conversation.
- Returns:
Typed result model containing preference analysis.
- analyze_batch(pairs: list[tuple[Conversation, Conversation]]) list[TResult][source]#
Analyze multiple preference pairs.
- Parameters:
pairs – List of (chosen, rejected) conversation tuples.
- Returns:
List of typed results, one per pair.
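A minimal preference analyzer might compare chosen and rejected responses, as sketched below with plain strings standing in for conversations (all names here are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class LengthPreferenceResult:
    chosen_chars: int
    rejected_chars: int
    chosen_is_longer: bool

# Preference-scoped sketch: each result describes one
# (chosen, rejected) pair rather than a single conversation.
class LengthPreferenceAnalyzer:
    def analyze(self, chosen: str, rejected: str) -> LengthPreferenceResult:
        return LengthPreferenceResult(
            chosen_chars=len(chosen),
            rejected_chars=len(rejected),
            chosen_is_longer=len(chosen) > len(rejected),
        )

    def analyze_batch(self, pairs: list) -> list:
        return [self.analyze(c, r) for c, r in pairs]

results = LengthPreferenceAnalyzer().analyze_batch([("a longer answer", "short")])
```

Pair-level metrics like this can surface length bias in preference datasets, where chosen responses are systematically longer.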
- class oumi.analyze.TestEngine(tests: list[TestParams])[source]#
Bases: object
Engine for running tests on typed analysis results.
Tests operate on typed Pydantic results, not DataFrames. This ensures tests are pure validation with no computation: all metrics must be pre-computed by analyzers.
Example
>>> from oumi.analyze.testing import TestEngine, TestParams, TestType
>>> from oumi.core.configs.params.test_params import TestSeverity
>>>
>>> tests = [
...     TestParams(
...         id="max_tokens",
...         type=TestType.THRESHOLD,
...         metric="length.total_tokens",
...         operator=">",
...         value=10000,
...         max_percentage=5.0,
...         severity=TestSeverity.MEDIUM,
...     ),
... ]
>>> engine = TestEngine(tests)
>>> summary = engine.run(results)
>>> print(f"Pass rate: {summary.pass_rate}%")
- Parameters:
tests – List of test configurations.
- run(results: dict[str, list[BaseModel] | BaseModel]) TestSummary[source]#
Run all tests on the analysis results.
- Parameters:
results – Dictionary mapping analyzer names to results.
- Returns:
TestSummary containing all test results.
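The threshold-test semantics can be sketched as below. This is an assumed interpretation inferred from the parameter names (`metric`, `operator`, `value`, `max_percentage`) and the TestResult fields, not TestEngine's actual code:

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}

# Assumed semantics: a sample is "affected" when `metric <op> value`
# holds, and the test fails when the affected percentage exceeds
# max_percentage.
def run_threshold_test(values: list, op: str, value: float,
                       max_percentage: float) -> dict:
    affected = [v for v in values if OPS[op](v, value)]
    pct = 100.0 * len(affected) / len(values) if values else 0.0
    return {
        "affected_count": len(affected),
        "affected_percentage": pct,
        "passed": pct <= max_percentage,
    }

# One of three samples exceeds 10000 tokens -> ~33% affected -> fail at 5%.
outcome = run_threshold_test([120, 80, 15000], ">", 10000, max_percentage=5.0)
```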
- class oumi.analyze.TestResult(*, test_id: str, passed: bool, severity: TestSeverity = TestSeverity.MEDIUM, title: str = '', description: str = '', metric: str = '', affected_count: int = 0, total_count: int = 0, affected_percentage: float = 0.0, threshold: float | None = None, actual_value: float | None = None, sample_indices: list[int] = <factory>, all_affected_indices: list[int] = <factory>, error: str | None = None, details: dict[str, Any] = <factory>)[source]#
Bases: BaseModel
Result of a single test execution.
- Variables:
test_id (str) – Unique identifier for the test.
passed (bool) – Whether the test passed.
severity (oumi.core.configs.params.test_params.TestSeverity) – Severity level of the test.
title (str) – Human-readable title.
description (str) – Description of what the test checks.
metric (str) – The metric being tested (e.g., “analyzer_name.field”).
affected_count (int) – Number of samples that failed the test.
total_count (int) – Total number of samples tested.
affected_percentage (float) – Percentage of samples affected.
threshold (float | None) – The configured threshold for the test.
actual_value (float | None) – The actual computed value (for threshold tests).
sample_indices (list[int]) – Indices of affected samples (limited).
all_affected_indices (list[int]) – Indices of all affected samples, without the limit applied to sample_indices.
error (str | None) – Error message if test execution failed.
details (dict[str, Any]) – Additional details about the test result.
- actual_value: float | None#
- affected_count: int#
- affected_percentage: float#
- all_affected_indices: list[int]#
- description: str#
- details: dict[str, Any]#
- error: str | None#
- metric: str#
- model_config = {}#
Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.
- passed: bool#
- sample_indices: list[int]#
- severity: TestSeverity#
- test_id: str#
- threshold: float | None#
- title: str#
- total_count: int#
- class oumi.analyze.TestSummary(*, results: list[TestResult] = <factory>, total_tests: int = 0, passed_tests: int = 0, failed_tests: int = 0, error_tests: int = 0, pass_rate: float = 0.0, high_severity_failures: int = 0, medium_severity_failures: int = 0, low_severity_failures: int = 0)[source]#
Bases: BaseModel
Summary of all test results.
- Variables:
results (list[oumi.analyze.testing.results.TestResult]) – List of individual test results.
total_tests (int) – Total number of tests run.
passed_tests (int) – Number of tests that passed.
failed_tests (int) – Number of tests that failed.
error_tests (int) – Number of tests that had errors.
pass_rate (float) – Percentage of tests that passed.
high_severity_failures (int) – Number of high severity failures.
medium_severity_failures (int) – Number of medium severity failures.
low_severity_failures (int) – Number of low severity failures.
- error_tests: int#
- failed_tests: int#
- classmethod from_results(results: list[TestResult]) TestSummary[source]#
Create a summary from a list of test results.
- Parameters:
results – List of test results.
- Returns:
TestSummary with computed statistics.
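The statistics from_results computes can be sketched with plain dicts standing in for TestResult instances; the exact precedence of errors over pass/fail below is an assumption:

```python
# Sketch of the summary statistics, assuming a result with a non-empty
# error is counted as an error rather than a pass or fail.
def summarize(results: list) -> dict:
    total = len(results)
    errors = sum(1 for r in results if r.get("error"))
    passed = sum(1 for r in results if r["passed"] and not r.get("error"))
    failed = total - passed - errors
    pass_rate = 100.0 * passed / total if total else 0.0
    return {
        "total_tests": total,
        "passed_tests": passed,
        "failed_tests": failed,
        "error_tests": errors,
        "pass_rate": pass_rate,
    }

summary = summarize([
    {"passed": True},
    {"passed": False},
    {"passed": False, "error": "boom"},
])
```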
- get_error_results() list[TestResult][source]#
Get all test results with errors.
- get_failed_results() list[TestResult][source]#
Get all failed test results.
- get_passed_results() list[TestResult][source]#
Get all passed test results.
- high_severity_failures: int#
- low_severity_failures: int#
- medium_severity_failures: int#
- model_config = {}#
Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.
- pass_rate: float#
- passed_tests: int#
- results: list[TestResult]#
- total_tests: int#
- class oumi.analyze.TurnStatsAnalyzer[source]#
Bases: ConversationAnalyzer[TurnStatsMetrics]
Analyzer for computing turn statistics of conversations.
Computes turn counts and per-role statistics to help understand conversation structure and balance.
Example
>>> from oumi.analyze.analyzers.turn_stats import TurnStatsAnalyzer
>>> from oumi.core.types.conversation import Conversation, Message, Role
>>>
>>> analyzer = TurnStatsAnalyzer()
>>> conversation = Conversation(messages=[
...     Message(role=Role.USER, content="What is Python?"),
...     Message(
...         role=Role.ASSISTANT,
...         content="Python is a programming language.",
...     ),
... ])
>>> result = analyzer.analyze(conversation)
>>> print(f"Turns: {result.num_turns}")
Turns: 2
- analyze(conversation: Conversation) TurnStatsMetrics[source]#
Analyze turn statistics for a conversation.
- Parameters:
conversation – The conversation to analyze.
- Returns:
TurnStatsMetrics containing turn counts and statistics.
- class oumi.analyze.TurnStatsMetrics(*, num_turns: int, num_user_turns: int, num_assistant_turns: int, num_tool_turns: int = 0, has_system_message: bool, first_turn_role: str | None = None, last_turn_role: str | None = None)[source]#
Bases: BaseModel
Result model for turn statistics analysis of conversations.
Example
>>> result = TurnStatsMetrics(
...     num_turns=4,
...     num_user_turns=2,
...     num_assistant_turns=2,
...     has_system_message=False,
...     first_turn_role="user",
...     last_turn_role="assistant",
... )
>>> print(result.num_turns)
4
- first_turn_role: str | None#
- has_system_message: bool#
- last_turn_role: str | None#
- model_config = {}#
Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.
- num_assistant_turns: int#
- num_tool_turns: int#
- num_turns: int#
- num_user_turns: int#
- class oumi.analyze.TypedAnalyzeConfig(eval_name: str | None = None, parent_eval_id: str | None = None, dataset_name: str | None = None, dataset_path: str | None = None, split: str = 'train', subset: str | None = None, sample_count: int | None = None, output_path: str = '.', analyzers: list[AnalyzerConfig] = <factory>, custom_metrics: list[CustomMetricConfig] = <factory>, tests: list[TestParams] = <factory>, tokenizer_name: str | None = None, tokenizer_kwargs: dict[str, Any] = <factory>, generate_report: bool = False, report_title: str | None = None)[source]#
Bases: object
Configuration for the typed analyzer pipeline.
This is the main configuration class for the new typed analyzer architecture. It supports both programmatic construction and loading from YAML files.
Example YAML:
dataset_path: /path/to/data.jsonl
sample_count: 1000
output_path: ./analysis_output
analyzers:
  - id: length
    params:
      count_tokens: true
  - id: quality
custom_metrics:
  - id: turn_pattern
    scope: conversation
    function: |
      def compute(conversation):
          ...
tests:
  - id: max_words
    type: threshold
    metric: LengthAnalyzer.total_words
    operator: ">"
    value: 10000
    max_percentage: 5.0
- Variables:
dataset_name (str | None) – Name of the dataset (HuggingFace identifier).
dataset_path (str | None) – Path to local dataset file.
split (str) – Dataset split to use.
sample_count (int | None) – Number of samples to analyze.
output_path (str) – Directory for output artifacts.
analyzers (list[oumi.analyze.config.AnalyzerConfig]) – List of analyzer configurations.
custom_metrics (list[oumi.analyze.config.CustomMetricConfig]) – List of custom metric configurations.
tests (list[oumi.core.configs.params.test_params.TestParams]) – List of test configurations.
tokenizer_name (str | None) – Tokenizer for token counting.
generate_report (bool) – Whether to generate HTML report.
report_title (str | None) – Custom title for the report.
- analyzers: list[AnalyzerConfig]#
- custom_metrics: list[CustomMetricConfig]#
- dataset_name: str | None = None#
- dataset_path: str | None = None#
- eval_name: str | None = None#
- classmethod from_dict(data: dict[str, Any], allow_custom_code: bool = False) TypedAnalyzeConfig[source]#
Create configuration from a dictionary.
- Parameters:
data – Configuration dictionary.
allow_custom_code – If True, allow custom_metrics with function code. If False (default) and the config contains custom metrics with code, raises ValueError.
- Returns:
TypedAnalyzeConfig instance.
- Raises:
ValueError – If config contains custom code but allow_custom_code=False, or if duplicate analyzer instance_ids found.
- classmethod from_yaml(path: str | Path, allow_custom_code: bool = False) TypedAnalyzeConfig[source]#
Load configuration from a YAML file.
Warning
Security Warning: If the YAML file contains custom_metrics with function fields, arbitrary Python code will be loaded. Only load configurations from trusted sources. Set allow_custom_code=True to explicitly acknowledge this risk.
- Parameters:
path – Path to YAML configuration file.
allow_custom_code – If True, allow loading custom_metrics with function code. If False (default) and the config contains custom metrics with code, raises ValueError.
- Returns:
TypedAnalyzeConfig instance.
- Raises:
ValueError – If config contains custom code but allow_custom_code=False.
- generate_report: bool = False#
- output_path: str = '.'#
- parent_eval_id: str | None = None#
- report_title: str | None = None#
- sample_count: int | None = None#
- split: str = 'train'#
- subset: str | None = None#
- tests: list[TestParams]#
- tokenizer_kwargs: dict[str, Any]#
- tokenizer_name: str | None = None#
- oumi.analyze.create_analyzer_from_config(analyzer_id: str, params: dict) MessageAnalyzer | ConversationAnalyzer | DatasetAnalyzer | None[source]#
Create an analyzer instance from configuration.
Prefers using the analyzer’s from_config() classmethod if available, otherwise falls back to direct instantiation with **params.
- Parameters:
analyzer_id – Analyzer type identifier.
params – Analyzer-specific parameters.
- Returns:
Analyzer instance or None if not found.
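The "prefer from_config, else direct construction" fallback described above can be sketched as follows; the helper name and toy classes are invented for illustration:

```python
# Sketch of the construction fallback: use a from_config classmethod
# when the analyzer class provides one, otherwise pass params as
# keyword arguments to the constructor.
def build(cls, params: dict):
    if hasattr(cls, "from_config"):
        return cls.from_config(params)
    return cls(**params)

class WithFactory:
    def __init__(self, name: str):
        self.name = name

    @classmethod
    def from_config(cls, params: dict):
        # A factory can apply defaults or build dependencies first.
        return cls(name=params.get("name", "default"))

class PlainInit:
    def __init__(self, name: str):
        self.name = name

a = build(WithFactory, {})                 # goes through from_config
b = build(PlainInit, {"name": "direct"})   # falls back to __init__(**params)
```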
- oumi.analyze.describe_analyzer(analyzer_class: type) str[source]#
Get a human-readable description of an analyzer’s metrics.
- oumi.analyze.get_analyzer_class(name: str) type | None[source]#
Get an analyzer class by name.
- Parameters:
name – Name of the analyzer.
- Returns:
The analyzer class or None if not found.
- oumi.analyze.get_analyzer_info(analyzer_class: type) dict[str, Any][source]#
Get detailed information about an analyzer’s output metrics.
- oumi.analyze.get_instance_metrics(analyzer_class: type, config: dict[str, Any] | None = None) list[str][source]#
Get available metrics, attempting to instantiate with config for filtering.
- oumi.analyze.list_available_metrics(include_duplicates: bool = False) dict[str, dict[str, Any]][source]#
List all available metrics from registered analyzers.
- oumi.analyze.print_analyzer_metrics(analyzer_name: str | None = None) None[source]#
Pretty print available metrics for analyzers.
- Parameters:
analyzer_name – Optional specific analyzer to show. If None, shows all.
- oumi.analyze.register_analyzer(registry_name: str) Callable#
Returns a decorator that registers a sample analyzer in the Oumi global registry.
- Parameters:
registry_name – The name that the sample analyzer should be registered with.
- Returns:
Decorator function to register the target sample analyzer.
- oumi.analyze.to_analysis_dataframe(conversations: list[Conversation], results: Mapping[str, Sequence[BaseModel] | BaseModel], message_to_conversation_idx: list[int] | None = None) DataFrame[source]#
Convert typed analysis results to a pandas DataFrame.
Creates a DataFrame with one row per conversation, with columns for conversation metadata and all analyzer metrics. Analyzer field names are prefixed with the analyzer name to avoid collisions.
Example
>>> results = {"LengthAnalyzer": [LengthMetrics(...), LengthMetrics(...)]}
>>> df = to_analysis_dataframe(conversations, results)
>>> print(df.columns.tolist())
['conversation_id', 'conversation_index', 'num_messages', 'length__total_chars', 'length__total_words', ...]
- Parameters:
conversations – List of conversations that were analyzed.
results – Dictionary mapping analyzer names to results.
- For per-conversation results: list of BaseModel (len = num conversations)
- For message-level results: list of BaseModel (len = num messages)
- For dataset-level results: single BaseModel (will be repeated)
message_to_conversation_idx – Optional mapping from message index to conversation index. Required for proper aggregation of message-level results. If provided, message-level results will be aggregated per conversation.
- Returns:
DataFrame with conversation metadata and all metrics as columns.
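The column-prefixing scheme can be sketched without pandas, using dicts in place of result models and rows in place of a DataFrame (the helper name is invented; only the `<analyzer>__<field>` naming comes from the example above):

```python
# Sketch of the flattening step: each analyzer's fields are namespaced
# as "<analyzer>__<field>" so two analyzers cannot collide on a column.
def to_rows(results: dict) -> list:
    num_conversations = len(next(iter(results.values())))
    rows = []
    for i in range(num_conversations):
        row = {"conversation_index": i}
        for analyzer_name, per_conv in results.items():
            for field_name, value in per_conv[i].items():
                row[f"{analyzer_name}__{field_name}"] = value
        rows.append(row)
    return rows

rows = to_rows({"length": [{"total_tokens": 12}, {"total_tokens": 7}]})
```

A list of such dicts can be handed directly to `pandas.DataFrame(rows)` to recover the tabular form.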