oumi.core.inference#

Inference module for the Oumi (Open Universal Machine Intelligence) library.

This module provides base classes for model inference in the Oumi framework.

class oumi.core.inference.BaseInferenceEngine(model_params: ModelParams, *, generation_params: GenerationParams | None = None)[source]#

Bases: ABC

Base class for running model inference.

apply_chat_template(conversation: Conversation, **tokenizer_kwargs) str[source]#

Applies the chat template to the conversation.

Parameters:
  • conversation – The conversation to apply the chat template to.

  • tokenizer_kwargs – Additional keyword arguments to pass to the tokenizer.

Returns:

The conversation rendered as a single prompt string with the chat template applied.

Return type:

str
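To illustrate what a chat template typically produces, here is a toy stand-in: the real method delegates to the underlying tokenizer's template, and `Message` and `render_chat` below are hypothetical names, not Oumi APIs.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # e.g. "user" or "assistant"
    content: str


def render_chat(messages: list[Message]) -> str:
    """Format a conversation as one prompt string, using one common scheme."""
    lines = [f"<|{m.role}|>\n{m.content}" for m in messages]
    # Trailing generation prompt tells the model to produce the next reply.
    lines.append("<|assistant|>")
    return "\n".join(lines)


conversation = [Message("user", "Hello!")]
prompt = render_chat(conversation)
```

The actual string layout depends entirely on the tokenizer's configured template; only the conversation-in, string-out shape matches the documented signature.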

get_batch_results_partial(batch_id: str, conversations: list[Conversation]) BatchResult[source]#

Gets partial results of a completed batch job.

Engines that support batch inference should override this method.

Parameters:
  • batch_id – The batch job ID.

  • conversations – Original conversations used to create the batch.

Returns:

BatchResult with successful conversations and failure details.

Raises:

NotImplementedError – If the engine does not support batch inference.
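A sketch of the documented default behavior: engines that do not override this method raise NotImplementedError, so callers should be prepared to catch it. `EngineBase` here is a simplified stand-in, not the real Oumi class.

```python
class EngineBase:
    """Toy stand-in for an inference engine without batch support."""

    def get_batch_results_partial(self, batch_id: str, conversations: list):
        # Documented default: batch inference is opt-in per engine.
        raise NotImplementedError("This engine does not support batch inference.")


engine = EngineBase()
try:
    engine.get_batch_results_partial("batch-123", [])
    supports_batch = True
except NotImplementedError:
    supports_batch = False
```

Catching NotImplementedError at the call site lets generic driver code fall back to ordinary (non-batch) inference.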

abstractmethod get_supported_params() set[str][source]#

Returns a set of supported generation parameters for this engine.

Derived classes must implement this method to specify which parameters are supported.

Returns:

A set of supported parameter names.

Return type:

set[str]
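The abstract-method pattern can be sketched as follows; `MyEngine` and the parameter names are hypothetical examples, not part of Oumi.

```python
from abc import ABC, abstractmethod


class EngineBase(ABC):
    """Toy stand-in mirroring the abstract get_supported_params contract."""

    @abstractmethod
    def get_supported_params(self) -> set[str]:
        """Return the generation parameter names this engine honors."""
        raise NotImplementedError


class MyEngine(EngineBase):
    def get_supported_params(self) -> set[str]:
        return {"max_new_tokens", "temperature", "top_p"}


engine = MyEngine()
# A caller can use the set to detect parameters the engine would ignore.
unsupported = {"temperature", "beam_width"} - engine.get_supported_params()
```

Exposing supported parameters as a set makes this kind of validation a cheap set difference rather than per-engine special-casing.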

infer(input: list[Conversation] | None = None, inference_config: InferenceConfig | None = None) list[Conversation][source]#

Runs model inference.

Parameters:
  • input – A list of conversations to run inference on. Optional.

  • inference_config – Parameters for inference. If not specified, a default config is used.

Returns:

Inference output.

Return type:

List[Conversation]
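The infer() contract (conversations in, conversations with model replies out) can be sketched with a toy engine. `EchoEngine` is a hypothetical stand-in that appends a canned reply, and plain lists stand in for Conversation objects.

```python
class EchoEngine:
    """Toy stand-in demonstrating the shape of the infer() interface."""

    def infer(self, input=None, inference_config=None):
        input = input or []
        # A real engine would run the model here; we append a canned reply.
        return [conv + ["(model reply)"] for conv in input]


engine = EchoEngine()
outputs = engine.infer(input=[["Hi there"]])
```

Note that both arguments are optional in the documented signature: with no input, an engine may fall back to whatever the inference config specifies.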

class oumi.core.inference.BatchResult(successful: list[tuple[int, Conversation]], failed_indices: list[int], error_messages: dict[int, str])[source]#

Bases: object

Result of a partial batch retrieval, separating successes from failures.

error_messages: dict[int, str]#

Mapping of failed index to error message.

failed_indices: list[int]#

Indices of requests that failed.

property has_failures: bool#

Return True if any requests failed.

successful: list[tuple[int, Conversation]]#

List of (original_index, conversation) for successful requests.
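The documented fields fit together as in this minimal dataclass, which mirrors the real BatchResult (strings stand in for Conversation objects; the example data is invented).

```python
from dataclasses import dataclass


@dataclass
class BatchResult:
    """Minimal mirror of the documented BatchResult fields."""

    successful: list[tuple[int, str]]   # (original_index, conversation)
    failed_indices: list[int]           # indices of requests that failed
    error_messages: dict[int, str]      # failed index -> error message

    @property
    def has_failures(self) -> bool:
        return len(self.failed_indices) > 0


result = BatchResult(
    successful=[(0, "conv-a"), (2, "conv-c")],
    failed_indices=[1],
    error_messages={1: "rate limit exceeded"},
)
```

Keying everything by the original index lets a caller interleave successes back into the input order and retry only the failed conversations.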