Model Predict

Uses a trained model (scikit-learn, XGBoost, CatBoost, LightGBM) to make predictions on new data.

Processing

This brick applies a pre-trained machine learning model to new data to generate predictions. It acts as the "inference" step in a machine learning workflow, where the model uses what it learned during training to analyze new information.

The brick is designed to be model-agnostic. It treats the provided model as a Classifier (predicting categories, like "Spam" vs. "Not Spam") when the model exposes a `predict_proba` method, and as a Regressor (predicting continuous numbers, like house prices) otherwise.

For Regressors: It outputs the predicted value.
For Classifiers: It outputs the predicted class label and the probability score for every possible class (e.g., 80% confident it is "Spam").
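
The detection described above can be sketched as a simple duck-typing check. This is a minimal illustration that mirrors the `predict_proba` test in the brick's source; `StubRegressor`, `StubClassifier`, and `detect_task` are hypothetical names used only for this example and are not part of the brick.

```python
class StubRegressor:
    """Hypothetical stand-in for a trained regressor: only has predict()."""

    def predict(self, X):
        return [0.5 * sum(row) for row in X]


class StubClassifier:
    """Hypothetical stand-in for a trained classifier: also has predict_proba()."""

    classes_ = ["Not Spam", "Spam"]

    def predict_proba(self, X):
        # One row of probabilities per sample, one column per class.
        return [[0.2, 0.8] if sum(row) > 1.0 else [0.9, 0.1] for row in X]

    def predict(self, X):
        # Pick the class with the highest probability for each sample.
        return [
            self.classes_[probs.index(max(probs))] for probs in self.predict_proba(X)
        ]


def detect_task(model) -> str:
    """Classify the model by capability, not by library or class name."""
    return "classification" if hasattr(model, "predict_proba") else "regression"


sample = [[0.4, 0.9], [0.1, 0.2]]
task_for_classifier = detect_task(StubClassifier())
task_for_regressor = detect_task(StubRegressor())
labels = StubClassifier().predict(sample)
```

One consequence of this check: a classifier that does not expose `predict_proba` (for example scikit-learn's `LinearSVC`) would be handled as a regressor, so only its `prediction` column would be produced.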

Inputs

Model: The trained machine learning model object. This is usually the output from a "Train Model" brick or a loaded model file. It represents the "brain" that contains the logic for making predictions.
X: The new data you want to analyze. This should contain the same features (columns/variables) that were used to train the model, excluding the target variable.
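
Because X must carry the training features and not the target, it can help to select the columns explicitly before connecting the data to the brick. The sketch below is one possible pre-check, not something the brick performs itself; `select_features`, the column names, and the sample values are all assumptions made for illustration.

```python
import pandas as pd


def select_features(df: pd.DataFrame, feature_names: list) -> pd.DataFrame:
    """Return only the training features, in training order (hypothetical helper)."""
    missing = [name for name in feature_names if name not in df.columns]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")
    # Selecting by name drops any extra columns, such as the target.
    return df[feature_names]


# Illustrative data: "price" plays the role of the target variable.
new_data = pd.DataFrame(
    {"price": [250_000, 310_000], "rooms": [3, 4], "area": [70.0, 95.5]}
)
X = select_features(new_data, ["area", "rooms"])
```

Selecting by name also fixes the column order, which matters for models that were trained on positional arrays rather than named columns.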

Input Types

Input	Types
`Model`	`Any`
`X`	`DataFrame`, `ArrowTable`, `NDArray`

You can check the list of supported types here: Available Type Hints.

Outputs

predictions: A structured table containing the results of the model's analysis. This includes the final prediction and, for classification models, the confidence scores for each category.

The predictions output contains the following specific data fields:

prediction: The main result. For regression, this is the calculated value. For classification, this is the predicted category/label.
proba_{class_name}: (Classification only) The probability score for a specific class. There will be one column for each class known by the model (e.g., proba_Yes, proba_No, or proba_0, proba_1).
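
To make the column layout concrete, the following sketch assembles a table with the same naming scheme: one `prediction` column followed by one `proba_{class_name}` column per class, falling back to positional names when the class list does not match the number of probability columns. It follows the logic visible in the brick's source, but `build_output` and the sample values are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd


def build_output(labels, probabilities, classes, index=None) -> pd.DataFrame:
    """Assemble prediction and per-class probability columns (hypothetical helper)."""
    probabilities = np.asarray(probabilities)
    n_columns = probabilities.shape[1]
    if len(classes) == n_columns:
        proba_columns = [f"proba_{cls}" for cls in classes]
    else:
        # Fall back to positional names when class names cannot be matched.
        proba_columns = [f"proba_{i}" for i in range(n_columns)]
    prediction_df = pd.DataFrame({"prediction": list(labels)}, index=index)
    proba_df = pd.DataFrame(probabilities, columns=proba_columns, index=index)
    return pd.concat([prediction_df, proba_df], axis=1)


# Illustrative values only: two rows scored by a Yes/No classifier.
output = build_output(
    labels=["Yes", "No"],
    probabilities=[[0.2, 0.8], [0.7, 0.3]],
    classes=["No", "Yes"],
    index=["row_a", "row_b"],
)
```

Passing the input's index through, as in the sketch, keeps each prediction aligned with the row it was computed from.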

Output Types

Output	Types
`predictions`	`DataFrame`

You can check the list of supported types here: Available Type Hints.

Options

The Model Predict brick exposes the following option:

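Verbose: Controls whether progress and error messages are written to the log during execution. It is read from the options under the key `verbose` and is enabled by default. When it is disabled, errors are still raised but are not logged.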
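
As a small illustration of how this option resolves, the sketch below reproduces the lookup used in the brick's source (`options.get("verbose", True)`). The helper name `resolve_verbose` is hypothetical and exists only for this example.

```python
from typing import Optional


def resolve_verbose(options: Optional[dict] = None) -> bool:
    """Return the effective verbose flag; a missing key means logging stays on."""
    options = options or {}
    return options.get("verbose", True)


default_flag = resolve_verbose()                  # no options passed
quiet_flag = resolve_verbose({"verbose": False})  # logging switched off
```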

import logging
import re
import numpy as np
import pandas as pd
import polars as pl
import pyarrow as pa
from coded_flows.types import Any, Union, DataFrame, ArrowTable, NDArray
from coded_flows.utils import CodedFlowsLogger

logger = CodedFlowsLogger(name="Model Predict", level=logging.INFO)


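def _get_base_name(model: Any) -> str:
    """Return the model's class name split into words.

    Example: "RandomForestClassifier" -> "Random Forest Classifier".
    """
    class_name = type(model).__name__
    # Insert a space before each capitalized word, then between a
    # lowercase letter or digit and the uppercase letter that follows it.
    name = re.sub(r"(.)([A-Z][a-z]+)", r"\1 \2", class_name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)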


def model_predict(
    Model: Any, X: Union[DataFrame, ArrowTable, NDArray], options: dict = None
) -> DataFrame:
    options = options or {}
    verbose = options.get("verbose", True)
    predictions = None
    if Model is None:
        verbose and logger.error("Model object is not provided.")
        raise ValueError("A trained model must be provided.")
    if X is None:
        verbose and logger.error("Input data (X) is not provided.")
        raise ValueError("Input data (X) must be provided.")
    try:
        verbose and logger.info("Starting prediction process.")
        input_index = None
        if isinstance(X, pd.DataFrame):
            input_index = X.index
            verbose and logger.info("Input data is already a pandas DataFrame.")
        elif isinstance(X, pl.DataFrame):
            X = X.to_pandas()
            input_index = X.index
            verbose and logger.info("Converted Polars DataFrame to pandas DataFrame.")
        elif isinstance(X, pa.Table):
            X = X.to_pandas()
            input_index = X.index
            verbose and logger.info("Converted PyArrow Table to pandas DataFrame.")
        elif isinstance(X, np.ndarray):
            # NumPy arrays carry no index, so the output uses a default RangeIndex.
            verbose and logger.info("Input data is a NumPy array.")
        else:
            try:
                X = pd.DataFrame(X)
                input_index = X.index
                verbose and logger.info(
                    "Converted generic input data to pandas DataFrame."
                )
            except Exception as conv_error:
                verbose and logger.error(f"Unable to convert input data: {conv_error}")
                # Chain the original error so the root cause stays in the traceback.
                raise TypeError(
                    f"Unsupported input data type: {type(X)}. Expected pandas DataFrame, Polars DataFrame, PyArrow Table, or NumPy array."
                ) from conv_error
        shape_info = X.shape
        verbose and logger.info(f"Data ready for prediction. Shape: {shape_info}")
        # Models exposing predict_proba are treated as classifiers; all others as regressors.
        is_classifier = hasattr(Model, "predict_proba")
        model_type_str = _get_base_name(Model)
        if is_classifier:
            verbose and logger.info(
                f"Classification model detected ({model_type_str}). Predicting classes and probabilities."
            )
            predictions = Model.predict(X)
            probabilities = Model.predict_proba(X)
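            # If only positive-class probabilities come back as a 1-D array,
            # expand them into two columns: [P(negative), P(positive)].
            if probabilities.ndim == 1:
                probabilities = np.vstack([1 - probabilities, probabilities]).T
            # Name columns after the model's classes when they match the number
            # of probability columns; otherwise fall back to positional names.
            n_classes = probabilities.shape[1]
            classes = getattr(Model, "classes_", None)
            if classes is not None and len(classes) == n_classes:
                proba_columns = [f"proba_{cls}" for cls in classes]
            else:
                proba_columns = [f"proba_{i}" for i in range(n_classes)]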
            proba_df = pd.DataFrame(
                probabilities, columns=proba_columns, index=input_index
            )
            predictions = pd.DataFrame(
                predictions, columns=["prediction"], index=input_index
            )
            predictions = pd.concat([predictions, proba_df], axis=1)
            verbose and logger.info(
                f"Generated predictions and probabilities for {len(proba_columns)} classes."
            )
        else:
            verbose and logger.info(
                f"Regression model detected ({model_type_str}). Generating predictions."
            )
            predictions = Model.predict(X)
            predictions = pd.DataFrame(
                predictions, columns=["prediction"], index=input_index
            )
        verbose and logger.info(
            f"Prediction complete. Output shape: {predictions.shape}"
        )
    except Exception as e:
        verbose and logger.error(f"An error occurred during prediction: {e}")
        raise
    return predictions

Brick Info

version v0.1.4

python 3.11, 3.12, 3.13

requirements

shap>=0.47.0
scikit-learn
numpy
pandas
pyarrow
polars[pyarrow]
lightgbm
numba>=0.56.0
xgboost