ROC Curve

Computes the Receiver Operating Characteristic (ROC) curve. Returns a DataFrame with 'fpr', 'tpr', and 'thresholds'.

Processing

This brick evaluates the performance of a classification model by calculating the Receiver Operating Characteristic (ROC) curve. It compares the actual "true" labels against the predicted scores to generate three key metrics: False Positive Rate (FPR), True Positive Rate (TPR), and decision Thresholds.

The resulting data allows you to visualize how well your model distinguishes between classes and helps you select the optimal probability threshold for your specific use case.
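
For intuition, the following minimal sketch (using scikit-learn directly, with made-up labels and scores) produces the same three arrays that the brick assembles into its output table:

import pandas as pd
from sklearn.metrics import roc_curve

# Hypothetical ground-truth labels and model scores for six records
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Same layout as the brick's output: one row per decision threshold
print(pd.DataFrame({"fpr": fpr, "tpr": tpr, "thresholds": thresholds}))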

Inputs

data
The dataset containing your model's predictions and the actual ground truth labels.

Input Types

  • data: DataFrame, ArrowTable

You can check the list of supported types here: Available Type Hints.
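
For illustration, any of the following constructions (a sketch using the hypothetical column names "target" and "score") would be accepted as the data input:

import pandas as pd
import polars as pl
import pyarrow as pa

records = {"target": [0, 1, 1, 0], "score": [0.2, 0.9, 0.6, 0.3]}

as_pandas = pd.DataFrame(records)  # pandas DataFrame
as_polars = pl.DataFrame(records)  # Polars DataFrame
as_arrow = pa.table(records)       # Arrow Table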

Outputs

ROC Table
A summary table containing the calculated curve points. Each row corresponds to a specific decision threshold and its resulting performance metrics.

The ROC Table output contains the following specific data fields:

  • fpr: The False Positive Rate (the proportion of negatives incorrectly classified as positive).
  • tpr: The True Positive Rate (the proportion of positives correctly classified as positive, also known as Recall).
  • thresholds: The specific probability score used as the cut-off to determine the associated FPR and TPR.
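
As a downstream usage sketch (with a hypothetical ROC Table), the fpr and tpr columns can be fed straight into an area-under-the-curve estimate:

import pandas as pd
from sklearn.metrics import auc

# Hypothetical ROC Table as produced by the brick
roc_table = pd.DataFrame({
    "fpr": [0.0, 0.0, 0.5, 1.0],
    "tpr": [0.0, 0.5, 1.0, 1.0],
    "thresholds": [0.9, 0.8, 0.35, 0.1],
})

# Area under the ROC curve via the trapezoidal rule
roc_auc = auc(roc_table["fpr"], roc_table["tpr"])
print(f"AUC = {roc_auc:.3f}")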

Output Types

  • ROC Table: DataFrame

You can check the list of supported types here: Available Type Hints.

Options

The ROC Curve brick exposes the following configurable options; an example configuration is shown after the list:

True Label Column
The name of the column in your input data that contains the actual class labels (the "Ground Truth"). Common names include "target", "actual", or "y_true".
Score Column
The name of the column that contains the probability or confidence scores output by your model. Common names include "score", "probability", or "y_score".
Positive Class Label
The label that represents the "positive" class (the class you are trying to predict, e.g., "Fraud", "Spam", or "Win"). If left empty, the positive class is inferred: scikit-learn treats 1 as positive when the labels are {0, 1} or {-1, 1}, and raises an error otherwise.
Drop Intermediate
An optimization setting that reduces the size of the output table by dropping suboptimal thresholds that would not appear on a plotted ROC curve, giving a lighter, easier-to-plot dataset. It is disabled by default.
Verbose
Controls whether the brick writes detailed logs during processing. Useful for debugging if the calculation fails.
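
A minimal configuration sketch follows; the option keys mirror those read by the brick's source shown below, while the column names and label values are hypothetical:

options = {
    "target_column": "actual",      # column holding the ground-truth labels
    "score_column": "probability",  # column holding the model's scores
    "pos_label": "Fraud",           # label treated as the positive class
    "drop_intermediate": True,      # thin out suboptimal thresholds
    "verbose": False,               # silence per-step logging
}

The brick's source code, which reads these keys from its options argument, follows: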
import logging
import pandas as pd
import polars as pl
import pyarrow as pa
from sklearn.metrics import roc_curve
from coded_flows.types import Union, DataFrame, ArrowTable, Tuple, DataSeries, Str
from coded_flows.utils import CodedFlowsLogger

logger = CodedFlowsLogger(name="ROC Curve", level=logging.INFO)


def compute_roc_curve(data: Union[DataFrame, ArrowTable], options=None) -> DataFrame:
    options = options or {}
    verbose = options.get("verbose", True)
    target_col = options.get("target_column", "target")
    score_col = options.get("score_column", "score")
    pos_label_input = str(options.get("pos_label", "")).strip()
    drop_intermediate = options.get("drop_intermediate", False)
    ROC_Table = None
    try:
        verbose and logger.info(
            f"Starting ROC computation. Target: '{target_col}', Score: '{score_col}'"
        )
        # Normalize the input to a pandas DataFrame regardless of its source type
        df_pandas = None
        input_type = "unknown"
        if isinstance(data, pd.DataFrame):
            df_pandas = data
            input_type = "pandas"
        elif isinstance(data, pl.DataFrame):
            df_pandas = data.to_pandas()
            input_type = "polars"
        elif isinstance(data, (pa.Table, pa.lib.Table)):
            df_pandas = data.to_pandas()
            input_type = "arrow"
        else:
            raise ValueError(
                "Input data must be a pandas DataFrame, Polars DataFrame, or Arrow Table"
            )
        verbose and logger.info(f"Converted input from {input_type} to pandas.")
        if target_col not in df_pandas.columns:
            raise ValueError(f"Target column '{target_col}' not found in dataset.")
        if score_col not in df_pandas.columns:
            raise ValueError(f"Score column '{score_col}' not found in dataset.")
        y_true = df_pandas[target_col]
        y_score = df_pandas[score_col]
        # Cast the user-provided positive label to match the dtype of the target column
        pos_label = None
        if pos_label_input:
            col_dtype = y_true.dtype
            verbose and logger.info(f"Target column dtype detected: {col_dtype}")
            if pd.api.types.is_bool_dtype(col_dtype):
                if pos_label_input.lower() == "true":
                    pos_label = True
                elif pos_label_input.lower() == "false":
                    pos_label = False
                else:
                    # Any other non-empty string is truthy, so this falls back to True
                    pos_label = bool(pos_label_input)
                verbose and logger.info(f"Casted pos_label to BOOLEAN: {pos_label}")
            elif pd.api.types.is_numeric_dtype(col_dtype):
                try:
                    if float(pos_label_input).is_integer():
                        pos_label = int(float(pos_label_input))
                    else:
                        pos_label = float(pos_label_input)
                    verbose and logger.info(f"Casted pos_label to NUMERIC: {pos_label}")
                except ValueError:
                    pos_label = pos_label_input
                    verbose and logger.warning(
                        f"Target is numeric but pos_label '{pos_label_input}' could not be cast. Using string."
                    )
            else:
                pos_label = pos_label_input
                verbose and logger.info(
                    f"Target is categorical/string. Using STRING pos_label: '{pos_label}'"
                )
        else:
            verbose and logger.info(
                "No positive label provided. Brick will infer the positive class."
            )
        # scikit-learn returns one (fpr, tpr) point per candidate threshold
        (fpr, tpr, thresholds) = roc_curve(
            y_true, y_score, pos_label=pos_label, drop_intermediate=drop_intermediate
        )
        ROC_Table = pd.DataFrame({"fpr": fpr, "tpr": tpr, "thresholds": thresholds})
        verbose and logger.info(
            f"Computation successful. Result shape: {ROC_Table.shape}"
        )
    except Exception as e:
        verbose and logger.error(f"Error during ROC computation: {e}")
        raise
    return ROC_Table
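
A usage sketch for the function above, with hypothetical data whose column names match the default options:

df = pd.DataFrame({
    "target": [0, 0, 1, 1, 0, 1],
    "score": [0.1, 0.4, 0.35, 0.8, 0.2, 0.7],
})

roc_table = compute_roc_curve(df, options={"verbose": False})
print(roc_table)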

Brick Info

version v0.1.4
python 3.11, 3.12, 3.13
requirements
  • shap>=0.47.0
  • scikit-learn
  • pandas
  • pyarrow
  • polars[pyarrow]
  • numba>=0.56.0