model_compare.model_cv_metric_compare

model_compare.model_cv_metric_compare(models_dict, X, y, cv=5)

Evaluates multiple binary classification models with cross-validation and returns a DataFrame comparing each model's mean score on every metric.

Parameters

Name         Type       Default   Description
models_dict  dict       required  Dictionary of {model_name: pipeline_object}. Models do not need to be fitted beforehand.
X            DataFrame  required  Features (training set or full dataset).
y            Series     required  Labels (training set or full dataset).
cv           int        5         Number of cross-validation folds.
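As a rough illustration of inputs in the shape described above, the sketch below builds a small synthetic binary dataset and a dictionary of unfitted pipelines. The dataset, the feature names, and the model names are assumptions made for the example; only the "Y" positive label and the {model_name: pipeline_object} layout come from this page.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data; sizes and column names are illustrative only.
X_arr, y_arr = make_classification(n_samples=200, n_features=5, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])

# Map 1/0 to "Y"/"N" so the positive class matches pos_label="Y".
y = pd.Series(y_arr).map({1: "Y", 0: "N"})

# Unfitted pipelines keyed by a display name (names are hypothetical).
models_dict = {
    "LogReg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # probability=True makes SVC expose predict_proba.
    "SVC": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}
```

These objects could then be passed as the models_dict, X, and y arguments.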

Scorers Evaluated

  • accuracy
  • precision (pos_label="Y")
  • recall (pos_label="Y")
  • f1 (pos_label="Y")
  • roc_auc (if model supports predict_proba)
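One way scorers like these can be expressed with scikit-learn is shown below. This is a sketch, not necessarily how this function builds them internally: the scoring dictionary and the toy labels are assumptions for illustration. It shows what fixing pos_label="Y" means for precision and recall when the labels are strings.

```python
from sklearn.metrics import f1_score, make_scorer, precision_score, recall_score

# Hypothetical scoring dictionary; the function's internals may differ.
scoring = {
    "accuracy": "accuracy",
    "precision": make_scorer(precision_score, pos_label="Y"),
    "recall": make_scorer(recall_score, pos_label="Y"),
    "f1": make_scorer(f1_score, pos_label="Y"),
}

# Toy labels: 2 true positives, 1 false positive, 1 false negative for class "Y".
y_true = ["Y", "N", "Y", "Y", "N"]
y_pred = ["Y", "Y", "Y", "N", "N"]

precision = precision_score(y_true, y_pred, pos_label="Y")  # TP / (TP + FP) = 2/3
recall = recall_score(y_true, y_pred, pos_label="Y")        # TP / (TP + FN) = 2/3
print(round(precision, 3), round(recall, 3))
```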

Returns

Name       Type              Description
dataframe  pandas.DataFrame  DataFrame containing each model's name and its mean evaluation metrics.
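A result of this kind can be ranked with ordinary pandas operations. The sketch below uses a hand-built stand-in that copies the values and layout printed in the Examples section (model names as an index called Model); that layout is inferred from the printed output and is an assumption about the real return value.

```python
import pandas as pd

# Stand-in for the returned DataFrame, copied from the printed example output.
results = pd.DataFrame(
    {
        "accuracy": [0.98, 0.96],
        "precision": [0.98, 0.96],
        "recall": [0.98, 0.96],
        "f1": [0.98, 0.96],
        "roc_auc": [0.99, 0.98],
    },
    index=pd.Index(["SVC", "RandomForest"], name="Model"),
)

# Rank models by mean F1, best first, and pick the top one.
ranked = results.sort_values("f1", ascending=False)
best_model = ranked.index[0]
print(best_model)
```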

Examples

>>> from sklearn.svm import SVC
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import load_iris
>>> import pandas as pd
>>> from model_compare import model_cv_metric_compare  # adjust to your package's import path
>>> 
>>> X, y = load_iris(return_X_y=True)
>>> X = pd.DataFrame(X)
>>> y = pd.Series(y)
>>> 
>>> models = {
...     'SVC': SVC(),
...     'RandomForest': RandomForestClassifier(random_state=42)
... }
>>> 
>>> results = model_cv_metric_compare(models, X, y, cv=5)
>>> print(results)
                  accuracy  precision  recall      f1  roc_auc
Model                                                          
SVC                  0.98       0.98    0.98    0.98     0.99
RandomForest         0.96       0.96    0.96    0.96     0.98

Notes

  • All models must implement scikit-learn’s estimator interface
  • For ROC-AUC, models must support the predict_proba() method
  • Uses stratified K-fold cross-validation
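The notes above can be sketched with scikit-learn's own tools. The helper below, sketch_cv_means, is hypothetical and is not this function's implementation. It assumes "Y"/"N" string labels, shuffled stratified folds with a fixed seed, and that roc_auc is simply skipped when an estimator lacks predict_proba.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic binary data with "Y"/"N" labels (illustrative only).
X_arr, y_arr = make_classification(n_samples=200, n_features=5, random_state=0)
X = pd.DataFrame(X_arr)
y = pd.Series(y_arr).map({1: "Y", 0: "N"})


def sketch_cv_means(model, X, y, cv=5):
    """Hypothetical helper: mean stratified-CV scores for one model."""
    scoring = {
        "accuracy": "accuracy",
        "f1": make_scorer(f1_score, pos_label="Y"),
    }
    # Only request ROC-AUC when the estimator exposes predict_proba.
    if hasattr(model, "predict_proba"):
        scoring["roc_auc"] = "roc_auc"
    # Stratified folds keep the class ratio roughly equal in every fold.
    folds = StratifiedKFold(n_splits=cv, shuffle=True, random_state=0)
    scores = cross_validate(model, X, y, cv=folds, scoring=scoring)
    return {name: scores[f"test_{name}"].mean() for name in scoring}


with_proba = sketch_cv_means(LogisticRegression(max_iter=1000), X, y)
without_proba = sketch_cv_means(RidgeClassifier(), X, y)  # no predict_proba
print(sorted(with_proba), sorted(without_proba))
```

RidgeClassifier is used here only because it has no predict_proba method, so the sketch reports accuracy and f1 for it but no roc_auc.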