[BUG] Cannot clone object DataTypes_Auto_infer as the constructor either does not set or modifies parameter categorical_features
**Describe the bug**
I am trying to stack 7 pretrained models (loaded from .pkl files together with their pipelines), but I am hitting the error below when invoking stack_models():
RuntimeError: Cannot clone object DataTypes_Auto_infer(categorical_features=[], display_types=False,
features_todrop=[], id_columns=[],
ml_usecase='classification', numerical_features=[],
target='claim', time_features=[]), as the constructor either does not set or modifies parameter categorical_features
The pretrained models were trained, tuned, and calibrated with PyCaret, if that makes any difference. The dataset has no categorical features.
**To Reproduce**
print('Data Setup ...')
setup(
    data=train_,
    target=target_,
    ignore_low_variance=True,
    silent=True,
    use_gpu=False,
    normalize=True,
    numeric_imputation='mean',
    session_id=202109,
    log_experiment=False,
)
# Below is loaded successfully
print('Loading Models ...')
cat0 = load_model('../input/trained-models-0/finilized_model_catboost0')
cat1 = load_model('../input/trained-models-0/finilized_model_catboost1')
lgbm0 = load_model('../input/trained-models-0/finilized_model_lightgbm0')
lgbm1 = load_model('../input/trained-models-0/finilized_model_lightgbm1')
lgbm2 = load_model('../input/trained-models-0/finilized_model_lightgbm2')
lgbm3 = load_model('../input/trained-models-0/finilized_model_lightgbm3')
lgbm4 = load_model('../input/trained-models-0/finilized_model_lightgbm4')
models_list=[cat0,cat1,lgbm0,lgbm1,lgbm2,lgbm3,lgbm4]
# Error is triggered when below is invoked
stacker = stack_models(
    estimator_list=models_list,
    fold=5,
    restack=True,
    choose_better=True,
    optimize="AUC",
)
**Expected behavior**
The stacker should run without errors.
**Additional context**
The error originates from /opt/conda/lib/python3.7/site-packages/sklearn/base.py. Below is the last frame of the stack trace.
/opt/conda/lib/python3.7/site-packages/sklearn/base.py in clone(estimator, safe)
96 raise RuntimeError('Cannot clone object %s, as the constructor '
97 'either does not set or modifies parameter %s' %
---> 98 (estimator, name))
99 return new_object
100
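For context, the check that raises this error can be reproduced with plain sklearn, independent of PyCaret. The sketch below (my own toy estimator, not PyCaret code) shows how clone() fails whenever an estimator's __init__ modifies one of its parameters instead of storing it untouched:

```python
# Minimal sketch (toy class, not PyCaret code): sklearn's clone()
# re-constructs an estimator from get_params() and then verifies that
# each parameter was stored unchanged. Rebuilding the argument in
# __init__ breaks that identity check and triggers the RuntimeError.
from sklearn.base import BaseEstimator, clone

class BadEstimator(BaseEstimator):
    def __init__(self, categorical_features=None):
        # Creating a new list here violates the clone contract:
        # parameters must be assigned as-is, without modification.
        self.categorical_features = list(categorical_features or [])

try:
    clone(BadEstimator(categorical_features=[]))
except RuntimeError as err:
    print(err)  # "Cannot clone object ... does not set or modifies parameter ..."
```

This is presumably what DataTypes_Auto_infer does with categorical_features, which is why stack_models (which clones every estimator in the list) fails on the loaded pipeline.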
**Versions**
'2.3.3'
Update: the issue seems to occur only when the model is saved with default parameters. Good news: if I save the model with model_only=True, I can load it and blend/stack it fine. Bad news: I have to retrain all the models (I found a very influential feature during another round of EDA, so I would have had to retrain anyway).
I will leave the issue open since it is still triggered when a model is saved with the default parameters (model + pipeline).
If you save the model with model_only=False (the default), a bunch of metadata is saved along with the model, so when you load it you get back the model together with that metadata. In case you only need the model itself, you can do
pipeline = load_model('pickle_file.pkl')
model = pipeline.named_steps['trained_model']
or save the model with model_only=True, in which case you lose the metadata.
The example above then becomes:
# Below is loaded successfully
print('Loading Models ...')
cat0 = load_model('../input/trained-models-0/finilized_model_catboost0')
cat1 = load_model('../input/trained-models-0/finilized_model_catboost1')
models_list=[cat0.named_steps['trained_model'], cat1.named_steps['trained_model']]
# Error is triggered when below is invoked
stacker = stack_models(
    estimator_list=models_list,
    fold=5,
    restack=True,
    choose_better=True,
    optimize="AUC",
)
@srikarplus, thanks for the workaround!
The problem is that I intentionally set model_only=False because I wanted to reproduce the whole pipeline exactly.
It does seem to be an actual bug: as I understand it, the pipeline should be fittable (pipeline.fit(...)) right after loading, so no metadata should break pipeline execution. Isn't that right?
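For comparison, a well-behaved sklearn pipeline does survive a pickle round trip and can be cloned and refit afterwards, which is the behavior one would expect from the loaded PyCaret pipeline. A sketch with plain sklearn components (not the PyCaret pipeline itself):

```python
# A pipeline whose steps honour the get_params/set_params contract can be
# pickled, restored, cloned, and refit without errors -- exactly what the
# loaded PyCaret pipeline fails to do here.
import pickle
from sklearn.base import clone
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('trained_model', LogisticRegression()),
])

restored = pickle.loads(pickle.dumps(pipe))  # simulate save_model/load_model
clone(restored)                              # no RuntimeError for compliant steps
restored.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
```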