
[BUG] Cannot clone object DataTypes_Auto_infer as the constructor either does not set or modifies parameter categorical_features

Open irkan-hadi opened this issue 4 years ago • 2 comments

Describe the bug I am trying to stack 7 pretrained models (pipelines imported from .pkl files), but I get the error below when invoking stack_models():

RuntimeError: Cannot clone object DataTypes_Auto_infer(categorical_features=[], display_types=False,
                     features_todrop=[], id_columns=[],
                     ml_usecase='classification', numerical_features=[],
                     target='claim', time_features=[]), as the constructor either does not set or modifies parameter categorical_features

The pretrained models were trained, tuned, and calibrated with pycaret, if that makes any difference. The dataset has no categorical features.
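For context, sklearn's clone() rebuilds an estimator from get_params() and then verifies that every constructor argument was stored untouched; any transformer that copies or normalizes an argument in __init__ (as DataTypes_Auto_infer appears to do with categorical_features) fails that check. A minimal sketch of the failure mode, using a hypothetical BadTransformer rather than pycaret's actual class:

```python
from sklearn.base import BaseEstimator, clone

class BadTransformer(BaseEstimator):
    def __init__(self, categorical_features=None):
        # Copying the argument violates sklearn's clone() contract:
        # get_params() no longer returns the exact object passed in,
        # so clone() raises "Cannot clone object ...".
        self.categorical_features = list(categorical_features or [])

try:
    clone(BadTransformer(categorical_features=[]))
except RuntimeError as exc:
    print('clone failed:', exc)
```

stack_models() calls clone() on every estimator in the list, which is why the error surfaces there rather than at load time.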

To Reproduce


print('Data Setup ...')
setup(
    data=train_,
    target=target_,
    ignore_low_variance=True,
    silent=True,
    use_gpu=False,
    normalize=True,
    numeric_imputation='mean',
    session_id=202109,
    log_experiment=False,
)

# The models below load successfully
print('Loading Models ...')
cat0 = load_model('../input/trained-models-0/finilized_model_catboost0')
cat1 = load_model('../input/trained-models-0/finilized_model_catboost1')
lgbm0 = load_model('../input/trained-models-0/finilized_model_lightgbm0')
lgbm1 = load_model('../input/trained-models-0/finilized_model_lightgbm1')
lgbm2 = load_model('../input/trained-models-0/finilized_model_lightgbm2')
lgbm3 = load_model('../input/trained-models-0/finilized_model_lightgbm3')
lgbm4 = load_model('../input/trained-models-0/finilized_model_lightgbm4')

models_list = [cat0, cat1, lgbm0, lgbm1, lgbm2, lgbm3, lgbm4]

# The error is triggered by this call
stacker = stack_models(
    estimator_list=models_list,
    fold=5,
    restack=True,
    choose_better=True,
    optimize="AUC",
)

Expected behavior The stacker should run without errors.

Additional context The error seems to come from /opt/conda/lib/python3.7/site-packages/sklearn/base.py. Below is the tail of the stack trace.

/opt/conda/lib/python3.7/site-packages/sklearn/base.py in clone(estimator, safe)
     96             raise RuntimeError('Cannot clone object %s, as the constructor '
     97                                'either does not set or modifies parameter %s' %
---> 98                                (estimator, name))
     99     return new_object
    100 

Versions

'2.3.3'

Update: The issue seems to happen only when a model is saved with the default parameters. Good news: if I save the model with model_only=True, I am able to import it and blend/stack it fine. Bad news: I have to retrain all the models (I found a very influential feature during another round of EDA, so I will have to retrain again anyway).

I will leave the issue open, since the error is still triggered when a model is saved with the default parameters (model + pipeline).

irkan-hadi avatar Sep 04 '21 10:09 irkan-hadi

If you save the model using model_only=False (the default), the preprocessing pipeline and other metadata are saved along with the model, so when you load it back you get the model wrapped in that pipeline. In case you only need the model, you can do

pipeline = load_model('pickle_file')  # load_model appends the .pkl extension itself
model = pipeline.named_steps['trained_model']

or save the model with model_only=True, in which case you lose the metadata.
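To illustrate the extraction, here is a self-contained sketch using a plain sklearn Pipeline standing in for the object load_model returns ('trained_model' is the step name pycaret uses for the final estimator; the scaler step is just a placeholder for the preprocessing):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Stand-in for the pipeline that load_model returns
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('trained_model', LogisticRegression()),
])

# named_steps lets you pull out the bare estimator by step name
model = pipeline.named_steps['trained_model']
print(type(model).__name__)  # LogisticRegression
```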

The example above then becomes

# The pipelines below load successfully
print('Loading Models ...')
cat0 = load_model('../input/trained-models-0/finilized_model_catboost0')
cat1 = load_model('../input/trained-models-0/finilized_model_catboost1')

models_list = [cat0.named_steps['trained_model'], cat1.named_steps['trained_model']]

# Stacking now works, since the list contains bare estimators
stacker = stack_models(
    estimator_list=models_list,
    fold=5,
    restack=True,
    choose_better=True,
    optimize="AUC",
)

srikarplus avatar Mar 04 '22 18:03 srikarplus

@srikarplus, thanks for the workaround!

The problem is that I intentionally set model_only=False because I wanted to reproduce the whole preprocessing pipeline exactly.

It seems that this is actually a bug: as I understand it, the pipeline should be fittable (pipeline.fit(...)) right after loading, so the metadata should not break pipeline execution. Isn't that right?
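One way to confirm that, and to locate which part of a loaded pipeline breaks stacking, is to try cloning each step individually. A self-contained sketch, with a hypothetical BadStep that mutates its constructor argument standing in for pycaret's DataTypes_Auto_infer:

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

class BadStep(BaseEstimator, TransformerMixin):
    def __init__(self, features=None):
        # Mutating the argument breaks sklearn's clone() contract
        self.features = list(features or [])

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

# Stand-in for a pipeline returned by load_model(..., model_only=False)
pipe = Pipeline([
    ('prep', BadStep(features=[])),
    ('trained_model', LogisticRegression()),
])

# Clone each step to find the one that violates the contract
for name, step in pipe.named_steps.items():
    try:
        clone(step)
        print(name, 'ok')
    except RuntimeError:
        print(name, 'breaks clone')
```

Running this prints that 'prep' breaks clone while 'trained_model' is fine, which matches the observation that extracting the bare estimator sidesteps the error.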

pedropalb avatar Jun 21 '22 19:06 pedropalb