
serialization-deserialization bug

Open patrickleonardy opened this issue 2 years ago • 4 comments

Bug Report

After serializing and de-serializing a PreProcessor with only continuous variables (it still needs to be checked whether the same happens when categorical variables are present):

  1. the preprocessor object cannot be printed -> AttributeError
  2. when trying to transform data, the KBinsDiscretizer throws -> NotFittedError

Description

For the first point: the problem seems to be a mismatch between the attribute names and the parameter names in the function definition. self._get_param_names() returns "categorical_data_processor", but getattr() only knows "_categorical_data_processor". Changing the naming resolves the problem, but is there no other way?
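
To make the first point concrete, here is a minimal sketch that does not use cobra at all. It assumes the parameter lookup works the way the `_get_param_names()` / `getattr()` behaviour above suggests: names are read from the `__init__` signature and then fetched as attributes of the same name. `ParamIntrospectionMixin`, `BrokenProcessor` and `PropertyProcessor` are hypothetical stand-ins, and the read-only property is only one possible alternative to renaming the attributes, not a confirmed fix.

```python
import inspect


class ParamIntrospectionMixin:
    """Hypothetical stand-in for signature-based parameter introspection."""

    @classmethod
    def _get_param_names(cls):
        # Parameter names are read from the __init__ signature.
        params = inspect.signature(cls.__init__).parameters
        return sorted(name for name in params if name != "self")

    def get_params(self):
        # Each parameter name is looked up as an attribute of the same name.
        return {name: getattr(self, name) for name in self._get_param_names()}

    def __repr__(self):
        # Printing the object goes through get_params().
        return f"{type(self).__name__}({self.get_params()})"


class BrokenProcessor(ParamIntrospectionMixin):
    def __init__(self, categorical_data_processor=None):
        # Stored under an underscored name that differs from the parameter name.
        self._categorical_data_processor = categorical_data_processor


class PropertyProcessor(ParamIntrospectionMixin):
    def __init__(self, categorical_data_processor=None):
        self._categorical_data_processor = categorical_data_processor

    @property
    def categorical_data_processor(self):
        # Read-only alias so getattr() finds the public parameter name.
        return self._categorical_data_processor


try:
    repr(BrokenProcessor("encoder"))
    broken_error = None
except AttributeError as exc:
    # The lookup of "categorical_data_processor" fails on the broken variant.
    broken_error = exc

fixed_repr = repr(PropertyProcessor("encoder"))
```

In this sketch the private attribute name stays as it is and only a public alias is added, so code that already uses `_categorical_data_processor` would not need to change.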

For the second point: there is a problem when creating the pipeline_dictionary. Some keywords seem to be empty even though they should have a value.
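
One way to narrow down the second point could be to compare the dictionary before and after the round trip and list the keys that lost their value. The helper below is a hypothetical diagnostic sketch in plain Python; `find_emptied_keys` and the key names in the example dictionaries are invented for illustration and are not taken from cobra.

```python
def _is_empty(value):
    # None and zero-length containers/strings count as "empty".
    return value is None or (
        isinstance(value, (str, list, tuple, dict)) and len(value) == 0
    )


def find_emptied_keys(original, restored, path=""):
    """Return dotted paths that have a value in `original` but are
    empty or missing in `restored`."""
    emptied = []
    for key, value in original.items():
        key_path = f"{path}.{key}" if path else str(key)
        restored_value = restored.get(key) if isinstance(restored, dict) else None
        if isinstance(value, dict) and isinstance(restored_value, dict) and restored_value:
            # Both sides are non-empty dicts: compare them key by key.
            emptied.extend(find_emptied_keys(value, restored_value, key_path))
        elif not _is_empty(value) and _is_empty(restored_value):
            emptied.append(key_path)
    return emptied


# Hypothetical example data; the real pipeline dictionary may look different.
before = {
    "discretizer": {
        "n_bins": 10,
        "bins_by_column": {"sepal length (cm)": [4.3, 5.8, 7.9]},
    }
}
after = {"discretizer": {"n_bins": 10, "bins_by_column": {}}}

emptied = find_emptied_keys(before, after)
```

Running such a check on the dictionary of the fitted preprocessor and on the one rebuilt from it would show which keywords end up empty.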

Steps to Reproduce

  1. Load a dataset:

         from sklearn.datasets import load_iris
         import pandas as pd

         X, y = load_iris(return_X_y=True, as_frame=True)
         df = pd.concat([X, y])
         df = df.rename({0: "target"}, axis=1)

  2. Create the preprocessor and fit it:

         from cobra.preprocessing import PreProcessor

         preprocessor = PreProcessor.from_params()

         continuous_vars = ['sepal length (cm)', 'sepal width (cm)',
                            'petal length (cm)', 'petal width (cm)']
         discrete_vars = []

         preprocessor.fit(
             df,
             continuous_vars=continuous_vars,
             discrete_vars=discrete_vars,
             target_column_name="target"
         )

  3. Serialize the preprocessor:

         pipeline_serialized = preprocessor.serialize_pipeline()

  4. De-serialize:

         new_preprocessor = PreProcessor.from_pipeline(pipeline_serialized)

  5. See what happens when printing new_preprocessor

  6. See what happens when transforming:

         new_preprocessor.transform(
             df,
             continuous_vars=continuous_vars,
             discrete_vars=[]
         )

Actual Results

I got ...

[Two attached screenshots: MicrosoftTeams-image, MicrosoftTeams-image (1)]

patrickleonardy, Jan 11 '23 10:01