cobra
serialization-deserialization bug
Bug Report
After serializing and de-serializing a PreProcessor fitted with only continuous variables (I have not yet checked whether the same happens when categorical variables are present):
- the preprocessor object cannot be printed -> AttributeError
- when trying to transform data, the KBinsDiscretizer throws -> NotFittedError
Description
For the first point: The problem seems to be a mismatch between the attribute names and the parameter names in the function definition. self._get_param_names() returns "categorical_data_processor", but getattr() only finds "_categorical_data_processor". Renaming the attributes resolves the problem, but is there no other way?
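To illustrate the mismatch described above, here is a minimal sketch that does not use cobra or scikit-learn. `ParamsMixin`, `Broken` and `WithProperty` are hypothetical stand-ins that only mimic the "read parameter names from `__init__`, then `getattr` each one" lookup. It also shows one possible alternative to renaming, a read-only property, which is offered as an idea rather than as the library's intended fix.

```python
import inspect


class ParamsMixin:
    """Hypothetical stand-in for a BaseEstimator-style parameter lookup."""

    @classmethod
    def _get_param_names(cls):
        # Parameter names are read from the __init__ signature.
        sig = inspect.signature(cls.__init__)
        return sorted(p for p in sig.parameters if p != "self")

    def get_params(self):
        # Each parameter name is looked up as an attribute of the same name.
        return {name: getattr(self, name) for name in self._get_param_names()}


class Broken(ParamsMixin):
    def __init__(self, categorical_data_processor):
        # Stored under a private name, so the public-name lookup fails.
        self._categorical_data_processor = categorical_data_processor


class WithProperty(ParamsMixin):
    def __init__(self, categorical_data_processor):
        self._categorical_data_processor = categorical_data_processor

    @property
    def categorical_data_processor(self):
        # Read-only alias exposing the private attribute under the parameter name.
        return self._categorical_data_processor


def lookup_fails(obj):
    """Return True if get_params() raises AttributeError for this object."""
    try:
        obj.get_params()
    except AttributeError:
        return True
    return False


broken_fails = lookup_fails(Broken("cdp"))
fixed_params = WithProperty("cdp").get_params()
```

A read-only property is enough for a lookup like this one. Anything that assigns to the public name (for example a `set_params`-style method based on `setattr`) would additionally need a setter.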
For the second point: There seems to be a problem when creating the pipeline dictionary: some keys are empty even though they should have a value.
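One way to narrow this down might be to list which entries of the serialized dictionary are empty before de-serializing. The helper below is a hypothetical debugging aid, not part of cobra. The key names in `example` are made up for illustration, since the real layout of the pipeline dictionary is not shown in this report.

```python
def find_empty_entries(node, path=""):
    """Return dotted paths of dict keys whose value is None or an empty container.

    Only nested dicts are traversed; lists are checked for emptiness
    but their elements are not inspected.
    """
    empty = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            is_empty_container = isinstance(value, (dict, list, str)) and len(value) == 0
            if value is None or is_empty_container:
                empty.append(child)
            else:
                empty.extend(find_empty_entries(value, child))
    return empty


# Made-up example structure (assumption), not cobra's actual output.
example = {
    "target_encoder": {"weight": 0.0, "_mapping": {}},
    "discretizer": {"n_bins": 10, "_bins_by_column": None},
    "is_fitted": True,
}
empty_paths = find_empty_entries(example)
```

Running the same helper on the dictionary returned by `preprocessor.serialize_pipeline()` would show which keys are empty right after serialization, which may help locate where the fitted state is lost.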
Steps to Reproduce
- Load a dataset:
from sklearn.datasets import load_iris
import pandas as pd
X, y = load_iris(return_X_y=True, as_frame=True)
# Concatenate column-wise; y is a Series named "target", so it becomes the "target" column.
df = pd.concat([X, y], axis=1)
- Create preprocessor and fit it
from cobra.preprocessing import PreProcessor
preprocessor = PreProcessor.from_params()
continuous_vars = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
discrete_vars = []
preprocessor.fit(df, continuous_vars=continuous_vars, discrete_vars=discrete_vars, target_column_name="target")
- Serialize the preprocessor
pipeline_serialized = preprocessor.serialize_pipeline()
- De-serialize
new_preprocessor = PreProcessor.from_pipeline(pipeline_serialized)
- See what happens when printing
new_preprocessor
- See what happens when transforming
new_preprocessor.transform(df, continuous_vars=continuous_vars, discrete_vars=[])
Actual Results
I got ...