imodels
ExtraBasicDiscretizer not working with scikit-learn 1.4
First, thank you for your work. I appreciate it. :-)
I ran the tutorial and a single example does not work: the one that uses the ExtraBasicDiscretizer:
disc = ExtraBasicDiscretizer(feat_names[:3], n_bins=3, strategy='uniform')
X_train_brl_df = disc.fit_transform(pd.DataFrame(X_train[:, :3], columns=feat_names[:3]))
X_test_brl_df = disc.transform(pd.DataFrame(X_test[:, :3], columns=feat_names[:3]))
The problem occurs on the second and third lines.
Calling `X_train_brl_df = disc.fit_transform(pd.DataFrame(X_train[:, :3], columns=feat_names[:3]))` gives:
[/usr/lib64/python3.13/site-packages/sklearn/preprocessing/_discretization.py:248](http://localhost:8888/usr/lib64/python3.13/site-packages/sklearn/preprocessing/_discretization.py#line=247): FutureWarning: In version 1.5 onwards, subsample=200_000 will be used by default. Set subsample explicitly to silence this warning in the mean time. Set subsample=None to disable subsampling explicitly.
warnings.warn(
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[17], line 1
----> 1 X_train_brl_df = disc.fit_transform(pd.DataFrame(X_train[:, :3], columns=feat_names[:3]))
File [/usr/lib64/python3.13/site-packages/sklearn/utils/_set_output.py:295](http://localhost:8888/usr/lib64/python3.13/site-packages/sklearn/utils/_set_output.py#line=294), in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
293 @wraps(f)
294 def wrapped(self, X, *args, **kwargs):
--> 295 data_to_wrap = f(self, X, *args, **kwargs)
296 if isinstance(data_to_wrap, tuple):
297 # only wrap the first output for cross decomposition
298 return_tuple = (
299 _wrap_data_with_container(method, data_to_wrap[0], X, self),
300 *data_to_wrap[1:],
301 )
File [/usr/lib64/python3.13/site-packages/sklearn/base.py:1098](http://localhost:8888/usr/lib64/python3.13/site-packages/sklearn/base.py#line=1097), in TransformerMixin.fit_transform(self, X, y, **fit_params)
1083 warnings.warn(
1084 (
1085 f"This object ({self.__class__.__name__}) has a `transform`"
(...)
1093 UserWarning,
1094 )
1096 if y is None:
1097 # fit method of arity 1 (unsupervised transformation)
-> 1098 return self.fit(X, **fit_params).transform(X)
1099 else:
1100 # fit method of arity 2 (supervised transformation)
1101 return self.fit(X, y, **fit_params).transform(X)
File [/usr/lib64/python3.13/site-packages/sklearn/utils/_set_output.py:295](http://localhost:8888/usr/lib64/python3.13/site-packages/sklearn/utils/_set_output.py#line=294), in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
293 @wraps(f)
294 def wrapped(self, X, *args, **kwargs):
--> 295 data_to_wrap = f(self, X, *args, **kwargs)
296 if isinstance(data_to_wrap, tuple):
297 # only wrap the first output for cross decomposition
298 return_tuple = (
299 _wrap_data_with_container(method, data_to_wrap[0], X, self),
300 *data_to_wrap[1:],
301 )
File [/usr/lib/python3.13/site-packages/imodels/discretization/discretizer.py:391](http://localhost:8888/usr/lib/python3.13/site-packages/imodels/discretization/discretizer.py#line=390), in ExtraBasicDiscretizer.transform(self, X)
389 # One-hot encode the ordinal DF
390 disc_onehot_np = self.encoder_.transform(disc_ordinal_df_str)
--> 391 disc_onehot = pd.DataFrame(
392 disc_onehot_np, columns=self.encoder_.get_feature_names_out())
394 # Name columns after the interval they represent (e.g. 0.1_to_0.5)
395 for col, bin_edges in zip(self.dcols, self.discretizer_.bin_edges_):
File [/usr/lib64/python3.13/site-packages/pandas/core/frame.py:856](http://localhost:8888/usr/lib64/python3.13/site-packages/pandas/core/frame.py#line=855), in DataFrame.__init__(self, data, index, columns, dtype, copy)
848 mgr = arrays_to_mgr(
849 arrays,
850 columns,
(...)
853 typ=manager,
854 )
855 else:
--> 856 mgr = ndarray_to_mgr(
857 data,
858 index,
859 columns,
860 dtype=dtype,
861 copy=copy,
862 typ=manager,
863 )
864 else:
865 mgr = dict_to_mgr(
866 {},
867 index,
(...)
870 typ=manager,
871 )
File [/usr/lib64/python3.13/site-packages/pandas/core/internals/construction.py:336](http://localhost:8888/usr/lib64/python3.13/site-packages/pandas/core/internals/construction.py#line=335), in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
331 # _prep_ndarraylike ensures that values.ndim == 2 at this point
332 index, columns = _get_axes(
333 values.shape[0], values.shape[1], index=index, columns=columns
334 )
--> 336 _check_values_indices_shape_match(values, index, columns)
338 if typ == "array":
339 if issubclass(values.dtype.type, str):
File [/usr/lib64/python3.13/site-packages/pandas/core/internals/construction.py:420](http://localhost:8888/usr/lib64/python3.13/site-packages/pandas/core/internals/construction.py#line=419), in _check_values_indices_shape_match(values, index, columns)
418 passed = values.shape
419 implied = (len(index), len(columns))
--> 420 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (192, 1), indices imply (192, 9)
If I run the third line before the second, `X_test_brl_df = disc.transform(pd.DataFrame(X_test[:, :3], columns=feat_names[:3]))`, I get:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[16], line 2
1 disc = ExtraBasicDiscretizer(feat_names[:3], n_bins=3, strategy='uniform')
----> 2 X_test_brl_df = disc.transform(pd.DataFrame(X_test[:, :3], columns=feat_names[:3]))
File [/usr/lib64/python3.13/site-packages/sklearn/utils/_set_output.py:295](http://localhost:8888/usr/lib64/python3.13/site-packages/sklearn/utils/_set_output.py#line=294), in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
293 @wraps(f)
294 def wrapped(self, X, *args, **kwargs):
--> 295 data_to_wrap = f(self, X, *args, **kwargs)
296 if isinstance(data_to_wrap, tuple):
297 # only wrap the first output for cross decomposition
298 return_tuple = (
299 _wrap_data_with_container(method, data_to_wrap[0], X, self),
300 *data_to_wrap[1:],
301 )
File [/usr/lib/python3.13/site-packages/imodels/discretization/discretizer.py:385](http://localhost:8888/usr/lib/python3.13/site-packages/imodels/discretization/discretizer.py#line=384), in ExtraBasicDiscretizer.transform(self, X)
369 """
370 Discretize the data.
371
(...)
381 binned space. All other features remain unchanged.
382 """
384 # Apply discretizer transform to get ordinally coded DF
--> 385 disc_ordinal_np = self.discretizer_.transform(X[self.dcols])
386 disc_ordinal_df = pd.DataFrame(disc_ordinal_np, columns=self.dcols)
387 disc_ordinal_df_str = disc_ordinal_df.astype(int).astype(str)
AttributeError: 'ExtraBasicDiscretizer' object has no attribute 'discretizer_'
OK, in hindsight I understand why this one fails: the discretizer has not been fitted yet (there was no fit call before transform). When I run the third line after the second, the error is similar to the one from the second line.
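For anyone hitting the same problem, here is a minimal sketch of a possible workaround that uses scikit-learn's `KBinsDiscretizer` directly with `encode="onehot-dense"`. It is not the imodels implementation and does not reproduce the interval-based column names of ExtraBasicDiscretizer; the random data, the feature names and the `_bin{i}` column naming are placeholders I made up. It also shows the fit-before-transform ordering issue from the second traceback:

```python
import numpy as np
import pandas as pd
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import KBinsDiscretizer

# Placeholder data with the same shape as in the report (192 training rows, 3 features).
rng = np.random.default_rng(0)
feat_names = ["feat_a", "feat_b", "feat_c"]  # hypothetical names
X_train = rng.normal(size=(192, 3))
X_test = rng.normal(size=(64, 3))

disc = KBinsDiscretizer(n_bins=3, encode="onehot-dense", strategy="uniform")

# Calling transform before fit fails, as in the second traceback.
try:
    disc.transform(X_test)
    failed_before_fit = False
except NotFittedError:
    failed_before_fit = True

# Fit on the training data first, then reuse the fitted bins for the test data.
train_np = disc.fit_transform(X_train)
test_np = disc.transform(X_test)

# One column per (feature, bin) pair, in the order the encoder emits them.
cols = [f"{name}_bin{b}" for name, n in zip(feat_names, disc.n_bins_) for b in range(n)]
X_train_brl_df = pd.DataFrame(train_np, columns=cols)
X_test_brl_df = pd.DataFrame(test_np, columns=cols)

print(failed_before_fit, X_train_brl_df.shape, X_test_brl_df.shape)
```

Because the output is already a dense array, the number of data columns and the number of column names agree by construction.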