ibis-ml
bug: cannot convert `y` to numpy on kaggle notebook in sklearn pipeline
In this competition, the `y` column cannot be converted to a NumPy array.
~~I could run this on my local machine, but not on kaggle notebook.~~
~~**I could reproduce this on my local machine.**~~
local env
Python version: 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 10:07:17) [Clang 14.0.6 ]
scikit-learn version: 1.5.1
skorch version: 1.0.0
torch version: 2.4.0
ibis-framework version: 9.3.0
kaggle env
Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
scikit-learn version: 1.2.2
skorch version: 1.0.0
torch version: 2.4.0+cpu
ibis-framework version: 9.3.0
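For anyone trying to reproduce this, the version lists above could be gathered with something like the sketch below. It assumes the PyPI distribution names `scikit-learn`, `skorch`, `torch`, and `ibis-framework`; `package_versions` is a hypothetical helper, not part of any of these libraries.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names are assumptions based on the lists in this report.
packages = ["scikit-learn", "skorch", "torch", "ibis-framework"]

print(f"Python version: {sys.version}")
for name, ver in package_versions(packages).items():
    print(f"{name} version: {ver or 'not installed'}")
```

Running the same snippet locally and in the Kaggle notebook makes the two environments easy to compare line by line.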
from sklearn.pipeline import Pipeline
from skorch import NeuralNetClassifier
from skorch.callbacks import EarlyStopping, LRScheduler
from torch import nn, optim

# MyModel, recipe, X_train and y_train are defined earlier in the notebook.

# Wrap the PyTorch model with skorch
net = NeuralNetClassifier(
    MyModel,
    module__input_dim=635,  # Specify the input dimension
    max_epochs=1,
    lr=0.001,
    batch_size=32,
    optimizer=optim.Adam,
    criterion=nn.BCELoss,
    iterator_train__shuffle=True,
    callbacks=[
        EarlyStopping(monitor='valid_loss', patience=25, load_best=True),  # Early stopping
        LRScheduler(policy='ReduceLROnPlateau', monitor='valid_loss', factor=0.1, patience=25, min_lr=1e-6)
    ],
    verbose=1
)

# Define the sklearn pipeline with preprocessing and PyTorch model
pipeline = Pipeline([
    ('ibisml-prep', recipe),  # Preprocessing step in IbisML
    ('model', net)  # The PyTorch model wrapped as NeuralNetClassifier via skorch
])

pipeline.fit(X_train, y_train)
log
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[19], line 1
----> 1 pipeline.fit(X_train, y_train)
File /opt/conda/lib/python3.10/site-packages/sklearn/pipeline.py:405, in Pipeline.fit(self, X, y, **fit_params)
403 if self._final_estimator != "passthrough":
404 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
--> 405 self._final_estimator.fit(Xt, y, **fit_params_last_step)
407 return self
File /opt/conda/lib/python3.10/site-packages/skorch/classifier.py:165, in NeuralNetClassifier.fit(self, X, y, **fit_params)
154 """See ``NeuralNet.fit``.
155
156 In contrast to ``NeuralNet.fit``, ``y`` is non-optional to
(...)
160
161 """
162 # pylint: disable=useless-super-delegation
163 # this is actually a pylint bug:
164 # https://github.com/PyCQA/pylint/issues/1085
--> 165 return super(NeuralNetClassifier, self).fit(X, y, **fit_params)
File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1319, in NeuralNet.fit(self, X, y, **fit_params)
1316 if not self.warm_start or not self.initialized_:
1317 self.initialize()
-> 1319 self.partial_fit(X, y, **fit_params)
1320 return self
File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1278, in NeuralNet.partial_fit(self, X, y, classes, **fit_params)
1276 self.notify('on_train_begin', X=X, y=y)
1277 try:
-> 1278 self.fit_loop(X, y, **fit_params)
1279 except KeyboardInterrupt:
1280 pass
File /opt/conda/lib/python3.10/site-packages/skorch/net.py:1172, in NeuralNet.fit_loop(self, X, y, epochs, **fit_params)
1136 def fit_loop(self, X, y=None, epochs=None, **fit_params):
1137 """The proper fit loop.
1138
1139 Contains the logic of what actually happens during the fit
(...)
1170
1171 """
-> 1172 self.check_data(X, y)
1173 self.check_training_readiness()
1174 epochs = epochs if epochs is not None else self.max_epochs
File /opt/conda/lib/python3.10/site-packages/skorch/classifier.py:141, in NeuralNetClassifier.check_data(self, X, y)
137 pass
139 if y is not None:
140 # pylint: disable=attribute-defined-outside-init
--> 141 self.classes_inferred_ = np.unique(to_numpy(y))
File /opt/conda/lib/python3.10/site-packages/skorch/utils.py:152, in to_numpy(X)
149 return np.asarray(X)
151 if not is_torch_data_type(X):
--> 152 raise TypeError("Cannot convert this data type to a numpy array.")
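The traceback ends in skorch's `to_numpy`, which raises because the object passed as `y` is not a type it recognizes. A possible workaround, sketched below, is to convert `y` to a NumPy array before calling `pipeline.fit`. This is only a sketch: `as_numpy_target` is a hypothetical helper, and it assumes that `y_train` either exposes a `to_numpy()` method (as pandas objects do) or a `to_pandas()` method (as Ibis expressions do). It has not been verified against the data in this report.

```python
import numpy as np


def as_numpy_target(y):
    """Best-effort conversion of a target column to a NumPy array.

    Hypothetical helper: relies on duck typing rather than on any
    skorch or IbisML API.
    """
    if isinstance(y, np.ndarray):
        arr = y
    elif hasattr(y, "to_numpy"):
        # Assumption: objects with to_numpy() return array-like data.
        arr = np.asarray(y.to_numpy())
    elif hasattr(y, "to_pandas"):
        # Assumption: objects with to_pandas() return a pandas object.
        arr = np.asarray(y.to_pandas())
    else:
        arr = np.asarray(y)

    # A single-column table converts to shape (n, 1); flatten it to (n,).
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr.ravel()
    return arr


# Diagnostic: show which type is actually being passed as y.
# print(type(y_train))

# Intended use with the pipeline from this report:
# pipeline.fit(X_train, as_numpy_target(y_train))
```

Printing `type(y_train)` first should confirm which branch applies, and would also be useful information to add to this report.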