imbalanced-learn
[BUG] fit_resample throws KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'
Describe the bug
RandomOverSampler().fit_resample throws KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'
Steps/Code to Reproduce
X_res, y_res = sampling.fit_resample(X, y)
# where 'sampling' is an instance of RandomOverSampler
# created with sampling_strategy=1;
# X and y are pandas DataFrames.
Expected Results
Resampled arrays with target classes resampled in equal proportions
Actual Results
X_res, y_res = sampling.fit_resample(X, y)
File "/apps/brussel/CO7/skylake/software/imbalanced-learn/0.7.0-foss-2020b/lib/python3.8/site-packages/imblearn/base.py", line 88, in fit_resample
X_, y_ = arrays_transformer.transform(output[0], y_)
File "/apps/brussel/CO7/skylake/software/imbalanced-learn/0.7.0-foss-2020b/lib/python3.8/site-packages/imblearn/utils/_validation.py", line 40, in transform
X = self._transfrom_one(X, self.x_props)
File "/apps/brussel/CO7/skylake/software/imbalanced-learn/0.7.0-foss-2020b/lib/python3.8/site-packages/imblearn/utils/_validation.py", line 59, in _transfrom_one
ret = ret.astype(props["dtypes"])
File "/apps/brussel/CO7/skylake/software/SciPy-bundle/2020.11-foss-2020b/lib/python3.8/site-packages/pandas/core/generic.py", line 5531, in astype
col.astype(dtype=dtype[col_name], copy=copy, errors=errors)
File "/apps/brussel/CO7/skylake/software/SciPy-bundle/2020.11-foss-2020b/lib/python3.8/site-packages/pandas/core/generic.py", line 5514, in astype
raise KeyError(
KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'
Versions
Python 3.8.6 (default, Oct 19 2020, 16:14:34)
[GCC 10.2.0]
NumPy 1.19.4
SciPy 1.5.4
Scikit-Learn 0.23.2
Imbalanced-Learn 0.7.0
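The list above does not include pandas, even though the traceback ends inside pandas/core/generic.py. A minimal sketch for collecting every relevant version in one go, using only the standard library, is shown here. The package list and the helper name collect_versions are assumptions made for illustration and are not part of imbalanced-learn.

```python
# Sketch: report the versions of the packages that appear in the traceback.
# PACKAGES and collect_versions are illustrative names, not a library API.
import platform
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["numpy", "scipy", "pandas", "scikit-learn", "imbalanced-learn"]


def collect_versions(packages=PACKAGES):
    """Return a dict mapping each distribution name to its installed version."""
    found = {"python": platform.python_version()}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Keep the entry so a missing package is visible in the report.
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}: {ver}")
```

Pasting the printed lines into the Versions section would show which pandas release raised the error.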
I couldn't reproduce this.
Here is what I tried:
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler
import pandas as pd
X, y = make_classification()
X_pd, y_pd = pd.DataFrame(X), pd.DataFrame(y)
X_res, y_res = RandomOverSampler().fit_resample(X_pd, y_pd)
print(X_res.shape, y_res.shape)
# (100, 20) (100, 1)
Please follow up with a minimal reproducible example. Upgrading scikit-learn and imbalanced-learn may resolve this as well.
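One thing worth checking while building that example is whether X has duplicate column names. This is a hypothesis drawn from the traceback and has not been confirmed by the reporter. The failing call is col.astype(dtype=dtype[col_name]), and the error is raised when a single column receives a dict-like dtype mapping instead of one dtype. Looking up a duplicated name in X.dtypes returns a Series, which would produce exactly that situation. The sketch below uses made-up data and a hypothetical helper, duplicated_columns, to show the mechanism with pandas alone.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two columns share the name "a".
X = pd.DataFrame(np.zeros((4, 3)), columns=["a", "a", "b"])

# With unique column names this lookup yields a single dtype.
# With a duplicated name it yields a Series holding one dtype per match.
dtype_for_a = X.dtypes["a"]
print(type(dtype_for_a).__name__, len(dtype_for_a))

# Casting one column with that Series mirrors the call in the traceback.
first_col = X.iloc[:, 0]
try:
    first_col.astype(dtype_for_a)
    error_message = None
except KeyError as exc:
    error_message = str(exc)
print(error_message)


def duplicated_columns(frame):
    """Return the column names that occur more than once in frame."""
    return frame.columns[frame.columns.duplicated()].unique().tolist()


print(duplicated_columns(X))
```

If duplicated_columns(X) returns a non-empty list for the real data, renaming those columns before calling fit_resample is a reasonable first thing to try.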
Closing for now because we are missing the information needed to reproduce this.