
[BUG] fit_resample throws KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'

Open ntlghb opened this issue 3 years ago • 1 comment

Describe the bug

RandomOverSampler().fit_resample throws KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'

Steps/Code to Reproduce

from imblearn.over_sampling import RandomOverSampler

# X and y are pandas DataFrames.
sampling = RandomOverSampler(sampling_strategy=1)
X_res, y_res = sampling.fit_resample(X, y)

Expected Results

Resampled X_res and y_res in which the target classes have equal counts.
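For a two-class target, "equal counts" with `sampling_strategy = 1` means the minority class is topped up to the size of the majority class. The plain-NumPy sketch below illustrates that idea: every original row is kept, and minority rows are drawn at random with replacement until the counts match. It is a simplified illustration, not imbalanced-learn's implementation, and `random_oversample` is a hypothetical name.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Hypothetical helper (not imbalanced-learn's code): duplicate rows of
    smaller classes at random until every class matches the largest one."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()

    keep = [np.arange(len(y))]  # keep every original row
    for cls, count in zip(classes, counts):
        if count < target:
            # draw extra rows of this class, with replacement
            idx = np.flatnonzero(y == cls)
            keep.append(rng.choice(idx, size=target - count, replace=True))
    indices = np.concatenate(keep)
    return X[indices], y[indices]


X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)  # 8 vs 2: imbalanced
X_res, y_res = random_oversample(X, y)
print(np.unique(y_res, return_counts=True))
# (array([0, 1]), array([8, 8]))
```

The original 10 rows come first, followed by 6 resampled copies of the two class-1 rows.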

Actual Results

    X_res, y_res = sampling.fit_resample(X, y)
  File "/apps/brussel/CO7/skylake/software/imbalanced-learn/0.7.0-foss-2020b/lib/python3.8/site-packages/imblearn/base.py", line 88, in fit_resample
    X_, y_ = arrays_transformer.transform(output[0], y_)
  File "/apps/brussel/CO7/skylake/software/imbalanced-learn/0.7.0-foss-2020b/lib/python3.8/site-packages/imblearn/utils/_validation.py", line 40, in transform
    X = self._transfrom_one(X, self.x_props)
  File "/apps/brussel/CO7/skylake/software/imbalanced-learn/0.7.0-foss-2020b/lib/python3.8/site-packages/imblearn/utils/_validation.py", line 59, in _transfrom_one
    ret = ret.astype(props["dtypes"])
  File "/apps/brussel/CO7/skylake/software/SciPy-bundle/2020.11-foss-2020b/lib/python3.8/site-packages/pandas/core/generic.py", line 5531, in astype
    col.astype(dtype=dtype[col_name], copy=copy, errors=errors)
  File "/apps/brussel/CO7/skylake/software/SciPy-bundle/2020.11-foss-2020b/lib/python3.8/site-packages/pandas/core/generic.py", line 5514, in astype
    raise KeyError(
KeyError: 'Only the Series name can be used for the key in Series dtype mappings.'
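The traceback shows where the error surfaces: after resampling, `fit_resample` converts the NumPy output back to the input type, and for a DataFrame it calls `ret.astype(props["dtypes"])`, where pandas looks up each column's dtype by name (`dtype[col_name]`). A minimal pandas sketch of that round trip follows. `to_numpy_with_props` and `restore_dataframe` are hypothetical names, and the assumption that the stored mapping is the input's `X.dtypes` is not something the traceback confirms.

```python
import numpy as np
import pandas as pd


def to_numpy_with_props(X):
    """Hypothetical: remember what is needed to rebuild X later."""
    props = {"columns": X.columns, "dtypes": X.dtypes}
    return X.to_numpy(), props


def restore_dataframe(array, props):
    """Hypothetical: rebuild a DataFrame and cast each column back.

    props["dtypes"] is a Series indexed by column label (assumed to be the
    input's X.dtypes), so pandas looks up each column's dtype by its name.
    """
    ret = pd.DataFrame(array, columns=props["columns"])
    return ret.astype(props["dtypes"])


X = pd.DataFrame({"age": [25, 32, 47], "height": [1.7, 1.8, 1.6]})
array, props = to_numpy_with_props(X)  # mixed columns become one float array
resampled = np.vstack([array, array[:1]])  # stand-in for resampling
X_res = restore_dataframe(resampled, props)
print(X_res.dtypes)
```

Because the cast is keyed by column label, the labels and dtypes of `X` are relevant details for reproducing the error.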

Versions

Python 3.8.6 (default, Oct 19 2020, 16:14:34) 
[GCC 10.2.0]
NumPy 1.19.4
SciPy 1.5.4
Scikit-Learn 0.23.2
Imbalanced-Learn 0.7.0

ntlghb · Jun 01 '21 11:06

I couldn't reproduce this.

Here is what I tried:

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler
import pandas as pd

X, y = make_classification()

X_pd, y_pd = pd.DataFrame(X), pd.DataFrame(y)

X_res, y_res = RandomOverSampler().fit_resample(X_pd, y_pd)
print(X_res.shape, y_res.shape)
# (100, 20) (100, 1)

Please follow up with a minimal reproducible example. Upgrading scikit-learn and imbalanced-learn may resolve this as well.
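A sketch of the details worth including, assuming pandas inputs as in the report. The Versions list above does not include pandas, although the exception is raised inside pandas. `describe_inputs` is a hypothetical helper, not part of imbalanced-learn.

```python
import numpy as np
import pandas as pd


def describe_inputs(X, y):
    """Hypothetical helper: summarize the inputs passed to fit_resample."""
    return {
        "pandas": pd.__version__,
        "numpy": np.__version__,
        "X shape": X.shape,
        "X dtypes": X.dtypes.astype(str).to_dict(),
        "X columns unique": X.columns.is_unique,
        "y type": type(y).__name__,
        # flatten y so a one-column DataFrame and a Series are counted alike
        "y class counts": pd.Series(np.ravel(y)).value_counts().to_dict(),
    }


X = pd.DataFrame({"a": [1, 2, 3, 4], "b": [0.5, 0.1, 0.2, 0.9]})
y = pd.DataFrame({"target": [0, 0, 0, 1]})
for key, value in describe_inputs(X, y).items():
    print(f"{key}: {value}")
```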

hayesall · Jul 17 '22 03:07

Closing, then, since we are missing the information needed to reproduce this.

glemaitre · Dec 03 '22 22:12