imbalanced-learn
[BUG] ValueError: Found array with 0 sample(s)
Describe the bug
When SVMSMOTE is used on a dataset whose minority class has very few samples (possibly fewer than 10), it raises ValueError: Found array with 0 sample(s) (shape=(0, 600)) while a minimum of 1 is required.
Steps/Code to Reproduce
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SVMSMOTE
X, y = make_classification(n_classes=3, class_sep=0,
    weights=[0.004, 0.451, 0.545], n_informative=3, n_redundant=0, flip_y=0,
    n_features=3, n_clusters_per_class=2, n_samples=1000, random_state=10)
print('Original dataset shape %s' % Counter(y))
sm = SVMSMOTE(random_state=42, k_neighbors=4)
X_res, y_res = sm.fit_resample(X, y)
print('Resampled dataset shape %s' % Counter(y_res))
Expected Results
Running without error
Actual Results
Original dataset shape Counter({2: 544, 1: 451, 0: 5})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-78-8f5d2308c2bd> in <module>()
10
11 sm = SVMSMOTE(random_state=42, k_neighbors=4)
---> 12 X_res, y_res = sm.fit_resample(X, y)
13 print('Resampled dataset shape %s' % Counter(y_res))
~/anaconda3/lib/python3.6/site-packages/imblearn/base.py in fit_resample(self, X, y)
82 self.sampling_strategy, y, self._sampling_type)
83
---> 84 output = self._fit_resample(X, y)
85
86 if binarize_y:
~/anaconda3/lib/python3.6/site-packages/imblearn/over_sampling/_smote.py in _fit_resample(self, X, y)
530 def _fit_resample(self, X, y):
531 # print("_fit_resample X shape", X.shape)
--> 532 return self._sample(X, y)
533
534 def _sample(self, X, y):
~/anaconda3/lib/python3.6/site-packages/imblearn/over_sampling/_smote.py in _sample(self, X, y)
569
570 danger_bool = self._in_danger_noise(
--> 571 self.nn_m_, support_vector, class_sample, y, kind='danger')
572 safety_bool = np.logical_not(danger_bool)
573
~/anaconda3/lib/python3.6/site-packages/imblearn/over_sampling/_smote.py in _in_danger_noise(self, nn_estimator, samples, target_class, y, kind)
213 # print("kind", kind)
214 # print("_in_danger_noise samples shape", samples.shape)
--> 215 x = nn_estimator.kneighbors(samples, return_distance=False)[:, 1:]
216 # print("x", x)
217 nn_label = (y[x] != target_class).astype(int)
~/anaconda3/lib/python3.6/site-packages/sklearn/neighbors/base.py in kneighbors(self, X, n_neighbors, return_distance)
400 if X is not None:
401 query_is_train = False
--> 402 X = check_array(X, accept_sparse='csr')
403 else:
404 query_is_train = True
~/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
548 " minimum of %d is required%s."
549 % (n_samples, array.shape, ensure_min_samples,
--> 550 context))
551
552 if ensure_min_features > 0 and array.ndim == 2:
ValueError: Found array with 0 sample(s) (shape=(0, 3)) while a minimum of 1 is required.
Versions
System:
python: 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
executable: /home/allenyl/anaconda3/bin/python
machine: Linux-4.15.0-112-generic-x86_64-with-debian-buster-sid
Python deps:
pip: 19.2.2
setuptools: 41.0.1
sklearn: 0.21.3
numpy: 1.15.1
scipy: 1.4.1
Cython: 0.28.2
pandas: 0.24.1
Did you find a fix for this? I'm having the same issue here.
@hiyamgh I've pushed a fix, but as @glemaitre commented on #743, I need to add something before it can be merged. I currently have no time to do it, though.
Thank you @allenyllee for notifying me. On my side, the error turned out to come from SMOTENC: I was passing an empty list for the categorical_features parameter (I did not know that the dataset must have a mix of numerical and categorical features).
Hi @hiyamgh, I am having the same issue. Did you fix the problem? I am very new to the field and can hardly follow #743.
Hi all! I found this thread while searching for a solution to an identical problem. I have found that SMOTE-based algorithms in general can have trouble oversampling an extremely scarce class. ADASYN solved my problem.
Is this fixed? I am having the same issue
This is present with Python 3.9.9 and imbalanced-learn 0.9.0.
Regarding the original example, class_sep=0 really means that all data points are mixed together. Therefore, the support vectors are categorized as noise, which leaves none for the danger check shown in the traceback. In this case, there is no other solution than to use another variant. In real life, there is actually no point in doing machine learning in this case, because the underlying classifier will be useless.
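To make the "categorized as noise" point concrete, here is a simplified, standard-library-only sketch of that kind of noise rule: a minority point is flagged as noise when all of its m nearest neighbours belong to another class. This is an illustration under stated assumptions, not the library's implementation; the helper names (nearest_labels, is_noise), the 1-D toy data, and m=2 are all hypothetical, and for brevity every minority point is treated as a candidate rather than only the SVM support vectors.

```python
import math


def nearest_labels(points, labels, query_index, m):
    """Return the labels of the m nearest neighbours of points[query_index],
    excluding the query point itself (brute-force search)."""
    q = points[query_index]
    others = [(math.dist(q, p), i) for i, p in enumerate(points) if i != query_index]
    others.sort()
    return [labels[i] for _, i in others[:m]]


def is_noise(points, labels, query_index, m):
    """Noise rule (simplified): all m nearest neighbours are from another class."""
    own = labels[query_index]
    return all(lab != own for lab in nearest_labels(points, labels, query_index, m))


# Toy data: isolated minority points (class 0) mixed into the majority (class 1),
# mimicking what class_sep=0 does to a tiny class.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,), (7.0,), (8.0,)]
labels = [1, 1, 0, 1, 1, 1, 0, 1, 1]

minority = [i for i, lab in enumerate(labels) if lab == 0]
kept = [i for i in minority if not is_noise(points, labels, i, m=2)]
print(len(minority), len(kept))  # prints: 2 0
```

Once every candidate is filtered out, the next step receives an array with 0 samples, which is consistent with the ValueError in the traceback.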