feature-selector

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

Open · NiDHanWang opened this issue 4 years ago · 3 comments

Hi there! Thanks so much for such a good piece of work, it really helps!

Recently, though, an error is raised when I use identify_zero_importance. It works fine with early_stopping turned off, but the error is raised as soon as I turn it on.

Here's my code:

    from feature_selector import FeatureSelector

    select_label=train_fill['SalePrice']
    select_featrue=train_fill.drop(columns=['SalePrice','Id'])
    fs=FeatureSelector(data=select_featrue,labels=select_label)
    fs.identify_zero_importance(task='regression',eval_metric='L2',n_iterations=10,early_stopping=True)

Here's the error:

    ValueError                                Traceback (most recent call last)
    in
    ----> 1 fs.identify_zero_importance(task='regression',eval_metric='L2',n_iterations=10,early_stopping=True)

    D:\anaconda\lib\site-packages\feature_selector.py in identify_zero_importance(self, task, eval_metric, n_iterations, early_stopping)
        304         if early_stopping:
        305 
    --> 306             train_features, valid_features, train_labels, valid_labels = train_test_split(features, labels, test_size = 0.15, stratify=labels)
        307 
        308             # Train the model with early stopping

    D:\anaconda\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
       2119                  random_state=random_state)
       2120 
    -> 2121     train, test = next(cv.split(X=arrays[0], y=stratify))
       2122 
       2123     return list(chain.from_iterable((safe_indexing(a, train),

    D:\anaconda\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
       1321         """
       1322         X, y, groups = indexable(X, y, groups)
    -> 1323         for train, test in self._iter_indices(X, y, groups):
       1324             yield train, test
       1325 

    D:\anaconda\lib\site-packages\sklearn\model_selection\_split.py in _iter_indices(self, X, y, groups)
       1634         class_counts = np.bincount(y_indices)
       1635         if np.min(class_counts) < 2:
    -> 1636             raise ValueError("The least populated class in y has only 1"
       1637                              " member, which is too few. The minimum"
       1638                              " number of groups for any class cannot"

    ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2

It seems like something goes wrong when it tries to split the data into training and validation sets at line 306. How can I fix it?

NiDHanWang · Oct 19 '19 08:10
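A minimal reproduction, assuming only NumPy and scikit-learn are installed: stratifying on a continuous target in which every value is distinct triggers the same ValueError as the traceback above, because each distinct value is treated as its own class with a single member. The data is synthetic, a stand-in for a column like SalePrice, not the poster's actual data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a continuous target such as SalePrice:
# 100 distinct values, so each value would be a "class" with one member.
X = np.arange(300, dtype=float).reshape(100, 3)
y = np.linspace(50_000, 500_000, num=100)

try:
    # Same arguments as the failing call at line 306: stratify on the labels themselves.
    train_test_split(X, y, test_size=0.15, stratify=y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```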

I solved this problem by removing the stratify argument from the train_test_split call at line 306.

DeckerDai · Oct 24 '19 15:10

Same issue here. I tried all the alternatives for task= and eval_metric= and always get the same error when early_stopping is set to True. I also tried providing y in different formats (array, pandas DataFrame, pandas Series) and got the same error. @DeckerDai: removing the stratify argument did not solve it for me. There is no error when early_stopping=False. I'm also not sure why the error comes up at all, given that I'm working on a regression problem (task='regression', eval_metric='l2').

stabilus · Dec 31 '19 08:12
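Why a regression problem hits a "class" error: as lines 1634-1636 of the traceback show, the stratified splitter counts how many rows share each distinct label value and rejects any count below 2. With a continuous target, many values occur only once. The sketch below computes that count with a hypothetical helper (not part of feature_selector or scikit-learn) on made-up prices; the answer is the same for a list, a NumPy array, or a pandas Series, which is consistent with changing the label format not helping.

```python
import numpy as np
import pandas as pd


def least_populated_class_size(labels):
    """Smallest number of rows sharing one label value.

    Stratified splitting treats every distinct value as a class and
    needs this to be at least 2. (Hypothetical helper, not a library API.)
    """
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    return int(counts.min())


# Made-up sale prices: only 140000 appears twice, every other value once.
sale_price = pd.Series([208500, 181500, 223500, 140000, 250000, 140000])

# The container type does not matter: all three give the same answer.
for labels in (sale_price, sale_price.to_numpy(), sale_price.tolist()):
    print(type(labels).__name__, least_populated_class_size(labels))
```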

> I solve this problem by removing the argument stratify in function train_test_split at the line 306.

I explored that line and came up with this solution, which keeps the stratify argument for 'classification' but drops it for 'regression':

    if early_stopping:
        if task == 'classification':
            # Class labels: stratify so both splits keep the same class proportions
            train_features, valid_features, train_labels, valid_labels = train_test_split(features, labels, test_size = 0.15, stratify=labels)
        else:
            # Regression: stratifying on raw continuous labels fails, so split randomly
            train_features, valid_features, train_labels, valid_labels = train_test_split(features, labels, test_size = 0.15)

rserran · Dec 26 '21 18:12
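The conditional split above can be tried outside feature_selector as a small standalone function. This is a sketch: the name split_for_early_stopping is hypothetical and not part of the library; it mirrors rserran's logic of passing stratify=labels only when task == 'classification', and uses synthetic data.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_for_early_stopping(features, labels, task, test_size=0.15):
    """Hypothetical helper mirroring the patch: stratify only for
    classification labels, plain random split for regression."""
    stratify = labels if task == 'classification' else None
    return train_test_split(features, labels, test_size=test_size, stratify=stratify)


X = np.arange(200).reshape(100, 2)

# Regression: continuous target with 100 distinct values, no stratification.
y_reg = np.linspace(50_000, 500_000, num=100)
X_tr, X_va, y_tr, y_va = split_for_early_stopping(X, y_reg, task='regression')

# Classification: two balanced classes, so the stratified split keeps them balanced.
y_clf = np.array([0, 1] * 50)
Xc_tr, Xc_va, yc_tr, yc_va = split_for_early_stopping(X, y_clf, task='classification')

print(len(X_tr), len(X_va), np.bincount(yc_va))
```

With 100 rows and test_size=0.15, the regression split yields 85 training and 15 validation rows without raising, and the stratified classification split puts 7 of one class and 8 of the other into its 15-row validation set.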