smote_variants
When using SOMO, why did the two classes not reach a balance, and why did the number of samples not change?
There can be multiple reasons for that. In many cases the authors of a particular SMOTE variant did not cover all the possible corner cases, for example,
- all minority samples are treated as noise according to the noise definition of the technique,
- the method wants to work with, say, 5 nearest neighbors, but there are only 3 minority samples,
- mathematical techniques, like self-organizing maps, do not converge,
- etc.,
all because the nature of the data is incompatible with the parameter settings and assumptions of the SMOTE variant.
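The neighbor-count corner case above can be checked before oversampling. The following is only a minimal sketch: the helper name `minority_neighbor_check` is hypothetical and not part of smote_variants, and it assumes a technique asking for `n_neighbors` neighbors needs more minority samples than that, since a sample is not counted as its own neighbor.

```python
from collections import Counter


def minority_neighbor_check(y, n_neighbors=5):
    """Hypothetical helper: report whether the smallest class has more
    samples than the number of neighbors a technique wants to use."""
    counts = Counter(y)
    # The minority class is the label with the fewest samples.
    minority_label, minority_count = min(counts.items(), key=lambda kv: kv[1])
    # Assumption: each sample needs n_neighbors *other* minority samples.
    enough = minority_count > n_neighbors
    return minority_label, minority_count, enough


# 3 minority samples cannot supply 5 neighbors each.
label, count, enough = minority_neighbor_check([0] * 20 + [1] * 3, n_neighbors=5)
print(label, count, enough)
```

A passing check only rules out this one corner case; the other cases in the list (noise filtering, non-convergence) need to be investigated separately.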
Where I found reasonable resolutions, I implemented them. In cases where a resolution is infeasible (for example, determining the 5 closest neighbors when a class has only 3 samples), the data is returned unaltered, although I would expect some message in the logs if logging is enabled.
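One way to notice the "returned unaltered" case is to compare class counts before and after sampling while logging is switched on. This is a rough sketch, not the library's own mechanism: `sample_and_report` and `PassThrough` are hypothetical names, and it assumes only that the oversampler exposes `sample(X, y)` and that any messages go through Python's standard `logging` module.

```python
import logging
from collections import Counter

# Show INFO-level messages from any library that uses the standard logging module.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)


def sample_and_report(oversampler, X, y):
    """Hypothetical helper: run oversampler.sample and flag unchanged class counts."""
    before = Counter(y)
    X_samp, y_samp = oversampler.sample(X, y)
    unchanged = Counter(y_samp) == before
    if unchanged:
        log.warning("Class counts unchanged after sampling: %s", dict(before))
    return X_samp, y_samp, unchanged


class PassThrough:
    """Hypothetical stand-in that mimics an oversampler returning its input as is."""

    def sample(self, X, y):
        return X, y


_, _, unchanged = sample_and_report(PassThrough(), [[0.0], [1.0], [2.0]], [0, 0, 1])
print(unchanged)
```

With a real oversampler, `sample_and_report(sv.SOMO(), X, y)` could be used in the same way, and any log lines printed during the call may hint at which corner case was hit.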
Most likely your data is a corner case of the SOMO implementation with the parameters you used. Adjusting the parameters might lead to a properly operating SOMO.
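Adjusting the parameters can be done systematically by trying several settings and keeping the first one that actually changes the class counts. The sketch below is illustrative only: `first_working_setting` and `ToySampler` are hypothetical, the rule inside `ToySampler` is arbitrary, and the parameter name `n_grid` is an assumption that should be checked against the `SOMO` signature in the installed version.

```python
from collections import Counter


def first_working_setting(make_oversampler, settings, X, y):
    """Hypothetical helper: return the first parameter dict whose
    oversampler changes the class counts, plus the sampled data."""
    before = Counter(y)
    for params in settings:
        X_samp, y_samp = make_oversampler(**params).sample(X, y)
        if Counter(y_samp) != before:
            return params, X_samp, y_samp
    # No setting produced new samples; hand back the original data.
    return None, X, y


class ToySampler:
    """Hypothetical stand-in: duplicates a minority sample only when
    n_grid is at most 5, and returns the data unaltered otherwise."""

    def __init__(self, n_grid=10):
        self.n_grid = n_grid

    def sample(self, X, y):
        if self.n_grid > 5:
            return X, y
        counts = Counter(y)
        minority = min(counts, key=counts.get)
        extra = max(counts.values()) - counts[minority]
        donor = next(x for x, label in zip(X, y) if label == minority)
        return list(X) + [donor] * extra, list(y) + [minority] * extra


X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 0, 1]
params, X_s, y_s = first_working_setting(
    ToySampler, [{"n_grid": 10}, {"n_grid": 4}], X_toy, y_toy
)
print(params, Counter(y_s))
```

With smote_variants, the factory would be something like `sv.SOMO` and the settings a list of keyword dictionaries for parameters that the class really accepts.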
Also, if you share a minimal working example, I can look into it.
Thanks for your reply. I wrote code like this:
pip install -U imbalanced-learn
pip install smote-variants

import numpy as np
import smote_variants as sv
#import imblearn.datasets as imbd
from imblearn.datasets import fetch_datasets

datasets = fetch_datasets(filter_data=['oil'])
X, y = datasets['oil']['data'], datasets['oil']['target']
[print('Class {} has {} instances'.format(label, count)) for label, count in zip(*np.unique(y, return_counts=True))]

oversampler = sv.SOMO()
X_samp, y_samp = oversampler.sample(X, y)

[print('Class {} has {} instances after oversampling'.format(label, count)) for label, count in zip(*np.unique(y_samp, return_counts=True))]
print(X_samp, y_samp)
And the printed result:

Class -1 has 896 instances
Class 1 has 41 instances
Class -1 has 896 instances after oversampling
Class 1 has 41 instances after oversampling

After oversampling, there is no change in the number of samples in either class.