smrt
a little mistake in the code
https://github.com/tgsmith61591/smrt/blob/0863b0b94897ad8d8d9184b792cdb275627e1ac4/smrt/balance/smrt.py#L207
I think this should be `X_sub = X_copy[y_transform == transformed_label, :]`
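For reference, a minimal sketch of what the proposed expression does, assuming NumPy arrays. The variable names mirror the linked line, but the toy data is made up and this is not the library's actual code:

```python
import numpy as np

# Toy data (made up for illustration): 5 samples, 2 features
X_copy = np.arange(10).reshape(5, 2)
y_transform = np.array([0, 1, 0, 1, 1])
transformed_label = 1

# The boolean mask over rows keeps every sample whose label matches
X_sub = X_copy[y_transform == transformed_label, :]
print(X_sub.tolist())  # [[2, 3], [6, 7], [8, 9]]
```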
Thanks for taking the time to file an issue. I think you're right. Feel free to file a PR, if you like. We haven't been working on this project in a while, but I have another project that built on this balancing codebase, should you be interested in using it: https://github.com/tgsmith61591/skoot
Thanks for your reply. In fact, I recently ran into a problem with an imbalanced dataset. My dataset is so imbalanced that the variance of the Gaussian distribution is very low, and the distribution is almost the same as a Dirac delta function. We compared the number of images per ID against the number of labels, as follows.
The x-axis represents the number of images in one ID and the y-axis represents the corresponding number of labels. So I would like to ask you a couple of questions:
Do you think it is a good way to use SMOTE to balance this kind of dataset?
And can you recommend some efficient strategies for handling this kind of problem?
Thanks very much!
The original SMOTE paper got its biggest performance boost by combining down-sampling with their method. I think it would absolutely be worth trying such a method. Things like:
- Downsample
- Perform SMOTE
- Consider stratified batching (if you're using a minibatch family of algorithms)
- Explore class-weighted loss functions
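The first two steps could be sketched roughly as follows. This is a NumPy-only illustration under stated assumptions, not the skoot or smrt implementation: `downsample` and `smote_like` are hypothetical helper names, the data is random, and the interpolation is a simplified SMOTE-style step (each synthetic row lies on the segment between a minority sample and one of its `k` nearest minority neighbours).

```python
import numpy as np

def downsample(X, y, label, n_keep, rng):
    """Randomly keep n_keep rows of class `label`; other classes are untouched."""
    idx = np.flatnonzero(y == label)
    keep = rng.choice(idx, size=n_keep, replace=False)
    mask = y != label          # start with every row of the other classes
    mask[keep] = True          # add back the retained rows of `label`
    return X[mask], y[mask]

def smote_like(X_min, n_new, k, rng):
    """Interpolate each randomly chosen seed toward one of its k nearest neighbours."""
    # Pairwise squared Euclidean distances between minority samples
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # requires k < len(X_min)
    seeds = rng.integers(0, len(X_min), size=n_new)
    picks = neighbours[seeds, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[seeds] + gap * (X_min[picks] - X_min[seeds])

rng = np.random.default_rng(0)
X = rng.normal(size=(110, 2))
y = np.array([0] * 100 + [1] * 10)              # 100 majority rows, 10 minority rows

X_ds, y_ds = downsample(X, y, label=0, n_keep=40, rng=rng)
X_new = smote_like(X_ds[y_ds == 1], n_new=30, k=3, rng=rng)
X_bal = np.vstack([X_ds, X_new])
y_bal = np.concatenate([y_ds, np.ones(len(X_new), dtype=y.dtype)])
print(np.bincount(y_bal).tolist())  # [40, 40]
```

Meeting in the middle (shrinking the majority class and growing the minority class) keeps the number of synthetic rows smaller than oversampling alone would.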
The skoot package I shared can help you downsample and perform SMOTE, but the other strategies will depend on your framework and family of algorithms.
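As a framework-agnostic illustration of the last two strategies, here is a small NumPy sketch. Both helper names are hypothetical. The weights use the common inverse-frequency heuristic `n_samples / (n_classes * count)`, and the batch sampler shown is a class-balanced variant that draws the same number of samples from every class; a strictly proportional stratified sampler would instead draw per-class counts in proportion to class frequency. Labels are assumed to be integers `0..K-1` with every class present.

```python
import numpy as np

def inverse_frequency_weights(y):
    """Per-class weights: rare classes get proportionally larger weights."""
    counts = np.bincount(y)
    return len(y) / (len(counts) * counts)

def balanced_batches(y, per_class, n_batches, rng):
    """Yield index arrays containing `per_class` samples from each class."""
    classes = np.unique(y)
    for _ in range(n_batches):
        parts = [
            rng.choice(np.flatnonzero(y == c), size=per_class, replace=True)
            for c in classes
        ]
        yield np.concatenate(parts)

y = np.array([0] * 90 + [1] * 10)
weights = inverse_frequency_weights(y)   # roughly [0.556, 5.0]

rng = np.random.default_rng(0)
batch = next(balanced_batches(y, per_class=4, n_batches=1, rng=rng))
print(np.bincount(y[batch]).tolist())    # [4, 4]
```

The weights would typically be passed to whatever per-class weighting option the loss function in your framework exposes.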
Thanks for your help.
In fact, I have tried the other three methods besides SMOTE, but they still did not improve performance. Since my dataset consists of 2D images of human bodies in different poses, and SMOTE augments the dataset by linear interpolation, I was worried that this method might generate misleading images, so I did not consider it in the first place. But now it may be the only one I can count on...
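A toy illustration of that concern, assuming the interpolation is done directly in pixel space (the two one-row "images" here are made up):

```python
import numpy as np

# Two tiny "images": the same bright pixel at two different positions
img_a = np.zeros((1, 6))
img_a[0, 1] = 1.0
img_b = np.zeros((1, 6))
img_b[0, 4] = 1.0

# SMOTE-style linear interpolation halfway between the two samples
mid = img_a + 0.5 * (img_b - img_a)
print(mid.tolist())  # [[0.0, 0.5, 0.0, 0.0, 0.5, 0.0]]
```

The result is a faint copy of both images rather than a pixel at an intermediate position, which is why interpolating poses in raw pixel space may not produce a plausible new pose.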