a little mistake in the code

Open gkyustc opened this issue 5 years ago • 4 comments

https://github.com/tgsmith61591/smrt/blob/0863b0b94897ad8d8d9184b792cdb275627e1ac4/smrt/balance/smrt.py#L207

I think this should be X_sub = X_copy[y_transform == transformed_label, :]
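For readers without the linked source open: the proposed expression is a NumPy boolean-mask row selection. Below is a minimal sketch of what it does, assuming `X_copy` is a 2D array and `y_transform` is a 1D array of encoded labels with one entry per row. The toy values are made up for illustration, and this is not the actual smrt code.

```python
import numpy as np

# Toy data: 5 samples, 2 features (hypothetical values).
X_copy = np.array([[0.0, 1.0],
                   [2.0, 3.0],
                   [4.0, 5.0],
                   [6.0, 7.0],
                   [8.0, 9.0]])
y_transform = np.array([0, 1, 0, 1, 1])  # encoded class labels, one per row
transformed_label = 1                    # the class whose rows we want

# Boolean mask with one True/False entry per row of X_copy.
mask = y_transform == transformed_label

# Keep only the rows of that class; ":" keeps every column.
X_sub = X_copy[mask, :]
```

Here `X_sub` holds the three rows whose label equals `transformed_label`, which is the per-class subset a balancer needs before it can resample that class.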

gkyustc, Nov 26 '19 03:11

Thanks for taking the time to file an issue. I think you're right. Feel free to file a PR, if you like. We haven't been working on this project in a while, but I have another project that built on this balancing codebase, should you be interested in using it: https://github.com/tgsmith61591/skoot

tgsmith61591, Nov 26 '19 12:11

Thanks for your reply. In fact, I recently ran into a problem with an imbalanced dataset. My dataset is so imbalanced that the variance of the Gaussian distribution is too low, and the distribution is almost the same as a Dirac delta function. We compare the number of images to the labels as follows. [image] The x-axis represents the number of images per ID, and the y-axis represents the corresponding number of labels. So I would like to ask you a couple of questions: Do you think SMOTE is a good way to balance this kind of dataset? And can you recommend some efficient strategies for handling this problem? Thanks very much!

gkyustc, Nov 26 '19 12:11

The original SMOTE paper got its biggest performance boost by combining down-sampling with their method. I think it would absolutely be worth trying such a method. Things like:

  • Downsample
  • Perform SMOTE
  • Consider stratified batching (if you're using a minibatch family of algorithms)
  • Explore class-weighted loss functions

The skoot package I shared can help you downsample and perform SMOTE, but the other strategies will depend on your framework and family of algorithms.
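The last two items in the list are framework-dependent, but the underlying idea can be sketched without any framework. The following is a minimal, standard-library-only illustration, not skoot's API: the function names `inverse_frequency_weights` and `stratified_batches` are hypothetical, and the weighting formula is the common `n_samples / (n_classes * class_count)` heuristic (the same one scikit-learn uses for `class_weight="balanced"`).

```python
import random
from collections import Counter, defaultdict


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def stratified_batches(labels, per_class, n_batches, seed=0):
    """Yield lists of sample indices with `per_class` indices from every class.

    Indices are drawn with replacement, so rare classes can still
    fill their share of each batch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_batches):
        batch = []
        for label in sorted(by_class):
            batch.extend(rng.choices(by_class[label], k=per_class))
        rng.shuffle(batch)  # avoid a fixed class order inside the batch
        yield batch


# Toy labels (hypothetical): 8 samples of class "a", 2 of class "b".
labels = ["a"] * 8 + ["b"] * 2
weights = inverse_frequency_weights(labels)
batches = list(stratified_batches(labels, per_class=2, n_batches=3))
```

With these toy labels the rare class "b" gets weight 2.5 and the common class "a" gets 0.625, and every batch contains two indices from each class. The weights would typically be passed to a loss function's per-class weight argument, and the index lists to whatever data loader the training framework provides.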

tgsmith61591, Nov 26 '19 13:11

Thanks for your help.
In fact, I have tried the other three methods, everything except SMOTE, but still cannot improve the performance. Since my dataset consists of 2D images of human bodies in different poses and SMOTE augments the dataset by linear interpolation, I was worried that this method might generate misleading images, so I did not consider it in the first place. But now it may be the only one I can count on...
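To make the interpolation concern concrete: SMOTE builds each synthetic sample as x + λ(x_nn − x), where x is a minority-class sample, x_nn is one of its k nearest minority-class neighbours, and λ is drawn uniformly from [0, 1). For flattened images this amounts to a pixel-wise cross-fade between two images, which need not look like a valid pose. Below is a minimal NumPy sketch of that interpolation step only, assuming a brute-force neighbour search and a hypothetical function name `smote_like_samples`; it is not the smrt or skoot implementation.

```python
import numpy as np


def smote_like_samples(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours.

    X_min must contain at least two rows (samples of a single minority class).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]

    # Brute-force pairwise Euclidean distances between minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour

    k = min(k, n - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]  # (n, k) neighbour indices

    base = rng.integers(0, n, size=n_new)                  # random base samples
    nn = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    lam = rng.random((n_new, 1))                           # factors in [0, 1)

    # Each new row lies on the segment between a sample and its neighbour.
    return X_min[base] + lam * (X_min[nn] - X_min[base])


# Toy minority class (hypothetical): the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_like_samples(X_min, n_new=5)
```

Because every synthetic row is a convex combination of two real rows, it always stays inside the range spanned by the originals. That is reasonable for tabular features, but for raw pixels it produces blends of two poses, so interpolating in a learned feature space rather than in pixel space may be worth considering.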

gkyustc, Nov 26 '19 14:11