Should model hyperparameters be adjusted before and after oversampling?

Open sshenbao opened this issue 1 year ago • 1 comments

If your issue is a usage question, submit it here instead:

  • The imbalanced-learn gitter: https://gitter.im/scikit-learn-contrib/imbalanced-learn

If we want to see whether oversampling has an effect, should our model hyperparameters be adjusted before and after oversampling?

sshenbao, Jul 09 '24 12:07

Yes. To judge whether oversampling has an effect, tune the model hyperparameters separately for each setup: once without oversampling and once with it.

If you tune the model before oversampling, the hyperparameters are optimized for the original dataset.

When you apply oversampling, the distribution of the training data changes, and that can shift the optimal hyperparameters. You will have to re-tune them on the oversampled data.

Bokang-ctrl, Jul 13 '24 12:07

Closing since this is more of a usage question.

Most likely, one should tune the decision threshold of the classifier instead of experimenting with oversampling. The following estimator from scikit-learn will be helpful: https://scikit-learn.org/1.5/modules/generated/sklearn.model_selection.TunedThresholdClassifierCV.html

glemaitre, Oct 04 '24 16:10