imbalanced-learn
Pipeline performs SMOTE on both the train and validation sets
I have been using the imblearn `Pipeline` to apply SMOTE, but I have realized that it is resampling both the train and validation sets. I get the same results with the `Pipeline` as when, instead of using the `Pipeline`, I resample the train and validation sets myself and then train my XGBoost model.
I believe what is happening is that the imblearn `Pipeline` only treats as samplers those transformers that define a `fit_resample` method, while SMOTE's method is named `fit_sample`. As a result the `Pipeline` does not recognize SMOTE as a sampler and resampling is also applied to the validation set.
Any ideas about this?
It appears to me that the SMOTE class does have `fit_resample`, not `fit_sample`. Can you provide a minimal example that shows the validation set being resampled?
`fit_resample` and `fit_sample` are just aliases.
@Marinaobdulia could you provide a minimal example? My guess would be that XGBoost does not provide a fully scikit-learn-compatible model (it does not pass `check_estimator`) and thus our pipeline does not work as expected.
However, an example would allow us to check why this is the case and whether we can do something in imbalanced-learn, or perhaps propose a fix upstream.
Closing since we don't have additional information.