[ENH] Have a subset of sampler enabling sampling in large dataset

Open • glemaitre opened this issue 4 years ago • 1 comment

We have received several issues, notably with SMOTENC, where large datasets lead to a MemoryError.

Here I will add a couple of points that could be addressed in the future:

  • [ ] Check whether the SMOTENC class really needs to convert a dataset from sparse to dense (#752, #768, #688, #667)
  • [ ] A subset of the samplers could be implemented with Dask. We should probably prototype this in imblearn before contributing it upstream (https://github.com/dask/dask-ml/issues/317)
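
To make the second point more concrete, here is a minimal sketch of what an out-of-core sampler could look like. It is an assumption for illustration only and is not the imblearn or Dask API: `chunked_random_under_sample` and `make_chunks` are hypothetical names, and the technique shown is plain per-class reservoir sampling (random under-sampling), not SMOTENC. Over-sampling methods that need nearest neighbours would be harder to stream this way.

```python
import random
from collections import defaultdict


def chunked_random_under_sample(chunks, n_per_class, seed=0):
    """Keep at most `n_per_class` rows per label while streaming over chunks.

    `chunks` is any iterable of (X_chunk, y_chunk) pairs. Only the current
    chunk and the per-class reservoirs are held in memory at any time.
    Hypothetical helper, not part of imblearn.
    """
    rng = random.Random(seed)
    reservoirs = defaultdict(list)  # label -> rows currently kept
    seen = defaultdict(int)         # label -> number of rows seen so far

    for X_chunk, y_chunk in chunks:
        for row, label in zip(X_chunk, y_chunk):
            seen[label] += 1
            bucket = reservoirs[label]
            if len(bucket) < n_per_class:
                bucket.append(row)
            else:
                # Reservoir sampling (Algorithm R): the new row replaces a
                # kept one with probability n_per_class / seen[label].
                j = rng.randrange(seen[label])
                if j < n_per_class:
                    bucket[j] = row

    X_res, y_res = [], []
    for label, rows in reservoirs.items():
        X_res.extend(rows)
        y_res.extend([label] * len(rows))
    return X_res, y_res


def make_chunks(n_chunks=10, chunk_size=1000):
    """Hypothetical data source: every 20th row has label 1, the rest 0."""
    rng = random.Random(42)
    for _ in range(n_chunks):
        X = [[rng.random(), rng.random()] for _ in range(chunk_size)]
        y = [1 if i % 20 == 0 else 0 for i in range(chunk_size)]
        yield X, y


X_res, y_res = chunked_random_under_sample(make_chunks(), n_per_class=200)
```

Because the function only iterates over chunks, the same loop could in principle be fed from files on disk or from the partitions of a larger-than-memory collection. Whether that maps cleanly onto Dask is exactly what a prototype would need to establish.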

glemaitre • Nov 02 '20 10:11

Has there been any progress on this issue? It remains a blocking problem on large datasets in the current release.

arisingh8 • Jun 14 '23 02:06