imbalanced-learn
[ENH] Have a subset of samplers that enable sampling on large datasets
We got a couple of issues, notably with `SMOTENC`, where large datasets lead to a `MemoryError`.
Here I will add a couple of points that could be addressed in the future:
- [ ] Check whether the `SMOTENC` class really needs to convert a dataset from sparse to dense (#752, #768, #688, #667)
- [ ] A subset of the samplers could be implemented in Dask. We should probably prototype in `imblearn` before contributing it upstream (https://github.com/dask/dask-ml/issues/317)
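
To give a rough sense of why a sparse-to-dense conversion can end in a `MemoryError`, here is a back-of-the-envelope sketch. The helper names `dense_nbytes` and `csr_nbytes` are hypothetical (they are not part of `imblearn`), and the shape and number of non-zeros are made-up illustrative values, not figures taken from the linked issues. It assumes 8-byte values and 4-byte CSR indices.

```python
def dense_nbytes(shape, itemsize=8):
    """Bytes needed to hold a dense array of the given shape."""
    n_rows, n_cols = shape
    return n_rows * n_cols * itemsize


def csr_nbytes(shape, nnz, itemsize=8, index_itemsize=4):
    """Approximate bytes used by a CSR matrix (data + indices + indptr)."""
    n_rows, _ = shape
    data_and_indices = nnz * (itemsize + index_itemsize)
    indptr = (n_rows + 1) * index_itemsize
    return data_and_indices + indptr


# Illustrative (assumed) dataset: 1M rows, 5k one-hot-like columns,
# about 10 non-zero entries per row.
shape = (1_000_000, 5_000)
nnz = 10_000_000

dense = dense_nbytes(shape)
sparse = csr_nbytes(shape, nnz)
print(f"dense: {dense / 1e9:.1f} GB, CSR: {sparse / 1e9:.3f} GB")
```

For an existing SciPy CSR matrix `X`, the same estimate could be obtained by passing `X.shape`, `X.nnz` and `X.dtype.itemsize` to these helpers before deciding whether densifying is affordable.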
Was there any progress on this issue? It remains a breaking problem on large datasets for the current release.