imbalanced-learn
[ENH] Provide a subset of samplers that can handle large datasets
We have received several issues, notably with `SMOTENC`, where large datasets lead to a `MemoryError`.
Here I will add a couple of points that could be addressed in the future:
- [ ] Check whether the `SMOTENC` class really needs to convert a dataset from sparse to dense (#752, #768, #688, #667)
- [ ] A subset of the samplers could be implemented in Dask. We should probably prototype in `imblearn` before contributing it upstream (https://github.com/dask/dask-ml/issues/317)
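To give a sense of why a sparse-to-dense conversion can end in a `MemoryError`, here is a rough back-of-the-envelope sketch. It only estimates storage sizes and does not touch `imblearn` itself. The helper names (`dense_nbytes`, `csr_nbytes`) and the dataset dimensions are hypothetical, chosen purely for illustration, and the CSR estimate assumes 8-byte values with 4-byte indices.

```python
def dense_nbytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for a dense array of the given shape (float64 by default)."""
    return n_rows * n_cols * itemsize


def csr_nbytes(n_rows, nnz, itemsize=8, index_itemsize=4):
    """Approximate bytes for a CSR matrix: data + column indices + row pointers."""
    return nnz * (itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


# Hypothetical dataset (an assumption, not taken from the linked issues):
# 1,000,000 rows, 10,000 mostly one-hot-encoded columns, 20 non-zeros per row.
n_rows, n_cols, nnz_per_row = 1_000_000, 10_000, 20

dense = dense_nbytes(n_rows, n_cols)
sparse = csr_nbytes(n_rows, n_rows * nnz_per_row)

print(f"dense: {dense / 1e9:.1f} GB, CSR: {sparse / 1e9:.2f} GB")
# dense: 80.0 GB, CSR: 0.24 GB
```

Under these assumptions the dense copy is several hundred times larger than the sparse input, so a dataset that fits comfortably in memory as sparse can fail as soon as it is densified.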
Was there any progress on this issue? It remains a breaking problem on large datasets for the current release.