auto-sklearn
No support for precision reduction when reducing dataset size for pandas dataframe or series.
We currently have two methods for dataset size reduction, precision and subsample, introduced more clearly in PR #1250. However, we have not implemented precision reduction for pandas dataframes, as this is a bit more involved: an ndarray has a single uniform dtype, while a dataframe has a dtype per column.
We also cannot use reduce_dataset_size_if_too_large with dataframes yet, as we have not implemented a method to calculate a dataframe's size, which we need in order to know how much to subsample.
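To make the per-column point concrete, here is a rough sketch of what precision reduction on a dataframe could look like. The helper name `reduce_precision_df` is hypothetical and not part of auto-sklearn, and downcasting only `float64` columns to `float32` is an assumption about which conversion we would want; other dtypes are left untouched.

```python
import numpy as np
import pandas as pd


def reduce_precision_df(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with float64 columns downcast to float32.

    Hypothetical helper for illustration; not auto-sklearn's API.
    """
    # Unlike an ndarray, each column carries its own dtype, so we have to
    # pick out the columns to convert rather than cast the whole object.
    float64_cols = df.select_dtypes(include=["float64"]).columns
    if len(float64_cols) == 0:
        return df.copy()
    # astype accepts a column -> dtype mapping; unlisted columns keep their dtype.
    return df.astype({col: np.float32 for col in float64_cols})


example = pd.DataFrame({"a": [1.0, 2.0], "b": [1, 2], "c": ["x", "y"]})
reduced = reduce_precision_df(example)
```

A Series would need the same treatment but with a single dtype check, so it is closer to the existing ndarray case.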
This shouldn't be too hard to implement but will require updating tests as well.
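As a possible starting point for the size calculation, pandas already exposes `DataFrame.memory_usage`. The sketch below uses it to get a byte count and derive a subsample fraction. Both function names and the `max_bytes` parameter are hypothetical, and this is not how reduce_dataset_size_if_too_large currently works.

```python
import numpy as np
import pandas as pd


def dataframe_nbytes(df: pd.DataFrame, index: bool = True) -> int:
    """Approximate memory footprint of df in bytes (hypothetical helper)."""
    # deep=True also counts the Python objects held in object columns,
    # which a shallow count would miss.
    return int(df.memory_usage(index=index, deep=True).sum())


def subsample_fraction(df: pd.DataFrame, max_bytes: int, index: bool = True) -> float:
    """Fraction of rows to keep so that df roughly fits in max_bytes."""
    size = dataframe_nbytes(df, index=index)
    if size == 0:
        return 1.0
    return min(1.0, max_bytes / size)


df = pd.DataFrame({"a": np.zeros(100)})  # 100 float64 values
frac = subsample_fraction(df, max_bytes=400, index=False)
```

The resulting fraction could then be passed to something like `df.sample(frac=frac)`, though the existing subsample method may need a different entry point.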
Edit:
Just adding an extra point to include a more nuanced size calculation for sparse matrices.
arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes
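Wrapped up as a function, the expression above might look like the sketch below. The name `sparse_nbytes` is hypothetical, and converting other sparse formats to CSR before measuring is an assumption made for simplicity, since only CSR, CSC and BSR matrices have the `data`, `indices` and `indptr` attributes.

```python
import numpy as np
from scipy import sparse


def sparse_nbytes(arr) -> int:
    """Bytes used by the arrays backing a scipy sparse matrix (hypothetical helper)."""
    if arr.format not in ("csr", "csc", "bsr"):
        # Assumption: measure other formats (e.g. COO) via their CSR form.
        arr = arr.tocsr()
    return int(arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes)


m = sparse.identity(1000, format="csr")
size = sparse_nbytes(m)
```

For the identity matrix here this is far smaller than the dense equivalent, which is why using the dense shape for sparse inputs would overestimate the size.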