auto-sklearn

No support for precision reduction when reducing dataset size for pandas dataframe or series.

Open eddiebergman opened this issue 4 years ago • 0 comments

We currently have two methods for dataset size reduction, precision and subsample, introduced more clearly in PR #1250. However, we have not implemented precision reduction for pandas dataframes, as this is a bit more involved: an ndarray has a single uniform dtype, while a dataframe has a dtype per column.
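As a rough illustration of what per-column precision reduction could look like, here is a minimal sketch. The function name reduce_precision_df is hypothetical, and the single float64 to float32 step is an assumption made for the example, not necessarily the set of precisions auto-sklearn steps through.

```python
import numpy as np
import pandas as pd


def reduce_precision_df(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: downcast float64 columns to float32.

    Columns of any other dtype (ints, categories, objects) are left as-is,
    since a dataframe carries one dtype per column rather than one overall.
    """
    # Pick out only the columns whose dtype is float64.
    float64_cols = df.select_dtypes(include=["float64"]).columns
    # astype accepts a column -> dtype mapping; unlisted columns keep their dtype.
    return df.astype({col: np.float32 for col in float64_cols})


df = pd.DataFrame({"a": [1.0, 2.0], "b": [1, 2], "c": ["x", "y"]})
reduced = reduce_precision_df(df)
print(reduced.dtypes)
```

The same idea should carry over to a series by treating it as a single column.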

We also cannot use reduce_dataset_size_if_too_large with dataframes yet, as we have not implemented a method to calculate a dataframe's size, which we need in order to know how much to subsample.
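One possible way to get a dataframe's size is pandas' own DataFrame.memory_usage. The sketch below is only an illustration: dataframe_nbytes and subsample_fraction are hypothetical names, and the proportional rule for choosing the fraction is an assumption, not what reduce_dataset_size_if_too_large currently does.

```python
import numpy as np
import pandas as pd


def dataframe_nbytes(df: pd.DataFrame) -> int:
    """Hypothetical helper: total bytes used by a dataframe, index included.

    deep=True also counts the Python objects held by object columns,
    which a plain per-dtype itemsize estimate would miss.
    """
    return int(df.memory_usage(index=True, deep=True).sum())


def subsample_fraction(df: pd.DataFrame, memory_limit_bytes: int) -> float:
    """Hypothetical helper: fraction of rows to keep to fit under the limit."""
    size = dataframe_nbytes(df)
    if size <= memory_limit_bytes:
        return 1.0
    return memory_limit_bytes / size


df = pd.DataFrame({"x": np.zeros(1000), "y": np.ones(1000)})
print(dataframe_nbytes(df), subsample_fraction(df, memory_limit_bytes=8000))
```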

This shouldn't be too hard to implement but will require updating tests as well.

Edit: Just adding an extra point to include a more nuanced size calculation for sparse matrices: arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes
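The sparse calculation above can be sketched as follows. This assumes a scipy.sparse matrix in CSR or CSC format, since those are the formats that expose data, indices and indptr; the helper name sparse_nbytes is hypothetical.

```python
import numpy as np
from scipy import sparse


def sparse_nbytes(arr) -> int:
    """Hypothetical helper: bytes used by the three arrays backing a CSR/CSC matrix."""
    # data holds the stored values; indices and indptr hold their positions.
    return arr.data.nbytes + arr.indices.nbytes + arr.indptr.nbytes


mat = sparse.csr_matrix(np.eye(100))
# The stored values alone undercount the footprint, as the index arrays add to it.
print(mat.data.nbytes, sparse_nbytes(mat))
```

Other formats, such as COO, store their coordinates in different attributes and would need their own calculation.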

eddiebergman, Nov 03 '21 11:11