dtype_diet

Consider sparse data types

Open DanielFEvans opened this issue 4 years ago • 0 comments

Pandas' sparse data structures are another handy-looking memory saving trick that fits with the theme of dtype_diet. It'd be nice if the tool considered it as an option.

The simple case would be to try a sparse column with NaN as the "omitted" value (or perhaps zero for dtypes that lack NaNs).
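As a rough illustration of the simple case, here is a minimal sketch of converting a column to a sparse dtype and comparing memory use. The example data and variable names are made up for illustration, and nothing here is part of dtype_diet itself; it only uses `pd.SparseDtype`, `Series.astype` and `Series.memory_usage` from Pandas.

```python
import numpy as np
import pandas as pd

# Made-up example column: 9,990 missing values and 10 real ones.
dense = pd.Series([np.nan] * 9_990 + [1.5] * 10, name="mostly_nan")

# Sparse version that leaves NaN unstored (NaN is the "omitted" value).
sparse = dense.astype(pd.SparseDtype("float64", fill_value=np.nan))

dense_bytes = dense.memory_usage(deep=True)
sparse_bytes = sparse.memory_usage(deep=True)
print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes")

# Integer columns cannot hold NaN, so zero is used as the omitted value.
ints = pd.Series([0] * 9_990 + [7] * 10, name="mostly_zero")
sparse_ints = ints.astype(pd.SparseDtype("int64", fill_value=0))
print(f"int density: {sparse_ints.sparse.density}")
```

The tool could presumably compare the two byte counts the same way it compares other candidate dtypes, and only suggest the sparse version when the saving is worthwhile.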

To get a bit more complex, Pandas lets you choose any value, and a slightly better trick might be to use the most common value in the column as the "omitted" value. However, that might result in some silly suggestions. For example, suggesting that a column with values [1, 2, 2, 3] be made sparse by omitting '2' isn't really a great suggestion if '2' is only the most common value in the particular piece of example data being analysed.
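One possible way to avoid the silly suggestions is to only propose the most common value when it dominates the column. The sketch below assumes a hypothetical helper, `suggest_sparse_fill`, and an arbitrary 90% threshold (`min_share`); neither comes from dtype_diet, and the right cut-off would need some experimentation.

```python
import pandas as pd


def suggest_sparse_fill(col: pd.Series, min_share: float = 0.9):
    """Return the most common value if it makes up at least
    `min_share` of the column, otherwise None (hypothetical helper)."""
    # value_counts sorts by frequency, most common first.
    counts = col.value_counts(dropna=False)
    if counts.empty:
        return None
    top_value = counts.index[0]
    share = counts.iloc[0] / len(col)
    return top_value if share >= min_share else None


# The example from above: '2' is most common but covers only half the rows.
small = pd.Series([1, 2, 2, 3])
small_fill = suggest_sparse_fill(small)

# Made-up column where one value really does dominate.
skewed = pd.Series([5] * 95 + [1, 2, 3, 4, 6])
skewed_fill = suggest_sparse_fill(skewed)

if skewed_fill is not None:
    # Use the dominant value as the omitted value.
    skewed_sparse = skewed.astype(
        pd.SparseDtype(skewed.dtype, fill_value=skewed_fill)
    )
```

With this check, the [1, 2, 2, 3] column gets no sparse suggestion, while the skewed column is stored with '5' as the omitted value. It doesn't solve the problem of unrepresentative sample data, but it does filter out the weakest cases.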

DanielFEvans, Aug 27 '20 13:08