dtype_diet
Consider sparse data types
Pandas' sparse data structures are another handy-looking memory-saving trick that fits with the theme of dtype_diet. It'd be nice if the tool considered them as an option.
The simple case would be to try a sparse column with NaN as the "omitted" value (or perhaps zero for dtypes that lack NaNs).
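To make the simple case concrete, here is a rough sketch of the comparison using plain pandas. The example data and variable names are made up for illustration, and this is not something dtype_diet does today.

```python
import numpy as np
import pandas as pd

# Made-up example data: a float column that is 99% NaN.
values = np.full(10_000, np.nan)
values[::100] = 1.5
dense = pd.Series(values)

# Convert to a sparse column with NaN as the fill ("omitted") value.
sparse = dense.astype(pd.SparseDtype("float64", np.nan))

# Compare memory footprints; only the non-NaN entries are stored.
dense_bytes = dense.memory_usage(deep=True)
sparse_bytes = sparse.memory_usage(deep=True)

# Fraction of entries that are actually stored.
density = sparse.sparse.density

print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes, density: {density:.2%}")
```

For integer columns, which have no NaN, the equivalent would be something like `pd.SparseDtype("int64", 0)`.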
To get a bit more complex, Pandas lets you choose any value, and a slightly better trick might be to use the most common value in the column as the "omitted" value. However, that might result in some silly suggestions. For example, suggesting that a column with values [1, 2, 2, 3] be made sparse by omitting '2' isn't really a great suggestion if '2' is only most common for the particular piece of example data being analysed.
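One possible way to avoid the silly suggestions is to only propose a fill value when it clearly dominates the column. The sketch below assumes a hypothetical helper `suggest_fill_value` and an arbitrary 90% threshold; neither exists in dtype_diet. It also can't tell whether the sample is representative of the real data, so that concern still stands.

```python
import pandas as pd

def suggest_fill_value(column: pd.Series, min_share: float = 0.9):
    """Hypothetical helper: return the most common value if it makes up
    at least `min_share` of the column, otherwise None."""
    # value_counts sorts by frequency, most common first.
    shares = column.value_counts(dropna=False, normalize=True)
    if shares.empty:
        return None
    top_value, top_share = shares.index[0], shares.iloc[0]
    return top_value if top_share >= min_share else None

# The example from above: '2' is most common, but only at 50%.
small = pd.Series([1, 2, 2, 3])
small_fill = suggest_fill_value(small)  # no suggestion

# Made-up column where one value really does dominate (95%).
flags = pd.Series([7] * 95 + [1, 2, 3, 4, 5])
flags_fill = suggest_fill_value(flags)

# Use the dominant value as the sparse fill value.
sparse_flags = flags.astype(pd.SparseDtype(flags.dtype, flags_fill))
```

Here only the five values that differ from 7 end up being stored in `sparse_flags`.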