dtype_diet
Tries to shrink your Pandas column dtypes with no data loss so you have more spare RAM
Fix the typo `converstion` to read `conversion` in the output string. See below:

**Current**
```
Smallest non-breaking converstion per column: ...
```
**Fix**
```
Smallest non-breaking conversion per column: ...
```
Pandas' [sparse data structures](https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html) are another handy-looking memory saving trick that fits with the theme of dtype_diet. It'd be nice if the tool considered it as an option. The simple...
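To make the potential saving concrete, here is a minimal sketch using pandas' own `SparseDtype`. It is not part of dtype_diet; the mostly-zero column and the choice of `0.0` as the fill value are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: 10,000 floats where only every 100th value is non-zero.
dense = pd.Series(np.zeros(10_000))
dense.iloc[::100] = 1.0

# Store only the non-fill values; 0.0 is treated as the implicit fill value.
sparse = dense.astype(pd.SparseDtype("float64", fill_value=0.0))

dense_bytes = dense.memory_usage(deep=True)
sparse_bytes = sparse.memory_usage(deep=True)
print(f"dense:  {dense_bytes} bytes")
print(f"sparse: {sparse_bytes} bytes (density {sparse.sparse.density:.2%})")

# Densifying again gives back the original values, so nothing is lost.
roundtrip = sparse.sparse.to_dense()
```

The saving depends entirely on how often the fill value occurs, so a tool would probably need to check the density of each column before suggesting a sparse dtype.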
When I look at the report output, my first thought (particularly with numeric types) is, what if the data changes a bit? It might be useful to give an idea...
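One possible reading of this request is a "headroom" figure: how far the current minimum and maximum sit from the limits of the suggested dtype. The sketch below assumes that reading. `int_headroom` is a hypothetical helper, not a dtype_diet function, and it only handles NumPy integer dtypes.

```python
import numpy as np
import pandas as pd


def int_headroom(series: pd.Series, dtype: str) -> dict:
    """Hypothetical helper: distance from the data's min/max to the dtype's limits."""
    info = np.iinfo(dtype)
    lo, hi = int(series.min()), int(series.max())
    if lo < info.min or hi > info.max:
        raise ValueError(f"{dtype} cannot hold the current data")
    return {"below": lo - info.min, "above": info.max - hi}


# Illustrative data: the values fit in int8, but only just at the top end.
ages = pd.Series([0, 100, 120])
room = int_headroom(ages, "int8")
print(room)
```

A small "above" value would signal that a slightly larger value in future data would overflow the suggested dtype.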
Hi Ian, Thanks for putting this together, I had never really thought about this but it makes a lot of sense. I was looking at the example notebook and trying...
Some experiments (e.g. on Companies House raw data) suggest that a Parquet file is bigger than a zipped pickle and that it loads back more slowly (e.g. 30% slower). Do a...
Update the README or file new bugs based on:
* https://twitter.com/RokoMijicUK/status/1267353562247573504 (int8 -> float64 if NaN on a join? so Int8 or nullable bool?)
* https://twitter.com/sardinan_guy/status/1267279292003647488 (f16 not in parquet?)
*...
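The first point in the list (int8 becoming float64 on a join) can be reproduced with a small sketch. The frames and column names below are made up for illustration, and the printed dtypes may vary between pandas versions.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [1, 2], "flag": np.array([10, 20], dtype="int8")})

# key 3 has no match, so a missing value is introduced and the NumPy int8
# column can no longer stay an integer column.
merged_numpy = left.merge(right, on="key", how="left")
print(merged_numpy["flag"].dtype)

# With the nullable extension dtype the missing value is stored as <NA>.
right_nullable = right.astype({"flag": "Int8"})
merged_nullable = left.merge(right_nullable, on="key", how="left")
print(merged_nullable["flag"].dtype)
```

This suggests a shrunken dtype is only safe across joins if the nullable variant is used, which is worth stating in the README.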
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.infer_objects.html?highlight=infer_objects#pandas.DataFrame.infer_objects The above identifies e.g. int64 in a list that was set up as an object. Does it do the same for nullable dtypes?
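A quick way to explore the question is sketched below. The sample values are illustrative. The second case is printed rather than stated because the result for missing values is exactly what the question is asking about.

```python
import pandas as pd

# Integers stored in an object column, e.g. built from a plain Python list.
as_object = pd.Series([1, 2, 3], dtype="object")
inferred = as_object.infer_objects()
print(as_object.dtype, "->", inferred.dtype)

# Same data with a missing value: print the result rather than assume it.
with_missing = pd.Series([1, 2, None], dtype="object")
print(with_missing.dtype, "->", with_missing.infer_objects().dtype)
```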
While pandas supports nullable ints via extension arrays, they are still not the default when reading data in. So you can easily get float64 for nullable int series, so you...
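The behaviour described here can be seen with a minimal sketch. The CSV text and column names are invented for illustration. Passing `dtype` to `read_csv` is one way to opt in to nullable integers, not necessarily what dtype_diet does.

```python
import io

import pandas as pd

# Illustrative CSV with one empty cell in an otherwise integer column.
csv_text = "id,count\na,1\nb,\nc,3\n"

# Default read: the empty cell becomes NaN, forcing the column to float64.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["count"].dtype)

# Asking for the nullable extension dtype keeps integers and uses <NA>.
nullable_df = pd.read_csv(io.StringIO(csv_text), dtype={"count": "Int64"})
print(nullable_df["count"].dtype)

# Downcast further once the value range is known to fit.
small = nullable_df["count"].astype("Int8")
print(small.dtype)
```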