dtype_diet issues

Fix typo in `dtype_diet.report_on_dataframe()` output

Fix the typo `converstion` so that it reads `conversion` in the output string. See below:

**Current**

```
Smallest non-breaking converstion per column: ...
```

**Fix**

```
Smallest non-breaking conversion per column: ...
```

mharrisb1

Consider sparse data types

Pandas' [sparse data structures](https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html) are another handy-looking memory-saving trick that fits the theme of dtype_diet. It'd be nice if the tool considered them as an option. The simple...

DanielFEvans
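As a rough illustration of the sparse idea raised in this issue, the sketch below compares the memory footprint of a dense column with its sparse equivalent. The data is made up (a mostly-zero float column), and nothing here is part of dtype_diet itself; it only uses pandas' own `SparseDtype`.

```python
import numpy as np
import pandas as pd

# Hypothetical column: 100,000 values, of which only 100 are non-zero.
values = np.zeros(100_000, dtype="float64")
values[::1000] = 1.5

dense = pd.Series(values)
# Store only the values that differ from fill_value (0.0 here).
sparse = dense.astype(pd.SparseDtype("float64", fill_value=0.0))

dense_bytes = dense.memory_usage(index=False, deep=True)
sparse_bytes = sparse.memory_usage(index=False, deep=True)

print(f"dense:   {dense_bytes} bytes")
print(f"sparse:  {sparse_bytes} bytes")
print(f"density: {sparse.sparse.density:.3%}")
```

The saving depends entirely on how many values equal the fill value, so a report would probably need to show the density alongside any suggested sparse conversion.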

Report on 'margins'?

When I look at the report output, my first thought (particularly with numeric types) is, what if the data changes a bit? It might be useful to give an idea...

amilbourne
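One possible reading of "margins" is the headroom left between a column's observed range and the limits of a candidate integer dtype. The sketch below assumes that reading; `integer_headroom` is a hypothetical helper and not a dtype_diet function.

```python
import numpy as np
import pandas as pd


def integer_headroom(series, candidate_dtype):
    """Report how far the observed min/max sit from the dtype's limits."""
    info = np.iinfo(candidate_dtype)
    lo, hi = int(series.min()), int(series.max())
    return {
        "fits": info.min <= lo and hi <= info.max,
        "room_below": lo - info.min,  # how much smaller the minimum could get
        "room_above": info.max - hi,  # how much larger the maximum could get
    }


# Hypothetical data: ages currently fit in int8, but only just.
ages = pd.Series([0, 17, 42, 120], dtype="int64")
print(integer_headroom(ages, np.int8))
print(integer_headroom(ages, np.int16))
```

With this data the int8 candidate leaves room for the maximum to grow by only 7 before it overflows, which is the kind of warning a margin report could surface.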

Summarise data in example notebook?

Hi Ian, Thanks for putting this together, I had never really thought about this but it makes a lot of sense. I was looking at the example notebook and trying...

amilbourne

pickle vs parquet?

Some experiments suggest (e.g. with Companies House raw data) that a parquet file is bigger than a pickled zip and that it loads back in slower (e.g. 30% slower). Do a...

ianozsvald

Tweets/ideas to review

Update readme or file new bugs based on:

* https://twitter.com/RokoMijicUK/status/1267353562247573504 (int8 -> float64 if NaN on a join? so Int8 or nullable bool?)
* https://twitter.com/sardinan_guy/status/1267279292003647488 (f16 not in parquet?)
*...

ianozsvald
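The first point above (int8 becoming float64 when a join introduces NaN) can be reproduced with a small sketch. The frames and column names are invented for illustration; the nullable `Int8` extension dtype is shown as one possible way to keep the column compact.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [1, 2], "flag": pd.Series([10, 20], dtype="int8")})

# key == 3 has no match, so the left join must represent a missing value.
merged = left.merge(right, on="key", how="left")
print(merged["flag"].dtype)  # upcast to float64 to hold NaN

# With the nullable extension dtype the column can hold <NA> directly.
right_nullable = right.astype({"flag": "Int8"})
merged_nullable = left.merge(right_nullable, on="key", how="left")
print(merged_nullable["flag"].dtype)
```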

Might infer_objects be useful?

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.infer_objects.html?highlight=infer_objects#pandas.DataFrame.infer_objects The above identifies e.g. int64 in a list that was set up as an object. Does it do the same for nullable dtypes?

ianozsvald
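A small sketch of the behaviour described in this issue, using made-up data. It does not settle the nullable question for `infer_objects` (the second print simply shows whatever the installed pandas infers); `convert_dtypes` is included as a related pandas method that targets the nullable extension dtypes.

```python
import pandas as pd

# Integers stored in an object column.
as_object = pd.Series([1, 2, 3], dtype="object")
inferred = as_object.infer_objects()
print(as_object.dtype, "->", inferred.dtype)

# The same idea with a missing value in the column.
with_missing = pd.Series([1, 2, None], dtype="object")
print("infer_objects: ", with_missing.infer_objects().dtype)

# convert_dtypes aims for dtypes that support pd.NA.
converted = with_missing.convert_dtypes()
print("convert_dtypes:", converted.dtype)
```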

nullable ints interpreted as floats

While pandas supports nullable ints via extension arrays, they are still not the default when reading data in. So you can easily get float64 for nullable int series, so you...

kokes
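The effect described in this issue can be seen with a tiny in-memory CSV. The column names and data are invented; requesting `Int64` through `read_csv`'s `dtype` argument is shown as one way to opt in to the nullable integer dtype at load time.

```python
import io

import pandas as pd

# Hypothetical CSV: the "count" column is integer-valued with one gap.
csv_text = "id,count\na,1\nb,\nc,3\n"

default = pd.read_csv(io.StringIO(csv_text))
print(default["count"].dtype)  # float64, because NaN needs a float column

# Asking for the nullable extension dtype keeps the values as integers.
nullable = pd.read_csv(io.StringIO(csv_text), dtype={"count": "Int64"})
print(nullable["count"].dtype)
```

A tool that only sees the loaded frame gets float64 here, so it would have to check whether every non-missing value is a whole number before suggesting a nullable integer dtype.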
