pandas2
dtype precision / conversions
This may not actually be an issue, as we aren't using float np.nan as our missing marker, but we tend to have some subtle issues when int64 values are cast to float64, i.e. whenever an integer array has missing values. We end up storing them as object to avoid this precision loss.
Just a reminder to test for things like this.
xref https://github.com/pydata/pandas/issues/14020 as an example
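A minimal sketch of the kind of round-trip test meant here, written in plain Python rather than against pandas. It assumes that Python's float is an IEEE 754 double (the same format as float64) and that a missing value is encoded as NaN on the cast. cast_like_float64 and roundtrip_mismatches are hypothetical helpers for illustration, not pandas API.

```python
import math
from typing import List, Optional


def cast_like_float64(values: List[Optional[int]]) -> List[float]:
    """Hypothetical helper mimicking NaN-as-missing casting:
    ints become IEEE 754 doubles (same format as float64), None becomes NaN."""
    return [math.nan if v is None else float(v) for v in values]


def roundtrip_mismatches(values: List[Optional[int]]) -> List[int]:
    """Return the original integers that do not survive int -> float64 -> int.
    Missing entries (None) are skipped, since NaN has no integer value."""
    floats = cast_like_float64(values)
    return [v for v, f in zip(values, floats) if v is not None and int(f) != v]


# All of these values fit in int64; the None is what would force the cast.
col = [1, None, 2**53, 2**53 + 1, 2**62 + 3]
print(roundtrip_mismatches(col))  # → [9007199254740993, 4611686018427387907]
```

Keeping the values as Python ints (the object route) avoids the mismatch entirely, at the cost of losing a native int64 buffer.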
Missing data uniformity and removing all the implicit type casting are definitely top 5 priorities from my POV. Not being able to exchange data with file formats and databases with high fidelity (e.g. casting integers above 2^53 to float silently loses data) is a serious problem for production use as an ETL / data engineering tool.
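Where the 2^53 figure comes from can be checked directly: a double has a 53-bit significand, so every integer with magnitude up to 2^53 is exact, and above that the gap between representable values exceeds 1. A small sketch in plain Python, assuming an IEEE 754 platform (where sys.float_info.mant_dig is 53); exact_in_float64 is a hypothetical helper name.

```python
import sys

# Python floats are IEEE 754 binary64, the same format as float64.
MANT_DIG = sys.float_info.mant_dig  # 53 significand bits (incl. implicit bit)
LIMIT = 2 ** MANT_DIG               # 2**53 = 9007199254740992


def exact_in_float64(n: int) -> bool:
    """True if the integer n survives a round trip through a double."""
    return int(float(n)) == n


# Integers with |n| <= 2**53 are exactly representable...
print(all(exact_in_float64(n) for n in (LIMIT - 1, LIMIT, -LIMIT)))  # True
# ...but 2**53 + 1 is the first positive integer that is not.
print(exact_in_float64(LIMIT + 1))        # False
print(float(LIMIT + 1) == float(LIMIT))   # True: both round to 2**53
# The int64 maximum rounds up to 2**63, which no longer fits in int64.
print(int(float(2**63 - 1)) - (2**63 - 1))  # 1
```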