fastexcel
fastexcel copied to clipboard
29.02 cell cast to 29.020000000000003 string
Steps to reproduce
See the Excel file decimal-numbers.xlsx. It contains a column with two decimals:
Decimals |
---|
28.14 |
29.02 |
Read the Excel file using fastexcel:
read_excel("decimal-numbers.xlsx").load_sheet_by_name("Sheet1").to_polars()
shape: (2, 1)
┌──────────┐
│ Decimals │
│ --- │
│ f64 │
╞══════════╡
│ 28.14 │
│ 29.02 │
└──────────┘
Looks fine.
Then read the Excel file while casting to strings:
read_excel("decimal-numbers.xlsx").load_sheet_by_name("Sheet1", dtypes={0: 'string'}).to_polars()
shape: (2, 1)
┌────────────────────┐
│ Decimals │
│ --- │
│ str │
╞════════════════════╡
│ 28.14 │
│ 29.020000000000003 │
└────────────────────┘
The expected result was that the strings would be the same as shown to the user in the Excel file:
shape: (2, 1)
┌──────────┐
│ Decimals │
│ --- │
│ str │
╞══════════╡
│ 28.14 │
│ 29.02 │
└──────────┘
Wrap-up
I understand this looks like an issue due to floating point precision, and I'm not sure if this:
- could be fixed in fastexcel
- could be fixed in the underlying calamine.
- cannot be fixed at all, since it's a fundamental property of the Excel file format or parsing process.
What's the motivation for filing this bug: In our system, we have highly heterogeneous data, so we have to read all Excel values as strings. However, if users see 29.02 in their Excel files, but 29.020000000000003
in our system, that's highly confusing and surprising to users.
What do you think?