fastexcel icon indicating copy to clipboard operation
fastexcel copied to clipboard

29.02 cell cast to 29.020000000000003 string

Open severinh opened this issue 5 months ago • 7 comments

Steps to reproduce

See the Excel file decimal-numbers.xlsx. It contains a column with two decimals:

Decimals
28.14
29.02

Read the Excel file using fastexcel:

read_excel("decimal-numbers.xlsx").load_sheet_by_name("Sheet1").to_polars()
shape: (2, 1)
┌──────────┐
│ Decimals │
│ ---      │
│ f64      │
╞══════════╡
│ 28.14    │
│ 29.02    │
└──────────┘

Looks fine.

Then read the Excel file while casting to strings:

read_excel("decimal-numbers.xlsx").load_sheet_by_name("Sheet1", dtypes={0: 'string'}).to_polars()
shape: (2, 1)
┌────────────────────┐
│ Decimals           │
│ ---                │
│ str                │
╞════════════════════╡
│ 28.14              │
│ 29.020000000000003 │
└────────────────────┘

The expected result was that the strings would be the same as shown to the user in the Excel file:

shape: (2, 1)
┌──────────┐
│ Decimals │
│ ---      │
│ str      │
╞══════════╡
│ 28.14    │
│ 29.02    │
└──────────┘

Wrap-up

I understand this looks like an issue due to floating point precision, and I'm not sure if this:

  1. could be fixed in fastexcel
  2. could be fixed in the underlying calamine.
  3. cannot be fixed at all, since it's a fundamental property of the Excel file format or parsing process.

What's the motivation for filing this bug: In our system, we have highly heterogeneous data, so we have to read all Excel values as strings. However, if users see 29.02 in their Excel files, but 29.020000000000003 in our system, that's highly confusing and surprising to users.

What do you think?

severinh avatar Sep 27 '24 10:09 severinh