fastexcel
fastexcel copied to clipboard
Option to retain special symbols/ errors in excel as a string
Hi team, in excel, we have special symbols like #VALUE!, #DIV/0! etc. As of now, these symbols are turned into null. Can we have an option to retain them as it is as a string?
Thank you for the wonderful and performant wrapper.
https://github.com/ToucanToco/fastexcel/issues/220#issuecomment-2042627835 for reference
Hello @durgeksh I pushed https://github.com/ToucanToco/fastexcel/pull/293 to showcase the current behavior, which seems ok to me. If it's not can you please share a file and explain the current behavior and the desired one?
Hello @PrettyWood , I've gone over the tests pushed in #293 and I'm unable to replicate the desired behaviour. I can't find a way to read excel error codes as strings - they are always transcribed as null
.
My environment:
Linux kernel 6.8.0 on x86-64
Ubuntu 24.04.1 LGS
In the following tables, the column 'from_m' has 5 broken formulas, so the error code for each is #REF!
. The column litho_unit
is empty, so its null
values are expected and desired.
Herein lies the problem: if I'm unable to display and query for excel error codes, then I can't differentiate true null
values from those null
s that are hiding broken excel functionality. Information is being lost.
Note that all columns in the source excel sheet are nominally 'text' type; that is for workflow reasons beyond my control. Regardless, the source column type should not impact the display of Excel error codes: in this case, the columns should be brought in as strings, and the five cells with broken formulas should display #REF!
in the dataframe.
Apologies for the long format, but I want to show my 'tests' in full:
$ python --version [127:NOTFND]|ARIS-Scripts|off|.aris|ref_error_handling
Python 3.12.6
$ uv pip show polars fastexcel [0]|ARIS-Scripts|off|.aris|ref_error_handling
Using Python 3.12.6 environment at .aris
Name: fastexcel
Version: 0.12.0
Location: /path/to/venv
Requires: pyarrow
Required-by:
---
Name: polars
Version: 1.14.0
Location: /path/to/venv
Requires:
Required-by:
$ python [0]|ARIS-Scripts|off|.aris|ref_error_handling
Python 3.12.6 (main, Sep 9 2024, 22:11:19) [Clang 18.1.8 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import polars as pl
>>> partial_path = "/path/to/30061_test_broken_formula_partial_column_litho.xlsx"
>>> df = pl.read_excel(partial_path, read_options={'schema_sample_rows':30,})
>>> with pl.Config(set_tbl_rows=30): print(df)
shape: (29, 5)
┌───────────┬────────┬───────┬────────────┬─────────────────────────────────┐
│ hole_name ┆ from_m ┆ to_m ┆ litho_unit ┆ comments │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ null ┆ str │
╞═══════════╪════════╪═══════╪════════════╪═════════════════════════════════╡
│ S07-01 ┆ 0.0 ┆ 2.2 ┆ null ┆ overburden │
│ S07-01 ┆ 2.2 ┆ 150.0 ┆ null ┆ Dacitic Ash Flow Tuff (TS) │
│ S07-01 ┆ 2.2 ┆ 3.0 ┆ null ┆ Csg. Fresh Dacitic Ash Flow Tu… │
│ S07-01 ┆ 3.0 ┆ 4.0 ┆ null ┆ Fresh daft. Bockv fracturino │
│ S07-01 ┆ 4.0 ┆ 4.15 ┆ null ┆ 15cm of coarse 3.0-5.0mm feld … │
│ S07-01 ┆ 5.3 ┆ 6.0 ┆ null ┆ around core - no record │
│ S07-01 ┆ 6.55 ┆ 6.95 ┆ null ┆ Silty sand- type material reco… │
│ S07-01 ┆ 7.0 ┆ 7.5 ┆ null ┆ Bleached, altered dacite, no s… │
│ S07-01 ┆ 8.0 ┆ 9.0 ┆ null ┆ fresh blocky fracture TS │
│ S07-01 ┆ 9.9 ┆ 10.0 ┆ null ┆ Mod. Clav altered TS one thin … │
│ S07-01 ┆ null ┆ 11.0 ┆ null ┆ Mod clay altered TS Blocky br… │
│ S07-01 ┆ null ┆ 12.0 ┆ null ┆ very weekly altered TS broken … │
│ S07-01 ┆ null ┆ 13.8 ┆ null ┆ 12.0-13.8mostly gr. And lost g… │
│ S07-01 ┆ null ┆ 14.7 ┆ null ┆ Mod clay altered zone with qua… │
│ S07-01 ┆ null ┆ 16.0 ┆ null ┆ wk-mod clay altered TS │
│ S07-01 ┆ 16.0 ┆ 17.0 ┆ null ┆ wk clay altered TS │
│ S07-01 ┆ 18.2 ┆ 18.3 ┆ null ┆ Cal healino bx TX at 20-30° C… │
│ S07-01 ┆ 19.1 ┆ 19.95 ┆ null ┆ Very intensely clay altered TS… │
│ S07-01 ┆ 20.0 ┆ 21.0 ┆ null ┆ fresh to very weakly clay alte… │
│ S07-01 ┆ 21.1 ┆ 21.95 ┆ null ┆ Highly broken crumbly core, mo… │
│ S07-01 ┆ 22.0 ┆ 23.0 ┆ null ┆ Fresh to v. weak altered TS │
│ S07-01 ┆ 23.0 ┆ 24.0 ┆ null ┆ verv uniform unalterd TS │
│ S07-01 ┆ 24.0 ┆ 28.0 ┆ null ┆ uniform unaltered TS │
│ S07-01 ┆ 28.0 ┆ 29.0 ┆ null ┆ rock is weaklv oxidized. oranq… │
│ S07-01 ┆ 29.1 ┆ 29.4 ┆ null ┆ finer grained band of tuff in … │
│ S07-01 ┆ 31.05 ┆ 32.20 ┆ null ┆ Several lim coated fractures a… │
│ S07-01 ┆ 34.5 ┆ 35.0 ┆ null ┆ major fault - gg and soft dk g… │
│ S07-01 ┆ 35.0 ┆ 36.0 ┆ null ┆ soft partly bx, fr and altd fo… │
│ S07-01 ┆ 36.0 ┆ 37.15 ┆ null ┆ Perv K-spr altd zone with qtz-… │
└───────────┴────────┴───────┴────────────┴─────────────────────────────────┘
>>> df = pl.read_excel(partial_path, read_options={'schema_sample_rows':1,'dtypes':'string'})
>>> with pl.Config(set_tbl_rows=30): print(df)
shape: (29, 5)
┌───────────┬────────┬───────┬────────────┬─────────────────────────────────┐
│ hole_name ┆ from_m ┆ to_m ┆ litho_unit ┆ comments │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ str ┆ str │
╞═══════════╪════════╪═══════╪════════════╪═════════════════════════════════╡
│ S07-01 ┆ 0.0 ┆ 2.2 ┆ null ┆ overburden │
│ S07-01 ┆ 2.2 ┆ 150.0 ┆ null ┆ Dacitic Ash Flow Tuff (TS) │
│ S07-01 ┆ 2.2 ┆ 3.0 ┆ null ┆ Csg. Fresh Dacitic Ash Flow Tu… │
│ S07-01 ┆ 3.0 ┆ 4.0 ┆ null ┆ Fresh daft. Bockv fracturino │
│ S07-01 ┆ 4.0 ┆ 4.15 ┆ null ┆ 15cm of coarse 3.0-5.0mm feld … │
│ S07-01 ┆ 5.3 ┆ 6.0 ┆ null ┆ around core - no record │
│ S07-01 ┆ 6.55 ┆ 6.95 ┆ null ┆ Silty sand- type material reco… │
│ S07-01 ┆ 7.0 ┆ 7.5 ┆ null ┆ Bleached, altered dacite, no s… │
│ S07-01 ┆ 8.0 ┆ 9.0 ┆ null ┆ fresh blocky fracture TS │
│ S07-01 ┆ 9.9 ┆ 10.0 ┆ null ┆ Mod. Clav altered TS one thin … │
│ S07-01 ┆ null ┆ 11.0 ┆ null ┆ Mod clay altered TS Blocky br… │
│ S07-01 ┆ null ┆ 12.0 ┆ null ┆ very weekly altered TS broken … │
│ S07-01 ┆ null ┆ 13.8 ┆ null ┆ 12.0-13.8mostly gr. And lost g… │
│ S07-01 ┆ null ┆ 14.7 ┆ null ┆ Mod clay altered zone with qua… │
│ S07-01 ┆ null ┆ 16.0 ┆ null ┆ wk-mod clay altered TS │
│ S07-01 ┆ 16.0 ┆ 17.0 ┆ null ┆ wk clay altered TS │
│ S07-01 ┆ 18.2 ┆ 18.3 ┆ null ┆ Cal healino bx TX at 20-30° C… │
│ S07-01 ┆ 19.1 ┆ 19.95 ┆ null ┆ Very intensely clay altered TS… │
│ S07-01 ┆ 20.0 ┆ 21.0 ┆ null ┆ fresh to very weakly clay alte… │
│ S07-01 ┆ 21.1 ┆ 21.95 ┆ null ┆ Highly broken crumbly core, mo… │
│ S07-01 ┆ 22.0 ┆ 23.0 ┆ null ┆ Fresh to v. weak altered TS │
│ S07-01 ┆ 23.0 ┆ 24.0 ┆ null ┆ verv uniform unalterd TS │
│ S07-01 ┆ 24.0 ┆ 28.0 ┆ null ┆ uniform unaltered TS │
│ S07-01 ┆ 28.0 ┆ 29.0 ┆ null ┆ rock is weaklv oxidized. oranq… │
│ S07-01 ┆ 29.1 ┆ 29.4 ┆ null ┆ finer grained band of tuff in … │
│ S07-01 ┆ 31.05 ┆ 32.20 ┆ null ┆ Several lim coated fractures a… │
│ S07-01 ┆ 34.5 ┆ 35.0 ┆ null ┆ major fault - gg and soft dk g… │
│ S07-01 ┆ 35.0 ┆ 36.0 ┆ null ┆ soft partly bx, fr and altd fo… │
│ S07-01 ┆ 36.0 ┆ 37.15 ┆ null ┆ Perv K-spr altd zone with qtz-… │
└───────────┴────────┴───────┴────────────┴─────────────────────────────────┘
>>> df = pl.read_excel(partial_path, read_options={'schema_sample_rows':1,'dtypes':{'from_m':'string'}})
>>> with pl.Config(set_tbl_rows=30): print(df)
shape: (29, 5)
┌───────────┬────────┬───────┬────────────┬─────────────────────────────────┐
│ hole_name ┆ from_m ┆ to_m ┆ litho_unit ┆ comments │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ null ┆ str │
╞═══════════╪════════╪═══════╪════════════╪═════════════════════════════════╡
│ S07-01 ┆ 0.0 ┆ 2.2 ┆ null ┆ overburden │
│ S07-01 ┆ 2.2 ┆ 150.0 ┆ null ┆ Dacitic Ash Flow Tuff (TS) │
│ S07-01 ┆ 2.2 ┆ 3.0 ┆ null ┆ Csg. Fresh Dacitic Ash Flow Tu… │
│ S07-01 ┆ 3.0 ┆ 4.0 ┆ null ┆ Fresh daft. Bockv fracturino │
│ S07-01 ┆ 4.0 ┆ 4.15 ┆ null ┆ 15cm of coarse 3.0-5.0mm feld … │
│ S07-01 ┆ 5.3 ┆ 6.0 ┆ null ┆ around core - no record │
│ S07-01 ┆ 6.55 ┆ 6.95 ┆ null ┆ Silty sand- type material reco… │
│ S07-01 ┆ 7.0 ┆ 7.5 ┆ null ┆ Bleached, altered dacite, no s… │
│ S07-01 ┆ 8.0 ┆ 9.0 ┆ null ┆ fresh blocky fracture TS │
│ S07-01 ┆ 9.9 ┆ 10.0 ┆ null ┆ Mod. Clav altered TS one thin … │
│ S07-01 ┆ null ┆ 11.0 ┆ null ┆ Mod clay altered TS Blocky br… │
│ S07-01 ┆ null ┆ 12.0 ┆ null ┆ very weekly altered TS broken … │
│ S07-01 ┆ null ┆ 13.8 ┆ null ┆ 12.0-13.8mostly gr. And lost g… │
│ S07-01 ┆ null ┆ 14.7 ┆ null ┆ Mod clay altered zone with qua… │
│ S07-01 ┆ null ┆ 16.0 ┆ null ┆ wk-mod clay altered TS │
│ S07-01 ┆ 16.0 ┆ 17.0 ┆ null ┆ wk clay altered TS │
│ S07-01 ┆ 18.2 ┆ 18.3 ┆ null ┆ Cal healino bx TX at 20-30° C… │
│ S07-01 ┆ 19.1 ┆ 19.95 ┆ null ┆ Very intensely clay altered TS… │
│ S07-01 ┆ 20.0 ┆ 21.0 ┆ null ┆ fresh to very weakly clay alte… │
│ S07-01 ┆ 21.1 ┆ 21.95 ┆ null ┆ Highly broken crumbly core, mo… │
│ S07-01 ┆ 22.0 ┆ 23.0 ┆ null ┆ Fresh to v. weak altered TS │
│ S07-01 ┆ 23.0 ┆ 24.0 ┆ null ┆ verv uniform unalterd TS │
│ S07-01 ┆ 24.0 ┆ 28.0 ┆ null ┆ uniform unaltered TS │
│ S07-01 ┆ 28.0 ┆ 29.0 ┆ null ┆ rock is weaklv oxidized. oranq… │
│ S07-01 ┆ 29.1 ┆ 29.4 ┆ null ┆ finer grained band of tuff in … │
│ S07-01 ┆ 31.05 ┆ 32.20 ┆ null ┆ Several lim coated fractures a… │
│ S07-01 ┆ 34.5 ┆ 35.0 ┆ null ┆ major fault - gg and soft dk g… │
│ S07-01 ┆ 35.0 ┆ 36.0 ┆ null ┆ soft partly bx, fr and altd fo… │
│ S07-01 ┆ 36.0 ┆ 37.15 ┆ null ┆ Perv K-spr altd zone with qtz-… │
└───────────┴────────┴───────┴────────────┴─────────────────────────────────┘
>>> df = pl.read_excel(partial_path, read_options={'schema_sample_rows':30,'dtypes':{'from_m':'string'}})
>>> with pl.Config(set_tbl_rows=30): print(df)
shape: (29, 5)
┌───────────┬────────┬───────┬────────────┬─────────────────────────────────┐
│ hole_name ┆ from_m ┆ to_m ┆ litho_unit ┆ comments │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ null ┆ str │
╞═══════════╪════════╪═══════╪════════════╪═════════════════════════════════╡
│ S07-01 ┆ 0.0 ┆ 2.2 ┆ null ┆ overburden │
│ S07-01 ┆ 2.2 ┆ 150.0 ┆ null ┆ Dacitic Ash Flow Tuff (TS) │
│ S07-01 ┆ 2.2 ┆ 3.0 ┆ null ┆ Csg. Fresh Dacitic Ash Flow Tu… │
│ S07-01 ┆ 3.0 ┆ 4.0 ┆ null ┆ Fresh daft. Bockv fracturino │
│ S07-01 ┆ 4.0 ┆ 4.15 ┆ null ┆ 15cm of coarse 3.0-5.0mm feld … │
│ S07-01 ┆ 5.3 ┆ 6.0 ┆ null ┆ around core - no record │
│ S07-01 ┆ 6.55 ┆ 6.95 ┆ null ┆ Silty sand- type material reco… │
│ S07-01 ┆ 7.0 ┆ 7.5 ┆ null ┆ Bleached, altered dacite, no s… │
│ S07-01 ┆ 8.0 ┆ 9.0 ┆ null ┆ fresh blocky fracture TS │
│ S07-01 ┆ 9.9 ┆ 10.0 ┆ null ┆ Mod. Clav altered TS one thin … │
│ S07-01 ┆ null ┆ 11.0 ┆ null ┆ Mod clay altered TS Blocky br… │
│ S07-01 ┆ null ┆ 12.0 ┆ null ┆ very weekly altered TS broken … │
│ S07-01 ┆ null ┆ 13.8 ┆ null ┆ 12.0-13.8mostly gr. And lost g… │
│ S07-01 ┆ null ┆ 14.7 ┆ null ┆ Mod clay altered zone with qua… │
│ S07-01 ┆ null ┆ 16.0 ┆ null ┆ wk-mod clay altered TS │
│ S07-01 ┆ 16.0 ┆ 17.0 ┆ null ┆ wk clay altered TS │
│ S07-01 ┆ 18.2 ┆ 18.3 ┆ null ┆ Cal healino bx TX at 20-30° C… │
│ S07-01 ┆ 19.1 ┆ 19.95 ┆ null ┆ Very intensely clay altered TS… │
│ S07-01 ┆ 20.0 ┆ 21.0 ┆ null ┆ fresh to very weakly clay alte… │
│ S07-01 ┆ 21.1 ┆ 21.95 ┆ null ┆ Highly broken crumbly core, mo… │
│ S07-01 ┆ 22.0 ┆ 23.0 ┆ null ┆ Fresh to v. weak altered TS │
│ S07-01 ┆ 23.0 ┆ 24.0 ┆ null ┆ verv uniform unalterd TS │
│ S07-01 ┆ 24.0 ┆ 28.0 ┆ null ┆ uniform unaltered TS │
│ S07-01 ┆ 28.0 ┆ 29.0 ┆ null ┆ rock is weaklv oxidized. oranq… │
│ S07-01 ┆ 29.1 ┆ 29.4 ┆ null ┆ finer grained band of tuff in … │
│ S07-01 ┆ 31.05 ┆ 32.20 ┆ null ┆ Several lim coated fractures a… │
│ S07-01 ┆ 34.5 ┆ 35.0 ┆ null ┆ major fault - gg and soft dk g… │
│ S07-01 ┆ 35.0 ┆ 36.0 ┆ null ┆ soft partly bx, fr and altd fo… │
│ S07-01 ┆ 36.0 ┆ 37.15 ┆ null ┆ Perv K-spr altd zone with qtz-… │
└───────────┴────────┴───────┴────────────┴─────────────────────────────────┘