`use_columns` silently skips empty columns at the beginning
Given the following spreadsheet with an empty first column:
Loading the sheet with e.g. use_columns='B:C' (or use_columns=[1, 2]) actually selects columns C and D:
read_excel('sheet.xlsx').load_sheet(0, use_columns='B:C').to_polars()
shape: (3, 2)
┌────────┬───────┐
│ Second ┆ Third │
│ ---    ┆ ---   │
│ f64    ┆ f64   │
╞════════╪═══════╡
│ 4.0    ┆ 7.0   │
│ 5.0    ┆ 8.0   │
│ 6.0    ┆ 9.0   │
└────────┴───────┘
And raises ColumnNotFoundError with use_columns='B:D' or use_columns=[1, 2, 3]:
read_excel('sheet.xlsx').load_sheet(0, use_columns='B:D').to_polars()
_fastexcel.ColumnNotFoundError: column at index 3 not found
Context:
0: available columns are: [ColumnInfo { name: "First", index: 0, dtype: Float, column_name_from: LookedUp, dtype_from: Guessed }, ColumnInfo { name: "Second", index: 1, dtype: Float, column_name_from: LookedUp, dtype_from: Guessed }, ColumnInfo { name: "Third", index: 2, dtype: Float, column_name_from: LookedUp, dtype_from: Guessed }]
Yes, calamine, the engine behind fastexcel, skips empty columns by default.
I plan to add an option on calamine directly, like I did for skipping empty rows.
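For readers following along, the shift can be reproduced in a minimal pure-Python sketch. This is not fastexcel's or calamine's actual code: `trim_leading_empty_columns` and `excel_col_to_index` are hypothetical helpers, and the values in the "First" column are made up. It only shows why indices applied to the trimmed data end up off by the number of skipped columns.

```python
def excel_col_to_index(letter: str) -> int:
    """Convert an Excel column letter such as 'B' or 'AA' to a 0-based index."""
    index = 0
    for char in letter.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def trim_leading_empty_columns(rows):
    """Drop every column before the first one holding a value.

    Hypothetical stand-in for the engine's default behaviour.
    """
    first_used = min(
        (i for row in rows for i, cell in enumerate(row) if cell is not None),
        default=0,
    )
    return [row[first_used:] for row in rows], first_used


# Sheet as laid out in the file: column A is empty, data lives in B:D.
# The values under "First" are assumed for illustration.
sheet = [
    [None, "First", "Second", "Third"],
    [None, 1.0, 4.0, 7.0],
    [None, 2.0, 5.0, 8.0],
    [None, 3.0, 6.0, 9.0],
]

trimmed, offset = trim_leading_empty_columns(sheet)
requested = [excel_col_to_index(c) for c in ("B", "C")]  # sheet indices 1 and 2

# Indices applied to the trimmed data are shifted by `offset` (sheet columns C and D).
shifted_header = [trimmed[0][i] for i in requested]
# Indices applied to the untrimmed sheet give the columns that were asked for.
expected_header = [sheet[0][i] for i in requested]

print(offset, shifted_header, expected_header)
```

Adding `offset` back to the result, or subtracting it from the requested indices before the lookup, would be one way to reconcile the two views.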
@PrettyWood: If you add it anytime soon I'll hook into it from Polars, as we have a similar request on our side now, and I'm adding a drop_empty_cols param to influence this behaviour (where the underlying engine has support) :)
+1
+1
Would this change be a lot of work?
I think this qualifies as a real bug when asking for columns B and C yields columns C and D... Please allow load_sheet to set the underlying skip_empty_area parameter to false.
Quoting the python-calamine README:
By default, calamine skips empty rows/cols before data. For suppress this behaviour, set skip_empty_area to False.
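Until fastexcel exposes such an option, a possible workaround is to read the sheet with python-calamine directly. The sketch below assumes python-calamine is installed and uses the `skip_empty_area` argument from the README quoted above. `load_untrimmed_rows` and `select_columns` are hypothetical helper names, and the in-memory rows are stand-in data with assumed values, not output from a real file.

```python
def load_untrimmed_rows(path: str, sheet_index: int = 0):
    """Read a sheet with python-calamine, keeping empty rows/columns before the data."""
    # Imported lazily so the pure helper below works without the package installed.
    from python_calamine import CalamineWorkbook

    workbook = CalamineWorkbook.from_path(path)
    return workbook.get_sheet_by_index(sheet_index).to_python(skip_empty_area=False)


def select_columns(rows, indices):
    """Keep only the given 0-based sheet columns from each row (hypothetical helper)."""
    return [[row[i] for i in indices] for row in rows]


# With untrimmed rows, index 1 refers to sheet column B again.
# rows = load_untrimmed_rows("sheet.xlsx")
rows = [["", "First", "Second", "Third"], ["", 1.0, 4.0, 7.0]]  # stand-in data
selected = select_columns(rows, [1, 2])
print(selected)
```

The selected rows can then be passed on to Polars or another consumer, at the cost of losing fastexcel's dtype handling.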