duckdb_spatial icon indicating copy to clipboard operation
duckdb_spatial copied to clipboard

st.read treats first row as data instead of header when you have just header in the sheet during excel import

Open sjoe77 opened this issue 1 year ago • 5 comments

When you have only one row in the Excel sheet which is the header and no data, st.read is treating it as data during import. In the attached image the first query shows correct behavior. This is when you have at least one row (See 1 below the header in the first field)

The second query in the screenshot show the issue, this is when the only row present is the header

image

sjoe77 avatar Dec 19 '23 00:12 sjoe77

Hi! Thanks for filing this issue! Unfortunately since this is related to the underlying GDAL driver I'm probably not going to prioritize fixing this. Feel free to leave the issue open though and I might take a shot at it in the future.

Maxxen avatar Dec 19 '23 12:12 Maxxen

Thank you Max. Until then I will try to put in some workaround. Hope this is fixed in the new year!

sjoe77 avatar Dec 19 '23 21:12 sjoe77

Hi guys, @Maxxen please is there anything new regarding this? @sjoe77 if you found some workaround, can you please share?

I searching for a way to load MS Excel data to DuckDB, always with headers. I have troubles with sheets that contain headers only. Thanks a lot!

petrparik avatar Jul 23 '24 07:07 petrparik

The workaround I used is to use SQL with explicit column names always(no select *). Example select col1, col2 ... So if you have no data just header this will throw an Exception that col1, col2 does not exist. I caught the exception and moved on assuming there were zero records in that file. But if the sql succeeded then it will give me records as expected when the data was present.

sjoe77 avatar Jul 24 '24 19:07 sjoe77

@sjoe77 cool, thanks! Unfortunately I cannot use that in my scenario (I don't have control over the SQL statement, it is issued by a user, as it is querying/testing tool). But anyway, thanks a lot for sharing your workaround.

petrparik avatar Jul 25 '24 07:07 petrparik

duplicate to #8531 #113 (which were closed for whatever reason)

This error is still reproducible with duckdb==1.1.3 and a simple excel sheet with two sheets. On the first sheet st_read correctly interprets the first row as column names, for the second one it ignores open_options=['HEADERS=FORCE'] and interprets all rows as data.

iligid avatar Nov 11 '24 11:11 iligid

Hello! Im not really planning to fix this anytime soon as Im working on adding excel import/export to DuckDB's excel extension at which point I'll most likely remove it from spatial.

Maxxen avatar Nov 11 '24 11:11 Maxxen

Hello!

As of DuckDB 1.2.0 the duckdb excel extension provides support for reading and writing xlsx files with greater efficiency and many more options. Therefore we're going to deprecate this feature in the spatial extension in the future. I would recommend that you try out the excel extension, and if you run into any problems please open an issue over at the duckdb-excel repository.

Maxxen avatar Feb 05 '25 21:02 Maxxen