First two columns wrong in old xls files.
I have some old xls files that I am parsing. In some of them, when parse them with python-calamine, have wrong values in the first two columns.
workbook = CalamineWorkbook.from_path(str(path))
for sheet_name in workbook.sheet_names:
rows_2d = workbook.get_sheet_by_name(sheet_name).to_python(
skip_empty_area=True,
)
print(rows_2d[0][0:3])
outputs
[0.001, 22.48, ''].
This is not the right content. If one looks at all fields, it shows that the values in the first two columns are always wrong. They should be mixed types, but calamine returns floats that look like they come from elsewhere in the file.
In comparison with xlrd:
book = xlrd.open_workbook(str(path))
for sh in book.sheets():
for rx in range(sh.nrows):
print(sh.row(rx)[0:3])
break
outputs
[text:'<correct text>', empty:'', empty:'']
which is correct (I replaced the string).
When I open the xls in Excel and save it again, this behavior goes away (they also become smaller for some reason). Therefore I can not "censor" it. Those are company files so I can not upload one here uncensored.
Is there something we can do about this?
Can you add a sample Excel workbook to reproduce?
As I said, these are company-internal files, and I have not found a way to redact them without the strange behavior going away. I will have to ask but I guess I would not be allowed to share them publicly. Is there a way to only share them with e.g. contributors?
I am not a contributor, but is your Excel file produced by an external program, like a "export to excel" button of an app?
Likely yes, but I am not entirely sure.
Is there a way to only share them with e.g. contributors?
Hi. You can send the file to me at [email protected]
I'm sorry, but I did not get permission to send you a file, not even to just you 😕. Do you see any way I can debug the issue at my computer?
I get the same result when using Polar's read_excel, so the issue is with calamine, right?
Hi. Yes, or with a file. Can you open the file with biff viewer (like https://www.aldeid.com/wiki/BiffView) and find what type of cells have wrong values?
Example for base.xls: