python-calamine icon indicating copy to clipboard operation
python-calamine copied to clipboard

First two columns wrong in old xls files.

Open marvinbernhardt opened this issue 3 months ago • 8 comments

I have some old xls files that I am parsing. In some of them, when parse them with python-calamine, have wrong values in the first two columns.

workbook = CalamineWorkbook.from_path(str(path))
for sheet_name in workbook.sheet_names:
    rows_2d = workbook.get_sheet_by_name(sheet_name).to_python(
        skip_empty_area=True,
    )
    print(rows_2d[0][0:3])

outputs [0.001, 22.48, '']. This is not the right content. If one looks at all fields, it shows that the values in the first two columns are always wrong. They should be mixed types, but calamine returns floats that look like they come from elsewhere in the file.

In comparison with xlrd:

book = xlrd.open_workbook(str(path))
for sh in book.sheets():
    for rx in range(sh.nrows):
        print(sh.row(rx)[0:3])
        break

outputs [text:'<correct text>', empty:'', empty:''] which is correct (I replaced the string).

When I open the xls in Excel and save it again, this behavior goes away (they also become smaller for some reason). Therefore I can not "censor" it. Those are company files so I can not upload one here uncensored.

Is there something we can do about this?

marvinbernhardt avatar Oct 10 '25 07:10 marvinbernhardt

Can you add a sample Excel workbook to reproduce?

fzumstein avatar Oct 20 '25 13:10 fzumstein

As I said, these are company-internal files, and I have not found a way to redact them without the strange behavior going away. I will have to ask but I guess I would not be allowed to share them publicly. Is there a way to only share them with e.g. contributors?

marvinbernhardt avatar Oct 29 '25 07:10 marvinbernhardt

I am not a contributor, but is your Excel file produced by an external program, like a "export to excel" button of an app?

fzumstein avatar Oct 29 '25 08:10 fzumstein

Likely yes, but I am not entirely sure.

marvinbernhardt avatar Oct 29 '25 08:10 marvinbernhardt

Is there a way to only share them with e.g. contributors?

Hi. You can send the file to me at [email protected]

dimastbk avatar Oct 31 '25 05:10 dimastbk

I'm sorry, but I did not get permission to send you a file, not even to just you 😕. Do you see any way I can debug the issue at my computer?

marvinbernhardt avatar Nov 10 '25 09:11 marvinbernhardt

I get the same result when using Polar's read_excel, so the issue is with calamine, right?

marvinbernhardt avatar Nov 13 '25 17:11 marvinbernhardt

Hi. Yes, or with a file. Can you open the file with biff viewer (like https://www.aldeid.com/wiki/BiffView) and find what type of cells have wrong values?

Example for base.xls:

Image

dimastbk avatar Nov 13 '25 17:11 dimastbk