read-excel-file

(Chinese) A few illegal characters � show up when the file is large enough

Open duanyukai opened this issue 5 years ago • 6 comments

I found a strange bug: after parsing my Excel file, which contains a large amount of Chinese text, the output contains a small number of invalid UTF-8 characters (shown as �). I can only reproduce this when the file is large enough. The test case file is attached below; I just copied the same line many times.

测试.xlsx

image

duanyukai avatar Jan 13 '20 15:01 duanyukai

Hmm, no idea. I guess this issue should stay open so that other Chinese-speaking users could see it.

catamphetamine avatar Jan 13 '20 15:01 catamphetamine

I'll try to find some smaller test cases. Could it be a fault with a buffer, or something else?

duanyukai avatar Jan 13 '20 16:01 duanyukai

> it seems like fault with buffer or something else?

Absolutely no idea. Sometimes I think that we should find an alternative simple Excel reading library and place the link in the readme: this library is intended for really simple cases, and people say it won't always work for large files.

catamphetamine avatar Jan 13 '20 16:01 catamphetamine

I encountered this with Finnish words too: Näytä was converted into N��ytä. The second ä is correct, but the first one becomes two Unicode replacement characters (U+FFFD).

This was triggered by modifying other cell values (the same value was read correctly previously). Adding any text in front of Näytä results in correct conversion, so this seems to require some very specific conditions to manifest.
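One mechanism that would be consistent with these symptoms is a multi-byte UTF-8 character being split across two buffer chunks, with each chunk decoded on its own. This is an assumption, not something confirmed in this thread or checked against the library's code. The sketch below is in Python purely to illustrate the idea. The function names and the chunk boundary are hypothetical. It shows how a split ä (bytes C3 A4) turns into two U+FFFD characters, and how a stateful incremental decoder avoids that.

```python
import codecs


def decode_chunks_naively(chunks):
    """Decode every chunk independently (hypothetical buggy approach)."""
    # A multi-byte character cut in half at a chunk boundary cannot be
    # decoded, so each half is replaced with U+FFFD.
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


def decode_chunks_incrementally(chunks):
    """Decode chunks with a stateful decoder that buffers incomplete bytes."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush the decoder; any trailing incomplete sequence is handled here.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


data = "Näytä".encode("utf-8")   # b'N\xc3\xa4yt\xc3\xa4'
chunks = [data[:2], data[2:]]    # boundary falls inside the first 'ä'

naive = decode_chunks_naively(chunks)
fixed = decode_chunks_incrementally(chunks)

# Prepending text shifts the boundary so that it no longer splits a character.
shifted = "x Näytä".encode("utf-8")
shifted_naive = decode_chunks_naively([shifted[:2], shifted[2:]])

print(naive)          # first 'ä' replaced by two U+FFFD characters
print(fixed)          # Näytä
print(shifted_naive)  # x Näytä
```

If this is indeed the cause, it would also explain why the error depends on what precedes the word: any change that moves the chunk boundary off the middle of a character makes the problem disappear.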

plaa avatar Aug 26 '21 07:08 plaa

As for 'large enough', our file is 28 kB (185 rows by 5 columns), which I consider pretty small.

plaa avatar Aug 26 '21 07:08 plaa

@plaa Attach a file that illustrates the bug so that someone can look at it.

catamphetamine avatar Aug 26 '21 09:08 catamphetamine