tarindexer

Crash, most likely due to non-UTF-8 characters in a file name

Open · dmitry-irtegov opened this issue 7 months ago · 0 comments

Hello! Thanks for the useful idea!

I tried to use your program on a big archive under a UTF-8 locale, and it crashed with this stack trace:

```
Traceback (most recent call last):
  File "tarindexer.py", line 123, in <module>
    main()
  File "tarindexer.py", line 118, in main
    indextar(dbtarfile,indexfile)
  File "tarindexer.py", line 66, in indextar
    outfile.write(rec)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 40-47: surrogates not allowed
```

The file name that most likely triggered the crash is the following (as printed by `ls -b`):

```
\317\360\356\341\353\345\354\373\ \341\345\347\356\357\340\361\355\356\361\362\350\ \342\ \310\322.pdf
```

It indeed does not look like valid UTF-8. Unfortunately, I cannot send you the archive, mostly because this file and the files around it are rather big. While having this file in the archive is my fault, I think the program should not crash; it could, for example, print such names in `ls -b` style instead.
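The traceback is consistent with how Python's `tarfile` module treats such names, assuming tarindexer reads the archive with `tarfile` (the traceback does not show this). By default `tarfile` decodes member names with `errors="surrogateescape"`, so every byte that is not valid UTF-8 becomes a lone surrogate in the range U+DC80 to U+DCFF; writing that string to a UTF-8 text file with the default strict error handler then fails with "surrogates not allowed". The sketch below shows one way to do the `ls -b`-style escaping suggested above. It is not tarindexer's code: the function name `ls_b_escape` is hypothetical, it assumes the name was decoded with `surrogateescape` in a UTF-8 locale, and it only approximates GNU `ls -b` (it escapes spaces, backslashes, undecodable bytes and non-printable characters as octal, without C-style escapes such as `\t`).

```python
def ls_b_escape(name: str) -> str:
    """Roughly imitate `ls -b` quoting for a file name (hypothetical helper).

    Assumption: `name` was decoded with errors="surrogateescape" (tarfile's
    default), so each byte that is not valid UTF-8 is stored as a lone
    surrogate U+DC80..U+DCFF.
    """
    # Recover the original bytes, then decode again so that every invalid
    # byte is represented by exactly one lone surrogate.
    raw = name.encode("utf-8", "surrogateescape")
    text = raw.decode("utf-8", "surrogateescape")
    out = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Undecodable byte: print it as a 3-digit octal escape, like ls -b.
            out.append("\\%03o" % (code - 0xDC00))
        elif ch == "\\":
            out.append("\\\\")
        elif ch == " ":
            out.append("\\ ")
        elif not ch.isprintable():
            # Control characters etc.: octal-escape each of their UTF-8 bytes.
            out.append("".join("\\%03o" % b for b in ch.encode("utf-8")))
        else:
            # Valid, printable character: keep it as-is.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # The bytes from this report, decoded the way tarfile does by default.
    raw_name = (b"\317\360\356\341\353\345\354\373 "
                b"\341\345\347\356\357\340\361\355\356\361\362\350 "
                b"\342 \310\322.pdf")
    member_name = raw_name.decode("utf-8", "surrogateescape")
    print(ls_b_escape(member_name))
```

Passing a name through such a function before building the record would keep the index file valid UTF-8. An alternative, if the index must preserve the exact bytes for later lookups, is to open the index file with `open(path, "w", encoding="utf-8", errors="surrogateescape")`, which writes the original bytes back unchanged; which approach fits better depends on how tarindexer reads its index, which this sketch does not assume.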

dmitry-irtegov, Jul 24 '24 02:07