pycdlib
os.walk fails on Shift-JIS encoded ISO-9660 filesystem
Yeah, apparently (at least in 2003, using Toast ISO 9660 Builder ("HAVE A NICE DAY")) you could make an ISO filesystem whose file names weren't ASCII. This Japanese clipart disc is what causes the problem: https://archive.org/download/GorippaPetit19/Gorippa%20Petit%2019.iso
Current code fails with:
  File "pycdlib/pycdlib.py", line 5932, in walk
    encoded = name.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8c in position 0: invalid start byte
My suggestion is to add an "encoding" keyword that can override the existing ones in https://github.com/clalancette/pycdlib/blob/e67d63512281b2966e2f8d5e2fa4f2a5f3579544/pycdlib/pycdlib.py#L5952-L5974
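To illustrate the suggestion, here is a minimal sketch of the decode step with an optional override. It does not use pycdlib at all; `decode_name` and its parameters are hypothetical names made up for this example, and the sample file name is an assumption, not taken from the disc above.

```python
from typing import Optional


def decode_name(name: bytes, encoding: Optional[str] = None,
                default: str = "utf-8") -> str:
    """Decode a raw directory-record name (hypothetical helper).

    If the caller passes an encoding, it overrides the default one.
    """
    return name.decode(encoding or default)


# A Shift-JIS encoded file name (sample data for this sketch only).
raw = "テスト".encode("shift_jis")

# Without an override, the default codec rejects the Shift-JIS bytes.
try:
    decode_name(raw)
    failed = False
except UnicodeDecodeError:
    failed = True

# With the override, the name decodes cleanly.
decoded = decode_name(raw, encoding="shift_jis")
print(failed, decoded)
```

The point of the override is that the caller, not the library, knows which legacy encoding the disc author used.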
Yeah, I see what you mean.
Unfortunately, due to the way things are implemented, this is not going to be as easy as adding an encoding parameter. We use and store the strings internally to do all sorts of things, like looking up the directory records. Probably the right fix here is to store bytestrings internally, and only convert to/from the encoding on the user-facing APIs, but that is a big internal change. I'll have to think about this further.
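For readers following along, the "bytestrings internally, convert at the API boundary" idea could look roughly like the sketch below. This is not pycdlib's implementation; `NameIndex`, its methods, and the extent numbers are all hypothetical and exist only to show the shape of the design.

```python
from typing import Dict, List


class NameIndex:
    """Toy index that keeps names as raw bytes (hypothetical, not pycdlib)."""

    def __init__(self) -> None:
        # Internal storage is keyed by the undecoded bytes from the disc.
        self._records: Dict[bytes, int] = {}

    def add(self, raw_name: bytes, extent: int) -> None:
        self._records[raw_name] = extent

    def lookup(self, name: str, encoding: str) -> int:
        # User-facing: encode the caller's string before the lookup.
        return self._records[name.encode(encoding)]

    def names(self, encoding: str) -> List[str]:
        # User-facing: decode only when handing names back to the caller.
        return [raw.decode(encoding) for raw in self._records]


index = NameIndex()
index.add("テスト".encode("shift_jis"), 42)
index.add(b"README.TXT", 43)

print(index.lookup("テスト", "shift_jis"))
print(index.names("shift_jis"))
```

Because nothing is decoded on the way in, a name in an unexpected encoding cannot break internal lookups; only the call that asks for a string needs to know the encoding.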
Actually, I was totally wrong about this. You were right, we just needed an encoding argument in the walk API. I've added that (and a test) in 04812daf69c2453db06b5fefbb9cdf1f1fbb62d0, so this should be fixed now. Thanks for the report!