os.walk fails on Shift-JIS encoded ISO-9660 filesystem

Open einstein95 opened this issue 1 year ago • 1 comments

Yeah, apparently (at least in 2003, using Toast ISO 9660 Builder ("HAVE A NICE DAY")) you could make an ISO filesystem whose filenames weren't ASCII. This Japanese clipart disc is what causes the problem: https://archive.org/download/GorippaPetit19/Gorippa%20Petit%2019.iso

Current code fails with

  File "pycdlib/pycdlib.py", line 5932, in walk
    encoded = name.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8c in position 0: invalid start byte
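
The failure can be reproduced without the disc. This is a minimal sketch, assuming a made-up raw name that starts with byte 0x8c like the one in the traceback; `raw_name` and `try_decode` are illustrative names, not part of pycdlib.

```python
# Hypothetical raw directory-record name. The bytes are an assumption chosen
# to start with 0x8c, matching the byte reported in the traceback.
raw_name = b"\x8c\x8e.BMP;1"


def try_decode(name: bytes, encoding: str):
    """Return the decoded name, or None if the bytes are invalid in `encoding`."""
    try:
        return name.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0x8c is a UTF-8 continuation byte, so it cannot start a UTF-8 sequence.
utf8_result = try_decode(raw_name, "utf-8")

# The same bytes decode cleanly when interpreted as Shift-JIS.
sjis_result = try_decode(raw_name, "shift_jis")

print(utf8_result is None, sjis_result is not None)
```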

My suggestion is to add an "encoding" keyword that can override the existing encodings in https://github.com/clalancette/pycdlib/blob/e67d63512281b2966e2f8d5e2fa4f2a5f3579544/pycdlib/pycdlib.py#L5952-L5974

einstein95 • Sep 27 '22 11:09

Yeah, I see what you mean.

Unfortunately, due to the way things are implemented, this is not going to be as easy as adding an encoding parameter. We use and store the strings internally to do all sorts of things, like looking up the directory records, etc. Probably the right fix here is to store bytestrings internally, and only convert to/from the encoding on the user-facing APIs, but it is a big internal change to do that. I'll have to think about this further.
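
For illustration, the "bytestrings internally, convert at the API boundary" idea could look roughly like the sketch below. `NameIndex` and its methods are hypothetical and do not reflect pycdlib's actual internals; the extent number is made up.

```python
from typing import Dict, Iterator


class NameIndex:
    """Toy lookup table keyed by raw bytes (not pycdlib's real data structure)."""

    def __init__(self, encoding: str = "utf-8") -> None:
        self._encoding = encoding
        self._records: Dict[bytes, int] = {}  # raw name -> extent number

    def add_raw(self, raw_name: bytes, extent: int) -> None:
        # Internal side: names are stored exactly as they appear on disc.
        self._records[raw_name] = extent

    def lookup(self, name: str) -> int:
        # User-facing side: encode the caller's str before the lookup.
        return self._records[name.encode(self._encoding)]

    def names(self) -> Iterator[str]:
        # User-facing side: decode only when handing names back out.
        for raw in self._records:
            yield raw.decode(self._encoding)


index = NameIndex(encoding="shift_jis")
index.add_raw(b"\x8c\x8e", 42)
first_name = next(index.names())
print(index.lookup(first_name))
```

Because the dictionary keys never leave the bytes domain, a wrong encoding guess can only affect what the caller sees, not which record is found.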

clalancette • Sep 28 '22 02:09

Actually, I was totally wrong about this. You were right, we just needed an encoding argument in the walk API. I've added that (and a test) in 04812daf69c2453db06b5fefbb9cdf1f1fbb62d0. So this should be fixed now. Thanks for the report!
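
A rough usage sketch of the fix, assuming the new keyword is named `encoding` and that `walk` yields `(dirname, dirlist, filelist)` tuples in the style of `os.walk`. `list_files` is a hypothetical helper, and `iso` is expected to be an already opened `pycdlib.PyCdlib` object.

```python
def list_files(iso, encoding="shift_jis"):
    """Collect full ISO paths of all files, decoding names with `encoding`.

    `iso` is assumed to expose a pycdlib-style walk(iso_path=..., encoding=...).
    """
    paths = []
    for dirname, dirlist, filelist in iso.walk(iso_path="/", encoding=encoding):
        for filename in filelist:
            # The root directory is "/", so strip it to avoid a double slash.
            paths.append(dirname.rstrip("/") + "/" + filename)
    return paths
```

With a real disc this would be called after `iso = pycdlib.PyCdlib()` and `iso.open(...)`, followed by `iso.close()` once the listing is done.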

clalancette • Feb 01 '23 02:02