libmspack
Cannot extract files from a .cab file containing file names encoded in shift_JIS
When I try to extract files from a .cab file whose file names are encoded in Shift_JIS, cabextract aborts with the following error:
Extracting cabinet: DENKEN CG集(同人).cab
extracting DENKEN?@CG?W?i???l?j/DENKEN?@CG?W.d88
DENKEN?@CG?W?i???l?j/DENKEN?@CG?W.d88: can't create file path
Using -f to try to work around it doesn't help either.
The sample .cab file can be downloaded here.
Plenty of other Japanese sample .cab files can be obtained here.
Thanks for the source of Japanese cabinet files!
Unfortunately, I can't reproduce your problem.
Can you describe the system you're running cabextract on, and if you know, what type of filesystem you're writing to?
The error message "can't create file path" is caused by cabextract trying to create a directory called (per your output) DENKEN?@CG?W?i???l?j, which your system rejects.
For comparison, it succeeds on Ubuntu / ext4, and Cygwin / NTFS gives exactly the same output:
$ cabextract DENKENБ@CGПWБiУпРlБj.cab
Extracting cabinet: DENKENБ@CGПWБiУпРlБj.cab
extracting DENKEN�@CG�W�i���l�j/DENKEN�@CG�W.d88
All done, no errors.
$ find DENKEN* -ls
35001622 4 drwxrwxr-x 2 kyz kyz 4096 Feb 18 11:17 DENKEN\201@CG\217W\201i\223\257\220l\201j
35001623 408 -rw-rw-r-- 1 kyz kyz 415824 Dec 19 1997 DENKEN\201@CG\217W\201i\223\257\220l\201j/DENKEN\201@CG\217W.d88
34083492 136 -rw-rw-r-- 1 kyz kyz 138062 Feb 18 11:17 DENKEN\320\221@CG\320\237W\320\221i\320\243\320\277\320\240l\320\221j.cab
You should also consider using the -e encoding option so that cabextract translates the filenames to UTF-8. If the filesystem you're using is OK with UTF-8 filenames, you'll get better results.
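To illustrate what that translation amounts to, here is a minimal Python sketch. It is not cabextract's code (cabextract goes through iconv()); it just takes the raw directory-name bytes from the find listing above, interprets them as Shift_JIS, and re-encodes them as UTF-8. The variable names are made up for illustration.

```python
# Raw directory-name bytes as shown by find (octal escapes rewritten as hex).
raw_name = b"DENKEN\x81\x40CG\x8f\x57\x81\x69\x93\xaf\x90\x6c\x81\x6a"

# Interpret the bytes as Shift_JIS, then re-encode the text as UTF-8.
decoded = raw_name.decode("shift_jis")
utf8_name = decoded.encode("utf-8")

# Print the UTF-8 bytes so they can be compared with a find listing.
print(utf8_name)
```

The resulting UTF-8 bytes are what a filesystem that accepts UTF-8 names would store for the directory part of the path.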
Again for comparison, here is the output with the -e shift_jis option. Ubuntu is using glibc's iconv(), and Cygwin is using libiconv; they give identical output:
$ cabextract -e shift_jis DENKENБ@CGПWБiУпРlБj.cab
Extracting cabinet: DENKENБ@CGПWБiУпРlБj.cab
extracting DENKEN CG集(同人)¥DENKEN CG集.d88
All done, no errors.
$ find DENKEN* -ls
34083492 136 -rw-rw-r-- 1 kyz kyz 138062 Feb 18 11:21 DENKEN\320\221@CG\320\237W\320\221i\320\243\320\277\320\240l\320\221j.cab
34083493 408 -rw-rw-r-- 1 kyz kyz 415824 Dec 19 1997 DENKEN\343\200\200CG\351\233\206\357\274\210\345\220\214\344\272\272\357\274\211\302\245DENKEN\343\200\200CG\351\233\206.d88
As a side note, this raises a separate concern for me: the encoding conversion also translated the file separators from \ to ¥, so the result has no directory parts. It's a known issue with the character set: on Japanese computers actually using Shift_JIS, ¥ is a valid separator, because it's just how the font displays character code 0x5C, which is still the file separator character. Here it has been translated to the UTF-8 encoding of U+00A5 ¥ YEN SIGN, so there are no separators. I'll have to think about what to do about this (if anything).
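One possible approach, sketched in Python purely as an illustration, would be to treat both U+005C and U+00A5 as separators after the conversion. This is an assumption about how it could be handled, not what cabextract does, and normalize_separators is a hypothetical helper name. The obvious trade-off is that a genuine yen sign inside a file name would also be turned into a separator.

```python
YEN_SIGN = "\u00a5"  # what the conversion produced for byte 0x5C in the output above
BACKSLASH = "\\"     # the separator character as stored in the cabinet


def normalize_separators(name: str) -> str:
    """Hypothetical helper: map both spellings of the separator to '/'."""
    return name.replace(YEN_SIGN, "/").replace(BACKSLASH, "/")


# The converted name from the -e shift_jis run, with the yen sign in the middle.
converted = "DENKEN\u3000CG\u96c6\uff08\u540c\u4eba\uff09\u00a5DENKEN\u3000CG\u96c6.d88"

# After normalizing, the name splits into a directory part and a file part.
parts = normalize_separators(converted).split("/")
print(len(parts))
```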