unshield
How to extract files with non-English filenames
I'm extracting this old Korean game. Only one file has a Korean filename, and that file is either extracted with a gibberish filename or fails with an error.
Running with no options or with -R results in:
extracting: ./App_Executables/custom/A_AúÄ¿½ºÅO½ºÅ²,,µå'A1y.txt
Extracting with -e results in:
Could not encode text to 'EUC-KR' error Illegal byte sequence
Failed to extract file 'A_AúÄ¿½ºÅO½ºÅ²,,µå'A1y.txt'.Run unshield again with -D 3 for more information.
I've tried EUC-KR, ISO-2022-KR, and KSC_5601. KSC_5601 didn't extract any of the files.
The correct filename is 유저커스텀스킨만드는법.txt
P.S. On a slightly unrelated note, is there a generic command to extract only the program files, skipping the IS internal files? As far as I understand, the group(s) containing the program files vary between IS versions (or even between individual programs).
@hgdagon I think the charsets you are using are too modern. It's probably some old Microsoft Windows charset. Please try -e CP949 and report your results here!
As for your P.S., there is -g GROUP ("Only list/extract this file group"), if that helps. The help text also lists -c COMPONENT ("Only list/extract this component"), but I have never implemented that :(
CP949 and ISO-2022-KR don't work either; same error. I thought I had searched for all Korean encodings... Maybe there's some indication of the code page in the setup itself, or some documentation on which encoding IS uses for Korean by default? I couldn't find any myself.
As for extracting program files only: this setup puts all non-IS files in a group called App_Executables, but I remember seeing installers that didn't contain such a group at all, so the group name seems to be arbitrary, varying either with the IS version or with every single program.
Your solution still requires manually inspecting the groups/components, whereas I'm looking for a more generic approach (one that would work with any installer) that simply skips IS's own internal files when extracting.
Anyway, I was hoping unshield would be this magical all-in-one solution for all the headaches with old IS installers; I thought I could just slap it into Inno Setup and have a generic installer for any old CD I come across. So this part is not as trivial as the encoding issue.
Thanks for the response!
Can you share data*.cab and data*.hdr with me somehow? For example via Dropbox, Google Drive, or a regular web link?
https://www.dropbox.com/s/3qq6v83c27alfz9/Disk1.7z?dl=1
Hi again, sorry for the delayed response!
For me it extracts fine with -e euc-kr, but when listing the files the name is not converted!
unshield -e euc-kr x data1.cab
Cabinet: data1.cab
...
extracting: ./App_Executables/custom/유저커스텀스킨만드는법.txt
...
-------- -------
67 files
unshield -e EUC-KR l data1.cab
Cabinet: data1.cab
...
978 App Executables\custom\����Ŀ���ҽ�Ų�����¹�.txt
-------- -------
67 files
So maybe I should actually use the -e parameter when listing files too :)
What versions of unshield and libiconv are you using, since this command didn't work for you?
Now I see, you are missing this bug fix: https://github.com/twogood/unshield/pull/76/commits/592f1d62d97b48064509137b0dbc79241800d1d8
So this is a duplicate of #77, but I haven't made a new unshield release since then, sorry about that!
@hgdagon Please try unshield 1.4.3: https://github.com/twogood/unshield/releases/tag/1.4.3
Well, according to the timestamps, I cloned the repo on Sep 14, and:
local/mingw-w64-x86_64-libiconv 1.15-3
So I pulled the update and recompiled it, and I regret to say it's still the same on my side: Illegal byte sequence... I tried both static and shared builds.
I probably should've mentioned before that I'm building on Windows in msys2 (MinGW-w64), although I don't see why that should be an issue.
Thank you @hgdagon for your prompt reply. I guess I'll have to try this in a Windows environment then. I'm not sure if the -e parameter has ever been tested on Windows.
If you run the iconv command line tool in your Windows environment with the -l command line parameter, is EUC-KR included in the list?
iconv -l |grep -i EUC-KR
In the meantime, could you try to change the first parameter of the iconv_open call on line 762 in unshield.c from an empty string to UTF-8?
Current:
if ((encoding_descriptor = iconv_open("", encoding)) == (iconv_t)-1)
Modified:
if ((encoding_descriptor = iconv_open("UTF-8", encoding)) == (iconv_t)-1)
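One possible reason the empty string matters (an assumption, not confirmed in this thread): GNU libiconv documents the empty encoding name as equivalent to "char", the locale-dependent encoding. On an English Windows system that would be an ANSI code page such as CP1252, which has no Hangul, so converting the filename into it would fail with EILSEQ ("Illegal byte sequence"), while an explicit UTF-8 target can represent it. The sketch below illustrates the difference with a hypothetical helper convert(), using ASCII as a stand-in for a target without Hangul:

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not unshield's code): convert `inlen` bytes of `in`
 * from encoding `from` to encoding `to`, writing at most `outcap` bytes to
 * `out`. Returns the number of bytes written, or -1 with errno set:
 *   EINVAL - unknown encoding name, or truncated multibyte input
 *   EILSEQ - input invalid in `from`, or not representable in `to`
 *   E2BIG  - `out` is too small */
static long convert(const char *to, const char *from,
                    const char *in, size_t inlen,
                    char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;  /* POSIX iconv() takes a non-const char ** */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft); /* flush shift state */

    int saved_errno = errno;  /* keep the cause even if iconv_close() sets errno */
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = saved_errno;
        return -1;
    }
    return (long)(outcap - outleft);
}
```

The second iconv() call with NULL input flushes any pending shift state; it only produces output for stateful target encodings such as ISO-2022-KR.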
Yes, EUC-KR is in the list:
iconv -l |grep -i EUC-KR
EUC-KR EUCKR CSEUCKR
Changing the first parameter of iconv_open to UTF-8 resulted in this:
extracting: ./App_Executables/custom/ìo ì ?ì»ìSí.?ìSí,"ëOë"oëS"ë².txt
That's what the console output says; the filename on disk is slightly different:
ìœ ì €ì»¤ìŠ¤í…€ìŠ¤í‚¨ë§Œë“œëŠ”ë²•.txt
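The on-disk name looks like UTF-8 bytes displayed as Windows-1252: 유 is EC 9C A0 in UTF-8, and in CP1252 those bytes read as ì, œ and a no-break space, which matches the start of the name. That suggests, though the thread does not confirm it, that the converted UTF-8 name reaches a narrow-character Windows file API, which interprets it in the ANSI code page. A diagnostic sketch that reproduces the effect (the helper mojibake_cp1252 is hypothetical):

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical diagnostic helper: treat UTF-8 bytes as if they were
 * Windows-1252 text and re-encode that as UTF-8, i.e. reproduce what a
 * program assuming the ANSI code page would show. Returns the output
 * length, or -1 on failure. Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are
 * undefined in CP1252, so an iconv may reject inputs containing them. */
static long mojibake_cp1252(const char *utf8, size_t len,
                            char *out, size_t outcap)
{
    assert(utf8 != NULL && out != NULL);
    iconv_t cd = iconv_open("UTF-8", "CP1252");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)utf8;
    char *outp = out;
    size_t inleft = len, outleft = outcap;
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    return rc == (size_t)-1 ? -1 : (long)(outcap - outleft);
}
```

Feeding it the UTF-8 bytes of 유저 yields ì, œ, no-break space, ì, no-break space, €, the same pattern as the start of the filename above.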
@hgdagon: What is your character set in the terminal, i.e. the contents of the LANG environment variable? For example:
$ echo $LANG
en_US.UTF-8
The character set in the MSYS2 shell is actually en_US.UTF-8, but I'm running the executable in the Command Prompt. I just tried running it in the shell, and I did see the correct filename:
extracting: ./App_Executables/custom/유저커스텀스킨만드는법.txt
But only in the shell output; the actual filename on disk is still the same:
ìœ ì €ì»¤ìŠ¤í…€ìŠ¤í‚¨ë§Œë“œëŠ”ë²•.txt
What is your character set in Windows?
Um... standard English, I would assume. Whatever comes with Win10; I don't have any weird MUIs installed or anything.
I tried extracting on my Linux machine to see if the same errors occur.
Unshield 1.4.2 (the current release in the Manjaro repo), weirdly enough, produced this error for every file:
Could not encode text to 'EUC-KR' error Argument list too long
So instead I installed unshield-git from the AUR (here's the PKGBUILD), and it worked like a charm: no error and the correct filename.
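For reference, "Argument list too long" is how strerror() describes E2BIG, which iconv() reports when the output buffer runs out of space. So the 1.4.2 error points to an undersized output buffer rather than an unknown encoding; whether that is what the newer sources fixed is not stated here. A minimal sketch of a conversion helper that grows its buffer on E2BIG instead of failing (convert_alloc is a hypothetical name, not unshield's code):

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert NUL-terminated `in` from `from` to `to`,
 * doubling the output buffer whenever iconv() reports E2BIG. Returns a
 * malloc'd NUL-terminated string (caller frees), or NULL with errno set. */
static char *convert_alloc(const char *to, const char *from, const char *in)
{
    assert(to != NULL && from != NULL && in != NULL);
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t inleft = strlen(in);
    char *inp = (char *)in;
    size_t cap = inleft + 1;          /* first guess: same size as input */
    char *out = malloc(cap);
    if (out == NULL) { iconv_close(cd); return NULL; }
    char *outp = out;
    size_t outleft = cap - 1;         /* reserve one byte for the NUL */

    for (;;) {
        size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
        if (rc != (size_t)-1)
            rc = iconv(cd, NULL, NULL, &outp, &outleft); /* flush state */
        if (rc != (size_t)-1)
            break;                    /* all input converted */
        if (errno != E2BIG) {         /* EILSEQ or EINVAL: give up */
            int saved_errno = errno;
            free(out);
            iconv_close(cd);
            errno = saved_errno;
            return NULL;
        }
        /* Output full: double the buffer and resume where iconv stopped. */
        size_t used = (size_t)(outp - out);
        char *bigger = realloc(out, cap * 2);
        if (bigger == NULL) { free(out); iconv_close(cd); return NULL; }
        out = bigger;
        outp = out + used;
        outleft += cap;               /* new free space: 2*cap - 1 - used */
        cap *= 2;
    }
    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

A buffer sized to the input is too small for Korean names in this direction, since EUC-KR and CP949 use two bytes per Hangul syllable and UTF-8 uses three; doubling keeps the number of retries logarithmic in the output size.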
Given this, I'm gonna try building with MSVC and report back with the results. I'm not entirely sure how to get the libraries for MSVC, so it's gonna take some time. I'll report back sometime today or tomorrow.
Well, it's been a week, and sadly, Visual Studio and I still don't speak the same language. Since the last time I tackled VS, there's apparently this new thing called vcpkg, which I thought would bring some sense into the whole mess, but it just doesn't work; at least, I couldn't get it to work. Anyway, I can confirm that when built with msys2 (MinGW-w64), encoding conversion doesn't work.
Thanks for your update. I still haven't tried it on MS Windows myself.