[Future Enhancement] Fan translations support (and improved .EXE unpacking)

Open LukasThyWalls opened this issue 2 years ago • 11 comments

Hello.

OK, this is low priority, but I want to point out that fan translations exist, and at least for now the Spanish one doesn't work with OpenTESArena. When you launch OpenTESArena with a translated Arena game, you get this error window, and it closes immediately (it's hard to capture that window).

opentesarena_errorTranslation

It works perfectly fine with the vanilla game in DOSBox (used with the GOG/CD version).

The Spanish fan translation can be downloaded here: http://traducciones.clandlan.net/index.php?page=download&file=AS/TESArenav2.2.1.7z (the site is in Spanish; to download, type the letters shown into the box in the center and press the "¡DESCARGA!" button). You could look at the differences that let it work with the vanilla game but not with OpenTESArena (although I suppose you already have an idea or two without looking at anything, knowing how it works).

I'm writing this only to point it out, so that the devs know these translations exist and can make the right decisions beforehand to support them, in the hope that they can be used with OpenTESArena too once the port takes more shape.

Thanks in advance.

LukasThyWalls avatar Apr 14 '22 21:04 LukasThyWalls

I haven't tested it but I imagine if you copy everything except A.EXE/ACD.EXE then it would work. That translation seems to have been made by unpacking the original executable, modifying strings in it, and repacking it. This would cause A.EXE/ACD.EXE to have differences in the binary data, and OpenTESArena's executable unpacker doesn't seem to be compatible.

It's good to know this is an issue. I think it mostly means that ExeUnpacker.cpp isn't flexible enough for what the PKLITE specification allows.

Additionally, there are files that come with OpenTESArena for reading data from A.EXE/ACD.EXE. These only work with specific versions of the game and are a bit labor-intensive to make, which means that fan modifications of those executables will need their own acdExeStrings.txt and/or aExeStrings.txt. It would be better if OpenTESArena had its own localization format with key-value pairs for every string in the game, but that is still too far in the future to talk about, I think.

afritz1 avatar Apr 15 '22 15:04 afritz1

Actually, the A.EXE and ACD.EXE don't seem to be compressed with PKLITE at all, they're just regular DOS executables. Not sure yet if this is something the engine could conditionally handle by looking at bytes in the executable, but it still would need custom aExeStrings.txt and acdExeStrings.txt.

afritz1 avatar Apr 15 '22 15:04 afritz1

I didn't even know that the original Arena executables were actually compressed files. That's interesting, because maybe the translators decompressed the data to change it and rebuilt the EXE so that it always works with uncompressed data, which makes things easier to change.

The only thing I can say is that the translation works fine, and it was in development at least until 2013.

As you explain, maybe OpenTESArena could accept both compressed and uncompressed EXEs instead of expecting only the original compressed versions. Maybe those strings could be translated in the port itself instead of being extracted from the EXE, to get around that issue. Or aExeStrings.txt and acdExeStrings.txt could be generated to match the translation and that's it. I do find it strange that something like that can't be auto-generated, though, because in translations like this one (which edit the EXE directly) the text data normally sits at fixed positions in the executable and uses the same spacing, size, and delimiters as the original, or it wouldn't work at all. Maybe this one is different, idk.

Anyway, this is a bit far in the future, but it's good to know. There are certainly more translations out there, and in that case, knowing how they are made could be helpful.

Thanks!

LukasThyWalls avatar Apr 15 '22 22:04 LukasThyWalls

> Not sure yet if this is something the engine could conditionally handle by looking at bytes in the executable

The packed A.EXE from 1.06 and ACD.EXE from 1.07 both have

PKLITE Copr. 1990-91 PKWARE Inc. All Rights Reserved

near the top of the file, while the unpacked A.EXE does not have it (I haven't checked the unpacked ACD.EXE but I assume it would be the same). So maybe using that would work.
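
A minimal sketch of that check, assuming the banner always sits within the first few hundred bytes of a packed file. The helper name `hasPkliteBanner` and the 256-byte search window are made up for illustration and are not part of OpenTESArena:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper (not an OpenTESArena function): returns true if the
// PKLITE copyright banner appears within the first `searchLimit` bytes.
// The 256-byte default is an assumption, not a value taken from the spec.
inline bool hasPkliteBanner(const std::vector<uint8_t> &exeBytes, std::size_t searchLimit = 256)
{
    constexpr std::string_view banner = "PKLITE Copr.";
    const std::size_t limit = std::min(searchLimit, exeBytes.size());
    const auto begin = exeBytes.begin();
    const auto end = begin + static_cast<std::ptrdiff_t>(limit);

    // Compare as unsigned bytes so the char/uint8_t comparison is explicit.
    const auto found = std::search(begin, end, banner.begin(), banner.end(),
        [](uint8_t fileByte, char bannerChar)
        {
            return fileByte == static_cast<uint8_t>(bannerChar);
        });

    return found != end;
}
```

A loader could call something like this first and only run the unpacker when it returns true, treating the file as an already-unpacked DOS executable otherwise.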

> I don't even know that the original Arena executables are in reality compressed files. Is interesting because maybe the translators somehow uncompressed the data to change it and recreate the exe to always work with the data uncompressed and make easy to change things.

Looking at the A.EXE file in the disk directory from that download, it looks like the translated text is at the same addresses in the file as in the unpacked English A.EXE (version 1.06), so I would guess they edited the text within the unpacked file while keeping the start of each string at the same location as in the original file.

There were also several bytes in the file outside of the translated text that differed, though.

> it still would need custom aExeStrings.txt and acdExeStrings.txt.

Since the translated strings seem to all be at the same locations, wouldn't these files be fine as is? Is the problem maybe just that OpenTESArena expects A.EXE/ACD.EXE in its packed form and won't load an unpacked form?

Allofich avatar Apr 16 '22 06:04 Allofich

I tried packing the translated ACD.EXE from the download with PKLite v1.12 and running it with OpenTESArena, but it fails with Invalid last compressed word "0x74". It starts successfully from DOSBox-X.

Same thing happens with the translated A.EXE from the download. I packed it with PKLite v1.12 and it starts successfully in DOSBox-X. In OpenTESArena it fails with Invalid last compressed word "0x70".

Edit: Of course, to solve the issue of translation support, skipping the unpacking process for A.EXE/ACD.EXE when they don't need it may be enough. But I was curious if packing the translated files would cause them to work with OpenTESArena.

Allofich avatar Apr 16 '22 07:04 Allofich

Hmm. At https://github.com/afritz1/OpenTESArena/blob/main/docs/pklite_specification.md it says

If l is the length of the executable in bytes, then the compressed data is stored from byte at position 0x2F0 up until l - 8 within the executable. The compressed data should end with 0xFFFF.

and OpenTESArena checks for 0xFFFF, showing an "Invalid last compressed word" error if it isn't there. The original A.EXE has 0xFFFF in the right place. Here are the final 20 bytes of the file.

B2 6E B6 6E BA 6E C2 70 C6 70 FF FF 57 4A 80 00 00 00 00 00

But the translated A.EXE that I packed with PKLite 1.12 has these final 20 bytes.

6E B2 6E B6 6E BA 6E C2 70 C6 70 00 57 4A 7C 00 00 00 00 00

Where the original A.EXE has 0xFFFF, the one that was produced from running PKLite 1.12 on the translated A.EXE has only a single 00 byte, so the word at that position reads as 0x0070 (bytes 70 00), which matches the 0x70 in the error message. As I wrote above, this file does successfully start (testing with DOSBox-X).

So is this check for 0xFFFF incorrect or unnecessary?
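
To make the comparison concrete, here is a rough sketch of that check, assuming "up until l - 8" means the last compressed word occupies bytes l - 10 and l - 9. The helper names `readLE16` and `lastCompressedWord` are hypothetical and are not taken from ExeUnpacker.cpp:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Reads a little-endian 16-bit value starting at `index`.
// The caller must guarantee that index + 1 is in bounds.
inline uint16_t readLE16(const std::vector<uint8_t> &bytes, std::size_t index)
{
    return static_cast<uint16_t>(bytes[index] | (bytes[index + 1] << 8));
}

// Hypothetical helper: returns the 16-bit word that sits just before the
// final 8 bytes of the file (bytes l - 10 and l - 9), or nothing if the
// file is too short to contain it.
inline std::optional<uint16_t> lastCompressedWord(const std::vector<uint8_t> &exeBytes)
{
    if (exeBytes.size() < 10)
    {
        return std::nullopt;
    }

    return readLE16(exeBytes, exeBytes.size() - 10);
}
```

Fed the two 20-byte tails quoted above, this reads 0xFFFF from the original A.EXE and 0x0070 from the translated one packed with PKLite 1.12.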

Allofich avatar Apr 16 '22 08:04 Allofich

Another difference between what OpenTESArena assumes for the PKLite 1.12 specification and the files I got from packing the translated .exe files is the start of compressed data.

OpenTESArena assumes byte 0x2F0 (752). In the packed, translated files, the equivalent data (the values are a little different) appear to start at byte 0x300 (768).

English .EXE data starting at 0x2F0: 00 00 B5 8E 36 B8 3B C7
Equivalent Spanish .EXE data starting at 0x300: 00 00 BA 80 3B B4 30 CD

OpenTESArena still won't load the packed Spanish EXEs even if the check for 0xFFFF is removed and the start offset is set to 768, though.

Packed Spanish EXEs.zip

More information: Since I had only tested the packed Spanish EXEs as far as the title screen, I tried taking them in-game just to be sure they work, and they do appear to work properly.

Based on this site http://fileformats.archiveteam.org/wiki/PKLITE, in addition to 1.12, PKLite versions 1.05 and 1.13 should also fit the "1990-91" copyright seen in A.EXE and ACD.EXE. In case the discrepancies were from a different version of PKLite than 1.12 being used, I tried packing the Spanish EXEs with both of these versions as well, but I got the same differences from the original English .EXE files as I did with 1.12.

More information: Another difference: OpenTESArena gets the total decompressed file size with

		const uint16_t segment = Bytes::getLE16(compressedEnd);
		const uint16_t offset = Bytes::getLE16(compressedEnd + 2);
		return (segment * 16) + offset;

In the original packed English A.EXE, the segment is 0x4A75 and the offset is 0x0080. In the Spanish packed A.EXE, the segment is 0x4A75 but the offset is 0x007C.
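
As a worked version of that arithmetic, here is a small standalone sketch. The function name `decompressedSize` is made up for illustration, and the result is widened to 32 bits so it is explicit that the value does not fit in 16 bits:

```cpp
#include <cstdint>

// Same formula as the snippet quoted above: a segment:offset pair where
// the segment counts 16-byte paragraphs. The name is hypothetical.
constexpr uint32_t decompressedSize(uint16_t segment, uint16_t offset)
{
    return (static_cast<uint32_t>(segment) * 16u) + offset;
}

// With the values reported above:
//   English: 0x4A75 * 16 + 0x0080 = 0x4A7D0
//   Spanish: 0x4A75 * 16 + 0x007C = 0x4A7CC
static_assert(decompressedSize(0x4A75, 0x0080) == 0x4A7D0);
static_assert(decompressedSize(0x4A75, 0x007C) == 0x4A7CC);
```

So with those reported values, the two files would differ by 4 bytes in computed decompressed size.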

Allofich avatar Apr 16 '22 10:04 Allofich

According to https://www.fileformat.info/format/exe/corion-mz.htm the PKLite version is in the file header. Checking the original A.EXE file it shows itself as version 1.12. It also shows the "extra compression" flag set, which is "only available in PKLite Professional version".

From the PKLite 1.12 documentation:

     -e   Use Extra Compression Method

       (* Option available only in PKLITE Professional version *)

       This option is used to produce the smallest executable files.  It
       uses a slightly different algorithm, which also scrambles the
       excutable file.  This scrambling makes the executable data more
       resistant to disassembly or "reverse engineering" procedures.
       After a file is compressed using this method, it cannot be
       expanded to match the original executable file.  If you attempt
       to expand it using the -x option, PKLITE will return a message
       stating the file cannot be expanded.  This option is ideal for
       software developers who wish to distribute their programs in
       compressed form.

So maybe that's the reason for the discrepancies. Maybe the PKLite decompression used by OpenTESArena works specifically for extra-compressed EXEs.

Anyway, sorry if this was not useful information or if you were already aware of all this.

Allofich avatar Apr 16 '22 12:04 Allofich

That appears to have indeed been the reason. When I packed the Spanish A.EXE with PKLite 1.12 Professional using the -e option, the discrepancies about 0xFFFF and starting offsets I mentioned above went away. You might want to amend https://github.com/afritz1/OpenTESArena/blob/main/docs/pklite_specification.md, which currently says "This specification should work with any executable compressed with PKLITE V1.12", to say that it applies only to executables that were compressed using the extra compression option.

While OpenTESArena will get past the executable decompression step, it still won't start with the -e packed Spanish A.EXE or the -e packed Spanish ACD.EXE. In both cases it closes with

[Assets/BinaryAssetLibrary.cpp(346)] Initializing binary assets.
[Rendering/Renderer.cpp(75)] Closing.
[src/Main.cpp(25)] Error: Exception: invalid vector subscript

It seems that in ExeUnpacker::init, while in the while (true) loop it never reaches the encryptedByte == 0xFF break condition when running with a -e packed Spanish file (testing with ACD.EXE).

Extra compressed Spanish EXEs.zip

Allofich avatar Apr 16 '22 12:04 Allofich

With the -e packed Spanish files, encryptedByte == 0xE6 is reached for ACD.EXE and encryptedByte == 0xF4 is reached for A.EXE.

Changing line

else if (encryptedByte == 0xFF)

to

else if (encryptedByte == 0xFF || encryptedByte == 0xE6 || encryptedByte == 0xF4)

in ExeUnpacker.cpp allows the Spanish -e packed A.EXE and ACD.EXE, as well as the original English ones to all run in OpenTESArena. I've only done light testing but it seems to run without issue.

Perhaps "If the byte is 0xFF, then Duplication should be aborted, and the decompression process is finished." in https://github.com/afritz1/OpenTESArena/blob/main/docs/pklite_specification.md will also need to be amended, if these other values are also valid.

Allofich avatar Apr 16 '22 13:04 Allofich

Thanks for looking into this so much @Allofich! Sounds like ExeUnpacker.cpp has some decent room for improvement.

I still want to get through my rendering branch first but this is really useful information. I think the tl;dr is that ExeUnpacker.cpp just needs to be more data-driven, use more logic, and be less hardcoded to Arena's executables.

This part in the https://www.fileformat.info/format/exe/corion-mz.htm link seems useful:

---PKLITE compressed executable
OFFSET              Count TYPE   Description
001Ch                   1 byte   Minor version number
001Dh                   1 byte   Bit mapped :
                                 0-3 - major version
                                   4 - Extra compression
                                   5 - Multi-segment file
001Eh                   6 char   ID='PKLITE'
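
A minimal sketch of reading those fields, assuming the layout in the table above is accurate for Arena's executables. The struct and function names (`PkliteHeaderInfo`, `readPkliteHeader`) are hypothetical and not existing OpenTESArena code:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical container for the fields listed in the table above.
struct PkliteHeaderInfo
{
    int majorVersion;      // Bits 0-3 of the byte at 0x1D.
    int minorVersion;      // Byte at 0x1C.
    bool extraCompression; // Bit 4 of the byte at 0x1D.
    bool multiSegment;     // Bit 5 of the byte at 0x1D.
};

// Returns the header fields if the 'PKLITE' ID is present at 0x1E,
// or nothing if the file is too short or the ID does not match.
inline std::optional<PkliteHeaderInfo> readPkliteHeader(const std::vector<uint8_t> &exeBytes)
{
    constexpr std::size_t minorOffset = 0x1C;
    constexpr std::size_t flagsOffset = 0x1D;
    constexpr std::size_t idOffset = 0x1E;
    constexpr std::string_view id = "PKLITE";

    if (exeBytes.size() < idOffset + id.size())
    {
        return std::nullopt;
    }

    const auto idBegin = exeBytes.begin() + static_cast<std::ptrdiff_t>(idOffset);
    const bool idMatches = std::equal(id.begin(), id.end(), idBegin,
        [](char expected, uint8_t actual)
        {
            return static_cast<uint8_t>(expected) == actual;
        });

    if (!idMatches)
    {
        return std::nullopt;
    }

    const uint8_t flags = exeBytes[flagsOffset];
    PkliteHeaderInfo info;
    info.majorVersion = flags & 0x0F;
    info.minorVersion = exeBytes[minorOffset];
    info.extraCompression = (flags & 0x10) != 0;
    info.multiSegment = (flags & 0x20) != 0;
    return info;
}
```

An empty result would suggest an executable that is not PKLITE-packed, and the `extraCompression` flag could select between decompression variants instead of assuming the -e layout.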

afritz1 avatar Apr 18 '22 05:04 afritz1