[Linux, VS Code] Fails to read ITD_RESS.PAK; `pakInfo.compressionFlag` out of bounds
Problem
On Linux, `loadPak` fails to correctly load ITD_RESS.PAK.
Specifically, `pakInfo.compressionFlag` is supposed to be either 0, 1, or 4; however, it's read as 52.
It also appears that some other values might be wrong.
- I'd imagine `pakInfo.discSize` should be smaller than `pakInfo.uncompressedSize`; it isn't
- I'd also imagine `pakInfo.offset` isn't supposed to be negative; it is
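For reference, here's a minimal sketch of the sanity checks I'm describing; the field names mirror how `pakInfo` is used in this issue, but the struct and the check function themselves are hypothetical, not engine code:

```cpp
// Hypothetical sanity check over the header fields mentioned above.
// Field names mirror pakInfo as referenced in this issue; this is not engine code.
#include <cstdio>
#include <cstdint>

struct PakInfoSample {
    uint32_t discSize;          // size of the entry on disc
    uint32_t uncompressedSize;  // size after decompression
    uint32_t compressionFlag;   // expected to be 0, 1 or 4
    int32_t  offset;            // expected to be non-negative
};

static bool looksSane(const PakInfoSample& p)
{
    bool flagOk   = (p.compressionFlag == 0 || p.compressionFlag == 1 || p.compressionFlag == 4);
    bool sizeOk   = (p.discSize <= p.uncompressedSize);
    bool offsetOk = (p.offset >= 0);
    if (!flagOk)   printf("bad compressionFlag: %u\n", p.compressionFlag);
    if (!sizeOk)   printf("discSize %u > uncompressedSize %u\n", p.discSize, p.uncompressedSize);
    if (!offsetOk) printf("negative offset: %d\n", p.offset);
    return flagOk && sizeOk && offsetOk;
}
```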
How to replicate
Follow the instructions for building on Linux locally using VS Code in my readme.
It's possible, though I haven't had the time to verify, that this might be the result of taking Windows/NTFS files and moving/copying them to a Linux/ext4 context (e.g. improperly handling & resolving NTFS compression), or it might be an endianness problem.
If any other Linux users have the time to replicate & verify, that'd be great. If you don't have the game files, I'm not gonna give you the whole game, but you only need 2 files (really 1) to replicate this: you need a file named LISTBOD2.PAK for the engine to detect the game as AITD1 (it doesn't need to be the actual file, it just needs that name), and you need ITD_RESS.PAK to actually try to read data from. I'd be perfectly OK with giving anyone willing to try this my copy of ITD_RESS.PAK; just tell me your email and I'll send you the file.

In fact, if anyone on Windows or macOS wants to try using my copy of ITD_RESS.PAK, that might help determine whether it's the file that's bad or the environment that's reading it; that would also be a big help. Windows would help determine if it's the file; macOS (being Unix derived) might help determine if it's the environment.
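For context, the LISTBOD2.PAK requirement seems to exist because game detection is just a check for known file names; a rough sketch of that idea (not the engine's actual detectGame() code):

```cpp
// Rough illustration of why a file merely *named* LISTBOD2.PAK is enough:
// detection appears to key off which PAK file names exist, not their contents.
// This is a sketch, not the engine's actual detectGame() implementation.
#include <cstdio>

static bool fileExists(const char* name)
{
    if (FILE* f = fopen(name, "rb")) {
        fclose(f);
        return true;
    }
    return false;
}

int main()
{
    if (fileExists("LISTBOD2.PAK"))
        printf("Detected Alone in the Dark\n");
    else
        printf("No known game detected\n");
    return 0;
}
```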
If a Windows user who already has the whole game & can successfully build & run it wants to try, be sure to keep track of my file vs your OG working files; maybe keep a backup archive of your files before downloading mine.
Additionally, though I'm personally quite skeptical of this, it's also possible that this is because of my toolchain; I'm using Visual Studio Code w/ the CMake plugin & gcc as the C compiler & g++ as the C++ compiler. If anyone on Windows with VSC & the required Visual Studio compilation files wants to try building using VSC, that'd also be a big help & I'll update the readme in my PR; just be sure to clean the configuration files before building to make sure the files are being generated by the same toolchain. I might try clang locally as well to see if it works.
If anyone would like to build for Windows & send me the compiled executable, I could test my local files and see if they work correctly that way as well.
Ok, here's the troubleshooting I've done today:
- Booted into Windows, compiled & ran the binary against the files; worked fine
- Ensured the files did not have NTFS compression enabled; they didn't
- Booted back into Linux & used the same validated files on the native Linux build; same error
- Ran the compiled Windows binary through Wine with the same validated files; worked fine

At this point, I have to assume there's either a serious problem or it's a simple oversight, like one system defaulting to a different endianness.
It appears to be the endianness. This resource says ITD_RESS.PAK has 19 subfiles in it; adding debug lines that change the endianness yields the following:
```
PAK_getNumFiles: Initial: ITD_RESS has 31586263236627 files (Raw value: 126345052946516)
LE: ITD_RESS has 31586263236627 files (Raw value: 126345052946516)
BE: ITD_RESS has 19 files (Raw value: 84)
```
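For reference, the debug lines boil down to printing the same raw bytes interpreted in both byte orders; roughly this kind of helper (swap32 and dumpBothOrders are hand-rolled for illustration, not engine code, and the sample bytes are made up):

```cpp
// Print a 32-bit header field interpreted both little- and big-endian.
// swap32/dumpBothOrders are hand-rolled for illustration, not engine code.
#include <cstdio>
#include <cstdint>

static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

static void dumpBothOrders(const uint8_t* field)
{
    uint32_t asLE = (uint32_t)field[0]         |
                    ((uint32_t)field[1] << 8)  |
                    ((uint32_t)field[2] << 16) |
                    ((uint32_t)field[3] << 24);
    printf("LE: %u  BE: %u\n", asLE, swap32(asLE));
}

int main()
{
    const uint8_t sample[4] = { 0x54, 0x00, 0x00, 0x00 }; // made-up sample bytes
    dumpBothOrders(sample); // prints "LE: 84  BE: 1409286144"
    return 0;
}
```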
I've figured it out. All of the `#ifdef MACOSX` that are related to endianness need to be changed to `#ifndef WIN32`. Both macOS & Linux need the endianness changed; makes sense as they're both Unix based. I've gotten the 2D armadillo to load, but it now crashes on the title screen, I assume for the same reason.
Endianness has nothing to do with them being Linux based; it's a property of the CPU you are using. You can have a big-endian or a little-endian Linux, although in this day and age most common architectures are little endian. All this code was written in the age of the PowerPC Mac, and the assumption that MACOSX == big endian is just plain wrong. I will do a pass on this quickly, but I don't really have anything big endian on hand where I can test that.
I'm dual-booting Linux & Windows 10, and the same code branches ran on both; after changing it so the branches taken on macOS are also taken on Linux, it's been fixed. I know it's supposed to be processor dependent, but platform does matter. I even tried running the binary I compiled on Windows (before my changes) on Linux with Wine; it worked fine, on the same OS, with the same CPU. Here's the `lscpu` command's output on Linux:
```
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Address sizes:       39 bits physical, 48 bits virtual
Byte Order:          Little Endian
```
Whether it's a tooling problem or not, it is manually correctable in software. It's seemingly not consistent either; my fix isn't complete (sounds are wildly screwed up). I'm still trying to figure it all out, and I have to try rebuilding & launching on Windows to make sure I didn't break that.
I've found some documentation supporting this:
- Linux appears to have changed endianness at some point (?) (third paragraph down)
- Apparently, some operating systems - including Unix - allowed "the same code to be compiled for platforms with different internal representations". Additionally, the Unix C compiler

  > ...stored 32-bit "double precision integer long" values with the 16-bit halves swapped from the expected little-endian order...

  ...which seems similar to a bug I'm running into, where it's not straight LE nor BE.
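To make the "not straight LE nor BE" point concrete, here is what the three orderings the quote describes look like for one 32-bit value (illustration only, nothing engine-specific):

```cpp
// Byte layouts of the 32-bit value 0x11223344, for illustration:
//   little-endian:      44 33 22 11
//   big-endian:         11 22 33 44
//   PDP/middle-endian:  22 11 44 33  (16-bit halves swapped, each half little-endian)
#include <cstdio>
#include <cstdint>

int main()
{
    uint32_t v = 0x11223344u;
    const uint8_t* b = (const uint8_t*)&v;
    // On x86_64 (little-endian) this prints "44 33 22 11".
    printf("%02X %02X %02X %02X\n", b[0], b[1], b[2], b[3]);
    return 0;
}
```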
The only other thing I can think of is that some/all of the program is 32 bit on the Windows workflow and 64 bit with my existing Linux workflow or vice versa. I'll see if I can test it, but I'm not super confident in that explanation.
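For anyone who wants to check that hypothesis quickly, plain sizeof output on each toolchain is enough; nothing engine-specific here:

```cpp
// Quick check of pointer and integer widths on each toolchain.
// A 64-bit MSVC build typically reports pointer=8, long=4;
// 64-bit GCC/Clang on Linux reports pointer=8, long=8.
#include <cstdio>

int main()
{
    printf("sizeof(void*) = %zu\n", sizeof(void*));
    printf("sizeof(int)   = %zu\n", sizeof(int));
    printf("sizeof(long)  = %zu\n", sizeof(long));
    return 0;
}
```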
If you're dual-booting the same computer on Windows and Linux, it will have the same endianness (likely little endian). Let me take a look at this when I have a minute. I know how to fix all this properly, I just need to find the time.
It's little endian. There's no rush; I'm working on a diagnostic thing anyways, might help.
I've made a branch whose only changes are the addition of numerous print statements showing the memory values being returned from the PAK files. Running this on Windows as a control and checking against other build targets/platforms should make it easier to pinpoint the problem. I'll build & run on Windows tomorrow to generate the expected values. The spreadsheets & info from this are also a decent check.
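The prints in that branch are essentially hex dumps of whatever comes back from the PAK reader, along these lines (a hypothetical helper with made-up sample bytes; the branch itself just adds ad-hoc printf lines):

```cpp
// Hex-dump helper along the lines of the debug prints in that branch
// (hypothetical; the actual branch just adds ad-hoc printf lines).
#include <cstdio>
#include <cstdint>
#include <cstddef>

static void hexDump(const char* label, const uint8_t* data, size_t len)
{
    printf("%s:", label);
    for (size_t i = 0; i < len; i++)
        printf(" %02X", data[i]);
    printf("\n");
}

int main()
{
    const uint8_t sample[4] = { 0x54, 0x00, 0x00, 0x00 }; // made-up sample bytes
    hexDump("first 4 header bytes", sample, sizeof(sample));
    return 0;
}
```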
Honestly, there is little point in supporting big endian architectures at this point. There is no mainstream big endian CPU architecture in this day and age. The issue here is more likely that the compile defines are messed up on Linux and go down the old path that was intended for PowerPC Mac, and that was big endian (the last mainstream big endian architecture, that is).
That's the thing; it doesn't go down those paths, but when I modified it to go down those paths, it worked.
I have the correct values of what should be read from the working Windows build now, so I'll compare them to the garbage data the Linux build is reading; that'll show what needs to be done to convert to the proper resultant value. I can also cross-reference w/ the SDA guide from earlier; I've pulled some of its references into here.
> The issue here is more likely that the compile defines are messed up on Linux and go down the old path that was intended for PowerPC Mac, and that was big endian (the last mainstream big endian architecture, that is).
I do think the definitions are incorrect, as cvars are initialized in main.cpp like so: `CVars[i] = READ_BE_S16(&cvarValue);`, with this definition:
```cpp
FORCEINLINE u16 READ_BE_U16(void *ptr)
{
#ifdef MACOSX
    return *(u16*)ptr;
#else
    return (((u8*)ptr)[0]<<8)|((u8*)ptr)[1];
#endif
}

FORCEINLINE s16 READ_BE_S16(void *ptr)
{
    return (s16)READ_BE_U16(ptr);
}
```
Meaning on Windows the value is treated as "big endian" and it's correctly converted sometimes. It's a bit of a mess, which is why I'm focusing on the values themselves: what they're read as on both platforms, and what value they resolve to when it works.
This function is meant to say: read a ushort stored big endian. Since x86 is little endian, it needs to be flipped. That's not the case on PPC (and, back then, macOS), so it doesn't need to be flipped there. Those values are stored in big endian in the data files, as I recall.
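For what it's worth, the ifdef can be dropped entirely if the reader always assembles the value byte by byte, which produces the same result on little- and big-endian hosts; a sketch (the names match the engine's helpers, but this exact body is mine):

```cpp
// Host-order-independent readers: always assemble the value from individual
// bytes, so the result is identical on little- and big-endian CPUs.
// Names match the engine's helpers; this particular body is a sketch.
#include <cstdint>

static inline uint16_t READ_BE_U16(const void* ptr)
{
    const uint8_t* p = (const uint8_t*)ptr;
    return (uint16_t)((p[0] << 8) | p[1]);   // data stored big-endian on disk
}

static inline uint16_t READ_LE_U16(const void* ptr)
{
    const uint8_t* p = (const uint8_t*)ptr;
    return (uint16_t)(p[0] | (p[1] << 8));   // data stored little-endian on disk
}
```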
Honestly, I'd say it'd be wise to have tests that automate retrieving pak data, interpreting sample data, etc. and comparing their output w/ known correct outputs; not sure how much of a hassle that would be, but I'll see if it's viable later.
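Something as small as an assert-style check of a known header field against a value captured from the working Windows build would already do it; a rough sketch, with the expected number left as a placeholder since I haven't generated those values yet:

```cpp
// Rough shape of a regression check: read one known PAK header field and
// compare it against a value captured from a working (Windows) build.
// The expected value is left as a commented placeholder, not a real number.
#include <cassert>
#include <cstdio>
#include <cstdint>

static uint32_t readU32LE(FILE* f)
{
    uint8_t b[4] = { 0, 0, 0, 0 };
    if (fread(b, 1, 4, f) != 4)
        return 0; // treat a short read as failure for this sketch
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

int main()
{
    FILE* f = fopen("ITD_RESS.PAK", "rb");
    assert(f != nullptr);
    uint32_t firstField = readU32LE(f);
    printf("first u32 in header: %u\n", firstField);
    // assert(firstField == <value captured from the working Windows build>);
    fclose(f);
    return 0;
}
```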
A further complication: on Windows, `long int`s are (seemingly) always 32 bits; they're 64 bits w/ GCC & Clang on 64-bit Linux.
Adding this
```cpp
void detectGame(void)
{
#ifdef FITD_DEBUGGER
    printf("...BTW, here's the type sizes:\n\tu8: %zi, \n\tu16: %zi, \n\tu32: %zi, \n\ts8: %zi, \n\ts16: %zi, \n\ts32: %zi\n", sizeof(u8), sizeof(u16), sizeof(u32), sizeof(s8), sizeof(s16), sizeof(s32));
#endif
```
outputs this
```
Compiled the Apr 23 2025 at 09:19:27
...BTW, here's the type sizes:
    u8: 1,
    u16: 2,
    u32: 8,
    s8: 1,
    s16: 2,
    s32: 8
Detected Alone in the Dark
```
`config.h` should have used

```cpp
#if defined(HAS_STDINT) || AITD_UE4
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef int8_t s8;
typedef int16_t s16;
typedef int32_t s32;
#else
typedef unsigned char u8;
typedef unsigned short int u16;
typedef unsigned int u32;
typedef signed char s8;
typedef signed short int s16;
typedef signed int s32;
#endif
```
instead of
```cpp
#if defined(HAS_STDINT) || AITD_UE4
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef int8_t s8;
typedef int16_t s16;
typedef int32_t s32;
#else
typedef unsigned char u8;
typedef unsigned short int u16;
typedef unsigned long int u32;
typedef signed char s8;
typedef signed short int s16;
typedef signed long int s32;
#endif
```
That being said, the first branch can be taken more often than it currently is if this is added:
```cpp
#ifdef __APPLE__
#include <TargetConditionals.h>
#include <stdint.h>
#define HAS_STDINT
#elif __has_include ("stdint.h")
#include <stdint.h>
#define HAS_STDINT
#endif
```
This helps explain why simply always using an endianness resolver doesn't work & why my debug output is weird on Linux; the alleged 32-bit type is 64 bits.
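A compile-time guard would have caught this immediately; a minimal sketch, assuming it sits right after the typedefs in FitdLib/config.h (the static_asserts are my addition, not existing code; C translation units would need _Static_assert instead):

```cpp
// My addition, not existing engine code: placed right after the typedefs in
// FitdLib/config.h, this fails the build if any alias has the wrong width.
static_assert(sizeof(u8)  == 1, "u8 must be 1 byte");
static_assert(sizeof(u16) == 2, "u16 must be 2 bytes");
static_assert(sizeof(u32) == 4, "u32 must be 4 bytes");
static_assert(sizeof(s8)  == 1, "s8 must be 1 byte");
static_assert(sizeof(s16) == 2, "s16 must be 2 bytes");
static_assert(sizeof(s32) == 4, "s32 must be 4 bytes");
```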
Nevermind! It turns out that ~~while my speakers indicate the whole reading-garbage-data thing might still be a problem~~ the incorrect sizing of types is the big problem. It's compiling after simply changing FitdLib/config.h to use non-`long` ints for `u32` & `s32`.
Edit: Looks like the sound data is being read correctly; the problem comes from something else.
This should now be all fixed in master