[Linux, VS Code] Fails to read ITD_RESS.PAK; `pakInfo.compressionFlag` out of bounds
Problem
On Linux, `loadPak` fails to correctly load ITD_RESS.PAK.
Specifically, `pakInfo.compressionFlag` is supposed to be either 0, 1, or 4; however, it's read as 52.
It also appears that some other values might be wrong.
- I'd imagine `pakInfo.discSize` should be smaller than `pakInfo.uncompressedSize`; it isn't
- I'd also imagine `pakInfo.offset` isn't supposed to be negative; it is
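For reference, here's a minimal sketch of the sanity checks I'm describing; the field names mirror how `pakInfo` is used in this issue, but the struct and the check function themselves are hypothetical, not engine code:

```cpp
// Hypothetical sanity check over the header fields mentioned above.
// Field names mirror pakInfo as referenced in this issue; this is not engine code.
#include <cstdio>
#include <cstdint>

struct PakInfoSample {
    uint32_t discSize;          // size of the entry on disc
    uint32_t uncompressedSize;  // size after decompression
    uint32_t compressionFlag;   // expected to be 0, 1 or 4
    int32_t  offset;            // expected to be non-negative
};

static bool looksSane(const PakInfoSample& p)
{
    bool flagOk   = (p.compressionFlag == 0 || p.compressionFlag == 1 || p.compressionFlag == 4);
    bool sizeOk   = (p.discSize <= p.uncompressedSize);
    bool offsetOk = (p.offset >= 0);
    if (!flagOk)   printf("bad compressionFlag: %u\n", p.compressionFlag);
    if (!sizeOk)   printf("discSize %u > uncompressedSize %u\n", p.discSize, p.uncompressedSize);
    if (!offsetOk) printf("negative offset: %d\n", p.offset);
    return flagOk && sizeOk && offsetOk;
}
```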
How to replicate
Follow the instructions for building on Linux locally using VS Code in my readme.
It's possible, though I haven't had the time to verify, that this might be the result of taking Windows/NTFS files and moving/copying them to a Linux/ext4 context (e.g. improperly handling & resolving NTFS compression), or it might be an endianness problem.
If any other Linux users have the time to replicate & verify, that'd be great. If you don't have the game files, I'm not gonna give you the whole game, but you only need 2 files (really 1) to replicate this: you need a file named LISTBOD2.PAK for the engine to detect the game as AITD1 (it doesn't need to be the actual file, it just needs that name), and you need ITD_RESS.PAK to actually try to read data from. I'd be perfectly OK with giving anyone willing to try this my copy of ITD_RESS.PAK; just tell me your email and I'll send you the file.

In fact, if anyone on Windows or macOS wants to try using my copy of ITD_RESS.PAK, that might help determine whether it's the file that's bad or the environment that's reading it; that would also be a big help. Windows would help determine if it's the file; macOS (being Unix derived) might help determine if it's the environment.
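For context, the LISTBOD2.PAK requirement seems to exist because game detection is just a check for known file names; a rough sketch of that idea (not the engine's actual detectGame() code):

```cpp
// Rough illustration of why a file merely *named* LISTBOD2.PAK is enough:
// detection appears to key off which PAK file names exist, not their contents.
// This is a sketch, not the engine's actual detectGame() implementation.
#include <cstdio>

static bool fileExists(const char* name)
{
    if (FILE* f = fopen(name, "rb")) {
        fclose(f);
        return true;
    }
    return false;
}

int main()
{
    if (fileExists("LISTBOD2.PAK"))
        printf("Detected Alone in the Dark\n");
    else
        printf("No known game detected\n");
    return 0;
}
```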
If a Windows user who already has the whole game & can successfully build & run it wants to try, be sure to keep track of my file vs your OG working files; maybe keep a backup archive of your files before downloading mine.
Additionally, though I'm personally quite skeptical of this, it's also possible that this is because of my toolchain; I'm using Visual Studio Code w/ the CMake plugin & gcc as the C compiler & g++ as the C++ compiler. If anyone on Windows with VSC & the required Visual Studio compilation files wants to try building using VSC, that'd also be a big help & I'll update the readme in my PR; just be sure to clean the configuration files before building to make sure the files are being generated by the same toolchain. I might try clang locally as well to see if it works.
If anyone would like to build for Windows & send me the compiled executable, I could test my local files and see if they work correctly that way as well.
Ok, here's the troubleshooting I've done today:
- Booted into Windows, compiled & ran the binary against the files; worked fine
- Ensured the files did not have NTFS compression enabled; they didn't
- Booted back into Linux & used the same validated files on the native Linux build; same error
- Ran the compiled Windows binary through Wine with the same validated files; worked fine

At this point, I have to assume there's either a serious problem or it's a simple oversight, like one system defaulting to a different endianness.
It appears to be the endianness. This resource says ITD_RESS.PAK has 19 subfiles in it; adding debug lines that change the endianness yields the following:
```
PAK_getNumFiles: Initial: ITD_RESS has 31586263236627 files (Raw value: 126345052946516)
LE: ITD_RESS has 31586263236627 files (Raw value: 126345052946516)
BE: ITD_RESS has 19 files (Raw value: 84)
```
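For reference, the debug lines boil down to printing the same raw bytes interpreted in both byte orders; roughly this kind of helper (swap32 and dumpBothOrders are hand-rolled for illustration, not engine code, and the sample bytes are made up):

```cpp
// Print a 32-bit header field interpreted both little- and big-endian.
// swap32/dumpBothOrders are hand-rolled for illustration, not engine code.
#include <cstdio>
#include <cstdint>

static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

static void dumpBothOrders(const uint8_t* field)
{
    uint32_t asLE = (uint32_t)field[0]         |
                    ((uint32_t)field[1] << 8)  |
                    ((uint32_t)field[2] << 16) |
                    ((uint32_t)field[3] << 24);
    printf("LE: %u  BE: %u\n", asLE, swap32(asLE));
}

int main()
{
    const uint8_t sample[4] = { 0x54, 0x00, 0x00, 0x00 }; // made-up sample bytes
    dumpBothOrders(sample); // prints "LE: 84  BE: 1409286144"
    return 0;
}
```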
I've figured it out. All of the `#ifdef MACOSX` that are related to endianness need to be changed to `#ifndef WIN32`. Both macOS & Linux need the endianness changed; makes sense as they're both Unix based. I've gotten the 2D armadillo to load, but it now crashes on the title screen, I assume for the same reason.
Endianness has nothing to do with them being Linux based; it's a property of the CPU you are using. You can have a big-endian or a little-endian Linux, although in this day and age most common architectures are little endian. All this code was written in the age of the PowerPC Mac, and the assumption that MACOSX == big endian is just plain wrong. I will do a pass on this quickly, but I don't really have anything big endian on hand where I can test that.
I'm dual-booting Linux & Windows 10, and the same code branches ran on both; after changing it so the branches taken on macOS are also taken on Linux, it's been fixed. I know it's supposed to be processor dependent, but platform does matter. I even tried running the binary I compiled on Windows (before my changes) on Linux with Wine; it worked fine, on the same OS, with the same CPU. Here's the `lscpu` command's output on Linux:
```
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Address sizes:       39 bits physical, 48 bits virtual
Byte Order:          Little Endian
```
Whether it's a tooling problem or not, it is manually correctable in software. It's seemingly not consistent either; my fix isn't complete (sounds are wildly screwed up). I'm still trying to figure it all out, and I have to try rebuilding & launching on Windows to make sure I didn't break that.
I've found some documentation supporting this:
- Linux appears to have changed endianness at some point (?) (third paragraph down)
- Apparently, some operating systems - including Unix - allowed "the same code to be compiled for platforms with different internal representations". Additionally, the Unix C compiler

  > ...stored 32-bit "double precision integer long" values with the 16-bit halves swapped from the expected little-endian order...

  ...which seems similar to a bug I'm running into, where it's not straight LE nor BE.
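To make the "not straight LE nor BE" point concrete, here is what the three orderings the quote describes look like for one 32-bit value (illustration only, nothing engine-specific):

```cpp
// Byte layouts of the 32-bit value 0x11223344, for illustration:
//   little-endian:      44 33 22 11
//   big-endian:         11 22 33 44
//   PDP/middle-endian:  22 11 44 33  (16-bit halves swapped, each half little-endian)
#include <cstdio>
#include <cstdint>

int main()
{
    uint32_t v = 0x11223344u;
    const uint8_t* b = (const uint8_t*)&v;
    // On x86_64 (little-endian) this prints "44 33 22 11".
    printf("%02X %02X %02X %02X\n", b[0], b[1], b[2], b[3]);
    return 0;
}
```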
The only other thing I can think of is that some/all of the program is 32 bit on the Windows workflow and 64 bit with my existing Linux workflow or vice versa. I'll see if I can test it, but I'm not super confident in that explanation.
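For anyone who wants to check that hypothesis quickly, plain sizeof output on each toolchain is enough; nothing engine-specific here:

```cpp
// Quick check of pointer and integer widths on each toolchain.
// A 64-bit MSVC build typically reports pointer=8, long=4;
// 64-bit GCC/Clang on Linux reports pointer=8, long=8.
#include <cstdio>

int main()
{
    printf("sizeof(void*) = %zu\n", sizeof(void*));
    printf("sizeof(int)   = %zu\n", sizeof(int));
    printf("sizeof(long)  = %zu\n", sizeof(long));
    return 0;
}
```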
If you're dual-booting the same computer on Windows and Linux, it will have the same endianness (likely little endian). Let me take a look at this when I have a minute. I know how to fix all this properly, I just need to find the time.
It's little endian. There's no rush; I'm working on a diagnostic thing anyways, might help.
I've made a branch whose only changes are the addition of numerous print statements showing the memory values being returned from the PAK files. Running this on Windows as a control and checking against other build targets/platforms should make it easier to pinpoint the problem. I'll build & run on Windows tomorrow to generate the expected values. The spreadsheets & info from this are also a decent check.
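The prints in that branch are essentially hex dumps of whatever comes back from the PAK reader, along these lines (a hypothetical helper with made-up sample bytes; the branch itself just adds ad-hoc printf lines):

```cpp
// Hex-dump helper along the lines of the debug prints in that branch
// (hypothetical; the actual branch just adds ad-hoc printf lines).
#include <cstdio>
#include <cstdint>
#include <cstddef>

static void hexDump(const char* label, const uint8_t* data, size_t len)
{
    printf("%s:", label);
    for (size_t i = 0; i < len; i++)
        printf(" %02X", data[i]);
    printf("\n");
}

int main()
{
    const uint8_t sample[4] = { 0x54, 0x00, 0x00, 0x00 }; // made-up sample bytes
    hexDump("first 4 header bytes", sample, sizeof(sample));
    return 0;
}
```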
Honestly, there is little point in supporting big endian architectures at this point. There is no mainstream big endian CPU architecture in this day and age. The issue here is more likely that the compile defines are messed up on Linux and go down the old path that was intended for PowerPC Mac, and that was big endian (the last mainstream big endian architecture, that is).
That's the thing; it doesn't go down those paths, but when I modified it to go down those paths, it worked.
I have the correct values of what should be read from the working Windows build now, so I'll compare them to the garbage data the Linux build is reading; that'll show what needs to be done to convert to the proper resultant value. I can also cross-reference w/ the SDA guide from earlier; I've pulled some of its references into here.
> The issue here is more likely that the compile defines are messed up on Linux and go down the old path that was intended for PowerPC Mac, and that was big endian (the last mainstream big endian architecture, that is).
I do think the definitions are incorrect, as cvars are initialized in main.cpp like so: `CVars[i] = READ_BE_S16(&cvarValue);`, with this definition:
```cpp
FORCEINLINE u16 READ_BE_U16(void *ptr)
{
#ifdef MACOSX
    return *(u16*)ptr;
#else
    return (((u8*)ptr)[0]<<8)|((u8*)ptr)[1];
#endif
}

FORCEINLINE s16 READ_BE_S16(void *ptr)
{
    return (s16)READ_BE_U16(ptr);
}
```
Meaning on Windows the value is treated as "big endian" and it's correctly converted sometimes. It's a bit of a mess, which is why I'm focusing on the values themselves: what they're read as on both platforms, and what value they resolve to when it works.
This function is meant to say: read a ushort stored big endian. Since x86 is little endian, it needs to be flipped. That's not the case on PPC (and, back then, macOS), so it doesn't need to be flipped there. Those values are stored in big endian in the data files, as I recall.
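For what it's worth, the ifdef can be dropped entirely if the reader always assembles the value byte by byte, which produces the same result on little- and big-endian hosts; a sketch (the names match the engine's helpers, but this exact body is mine):

```cpp
// Host-order-independent readers: always assemble the value from individual
// bytes, so the result is identical on little- and big-endian CPUs.
// Names match the engine's helpers; this particular body is a sketch.
#include <cstdint>

static inline uint16_t READ_BE_U16(const void* ptr)
{
    const uint8_t* p = (const uint8_t*)ptr;
    return (uint16_t)((p[0] << 8) | p[1]);   // data stored big-endian on disk
}

static inline uint16_t READ_LE_U16(const void* ptr)
{
    const uint8_t* p = (const uint8_t*)ptr;
    return (uint16_t)(p[0] | (p[1] << 8));   // data stored little-endian on disk
}
```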
Honestly, I'd say it'd be wise to have tests that automate retrieving pak data, interpreting sample data, etc. and comparing their output w/ known correct outputs; not sure how much of a hassle that would be, but I'll see if it's viable later.
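Something as small as an assert-style check of a known header field against a value captured from the working Windows build would already do it; a rough sketch, with the expected number left as a placeholder since I haven't generated those values yet:

```cpp
// Rough shape of a regression check: read one known PAK header field and
// compare it against a value captured from a working (Windows) build.
// The expected value is left as a commented placeholder, not a real number.
#include <cassert>
#include <cstdio>
#include <cstdint>

static uint32_t readU32LE(FILE* f)
{
    uint8_t b[4] = { 0, 0, 0, 0 };
    if (fread(b, 1, 4, f) != 4)
        return 0; // treat a short read as failure for this sketch
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

int main()
{
    FILE* f = fopen("ITD_RESS.PAK", "rb");
    assert(f != nullptr);
    uint32_t firstField = readU32LE(f);
    printf("first u32 in header: %u\n", firstField);
    // assert(firstField == <value captured from the working Windows build>);
    fclose(f);
    return 0;
}
```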
A further complication: on Windows, `long int`s are (seemingly) always 32 bits; they're 64 bits w/ GCC & Clang on 64-bit Linux.
Adding this
```cpp
void detectGame(void)
{
#ifdef FITD_DEBUGGER
    printf("...BTW, here's the type sizes:\n\tu8: %zi, \n\tu16: %zi, \n\tu32: %zi, \n\ts8: %zi, \n\ts16: %zi, \n\ts32: %zi\n", sizeof(u8), sizeof(u16), sizeof(u32), sizeof(s8), sizeof(s16), sizeof(s32));
#endif
```
outputs this
```
Compiled the Apr 23 2025 at 09:19:27
...BTW, here's the type sizes:
    u8: 1,
    u16: 2,
    u32: 8,
    s8: 1,
    s16: 2,
    s32: 8
Detected Alone in the Dark
```
`config.h` should have used

```cpp
#if defined(HAS_STDINT) || AITD_UE4
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef int8_t s8;
typedef int16_t s16;
typedef int32_t s32;
#else
typedef unsigned char u8;
typedef unsigned short int u16;
typedef unsigned int u32;
typedef signed char s8;
typedef signed short int s16;
typedef signed int s32;
#endif
```
instead of
```cpp
#if defined(HAS_STDINT) || AITD_UE4
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef int8_t s8;
typedef int16_t s16;
typedef int32_t s32;
#else
typedef unsigned char u8;
typedef unsigned short int u16;
typedef unsigned long int u32;
typedef signed char s8;
typedef signed short int s16;
typedef signed long int s32;
#endif
```
That being said, the first branch can be taken more often than it currently is if this is added:
```cpp
#ifdef __APPLE__
#include <TargetConditionals.h>
#include <stdint.h>
#define HAS_STDINT
#elif __has_include ("stdint.h")
#include <stdint.h>
#define HAS_STDINT
#endif
```
This helps explain why simply always using an endianness resolver doesn't work & why my debug output is weird on Linux; the alleged 32-bit type is 64 bits.
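A compile-time guard would have caught this immediately; a minimal sketch, assuming it sits right after the typedefs in FitdLib/config.h (the static_asserts are my addition, not existing code; C translation units would need _Static_assert instead):

```cpp
// My addition, not existing engine code: placed right after the typedefs in
// FitdLib/config.h, this fails the build if any alias has the wrong width.
static_assert(sizeof(u8)  == 1, "u8 must be 1 byte");
static_assert(sizeof(u16) == 2, "u16 must be 2 bytes");
static_assert(sizeof(u32) == 4, "u32 must be 4 bytes");
static_assert(sizeof(s8)  == 1, "s8 must be 1 byte");
static_assert(sizeof(s16) == 2, "s16 must be 2 bytes");
static_assert(sizeof(s32) == 4, "s32 must be 4 bytes");
```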
Nevermind! It turns out that ~~while my speakers indicate the whole reading-garbage-data thing might still be a problem~~ the incorrect sizing of types is the big problem. It's compiling after simply changing FitdLib/config.h to use non-`long` ints for `u32` & `s32`.
Edit: Looks like the sound data is being read correctly; the problem comes from something else.
This should now be all fixed in master