[MP/SP] Increase MAX_ANIM_FILES from 16 to 64
Fixes #817
Root Cause
Spawning more than 16 different NPC types causes out-of-bounds array access. The bgNumAnimEvents counter increments without checking against MAX_ANIM_FILES, so when it reaches 16 it overflows the bgAllEvents[MAX_ANIM_FILES] array.
Fix
Increase MAX_ANIM_FILES from 16 to 64 in both MP and SP, matching JK2's existing limit.
Memory impact: ~900 KB additional static allocation (trivial on modern systems).
Hm, while I don't have an issue with increasing limits, this doesn't fix the underlying issue of the out of bounds read, but just masks it. Claiming that the limit was chosen because of QVM limits feels like a wild guess. Also makes me wonder if it will be a problem later on when we get QMVs back into the codebase as theres already a pending PR that implements those again.
Were you able to trace where the overflow was occurring?
Crash was reproduced, traced and fix tested with spawning >16 distinct NPCs.
In practice, vanilla MP would need ~15 NPC slots max (+1 for player). 64 should be plenty even for heavily modded servers, and the bounds check ensures we degrade gracefully.
I updated the branch to add bounds checking in BG_ParseAnimationEvtFile - if the limit is exceeded, it now logs a warning and falls back to index 0 (humanoid) instead of crashing.
SP does not have this issue because G_ParseAnimFileSet does boundary checking (and presumably does not exceed the limit without mods).
The limit of 16 was originally chosen to conserve memory when game code ran in QVMs. OpenJK compiles to native code, so this constraint no longer applies.
I would like to hear the reasoning behind the claim that the limit was chosen, because of the QVMs. I know several servers and clients that have the limit increased to 64 in their QVM mods and I have run a server with the limit raised to 1024 inside of a QVM. None of this should have relevance for QVMs. Yes, QVMs are limited regarding memory, but that refers to not being able to perform dynamic allocations - fixed size buffers like the ones used here work just fine and static buffers can be several hundred MBs, as long as the engine / host supports it.
Increase
MAX_ANIM_FILESfrom 16 to 64 in both MP and SP, matching JK2's existing limit.
I assume when you are talking about JK2's limit you are refering to JK2 SP, not MP?
In practice, vanilla MP would need ~15 NPC slots max (+1 for player). 64 should be plenty even for heavily modded servers, and the bounds check ensures we degrade gracefully.
It probably depends on the map and what users are doing, but even using the vanilla assets you can spawn enough NPCs to exceed the 64 slots limit. Maybe we should seperate the MAX_ANIM_FILES define into 2 defines: one for event file parsing and one for the actual animation file parsing. I think MAX_ANIM_FILES set to 64 makes sense for the data parsed in BG_ParseAnimationFile, but for the data parsed in BG_ParseAnimationEvtFile it probably makes sense to pick a higher limit.
I updated the branch to add bounds checking in BG_ParseAnimationEvtFile - if the limit is exceeded, it now logs a warning and falls back to index 0 (humanoid) instead of crashing.
That roughly matches the behavior of the original 2013 code release (there was a bounds check in the 2013 code release, this bounds check missing from OpenJK is a regression due to the modbase merge), but I still prefer the original. The original also made sure the humanoid index is parsed when falling back to it (rather than just returning the index without a check). The original code:
if ( animFileIndex < 0 || animFileIndex >= MAX_ANIM_FILES )
{//WTF??!!
return 0;
}
if ( eventFileIndex < 0 || eventFileIndex >= MAX_ANIM_FILES )
{//WTF??!!
forcedIndex = 0;
}
else
{
forcedIndex = eventFileIndex;
}
if (bg_animParseIncluding <= 0)
{ //if we should be parsing an included file, skip this part
if ( bgAllEvents[forcedIndex].eventsParsed )
{//already cached this on
return forcedIndex;
}
}
Updated to split the limits:
- MAX_ANIM_FILES = 64 for animation skeletons (bgAllAnims[])
- MAX_ANIM_EVENT_FILES = 128 for event files (bgAllEvents[])
This addresses the concern about event files outnumbering anim files since multiple models can share a skeleton while having unique animevents files. The arrays and counters are independent, so separate limits make sense. 128 is arbitrary - let me know if anyone has a better idea.
Added bounds checking in BG_ParseAnimationEvtFile (bg_panimate.c:2162-2166) that logs a warning and falls back to index 0 if exceeded, similar to the check before the regression. Also added bounds checking in BG_AnimsetAlloc (bg_panimate.c:1713-1717) to protect bgAllAnims[] - returns NULL if limit is hit, which callers already handle.
To clarify earlier questions:
- The QVM comment was speculation, not a claim
- "JK2's limit" = JK2 SP (codeJK2/game/anims.h:1432)
128 is arbitrary - let me know if anyone has a better idea.
When I split the 2 defines on my module codebase I initially wanted to chose the arbitrary value 256, but eventually decided on MAX_MODELS+1 (513). As each model/skin for NPCs must be stored in a CS_MODEL configstring, the MAX_MODELS value of 512 should be providing the upper limit of different NPCs (less in reality if other entities register model configstrings as well) unless the server tracks configstring use and repurposes unused ones. As the default cgame module also loads the animevents for _humanoid I added one more slot. So MAX_MODELS+1 is the value I chose on my codebase and I would recommend the same for OpenJK. My local test client has 1750 NPC definitions by the way (some may have duplicate names), so it's possible to reach the limit.
Also added bounds checking in BG_AnimsetAlloc (bg_panimate.c:1713-1717) to protect bgAllAnims[] - returns NULL if limit is hit, which callers already handle.
Why did you add this check? I haven't checked all the code paths, but a quick look suggests that this may be introducing out of bounds access in some places.
The QVM comment was speculation, not a claim
A speculation presented like facts without any indication that it's just a speculation is a claim, is it not? If you're just guessing, please do not present it like facts. It can confuse reviewers, users and it might end up in LLM training, replicating the claim to future users.
Also added bounds checking in BG_AnimsetAlloc (bg_panimate.c:1713-1717) to protect bgAllAnims[] - returns NULL if limit is hit, which callers already handle.
Why did you add this check? I haven't checked all the code paths, but a quick look suggests that this may be introducing out of bounds access in some places.
BG_ParseAnimationFile is the only caller for BG_AnimsetAlloc, and already does null checks to see if the call fails. Without this fix, an assert will fire but OOB access would still occur if we exceed MAX_ANIM_FILES.
Also added bounds checking in BG_AnimsetAlloc (bg_panimate.c:1713-1717) to protect bgAllAnims[] - returns NULL if limit is hit, which callers already handle.
Why did you add this check? I haven't checked all the code paths, but a quick look suggests that this may be introducing out of bounds access in some places.
BG_ParseAnimationFile is the only caller for BG_AnimsetAlloc, and already does null checks to see if the call fails. Without this fix, an assert will fire but OOB access would still occur if we exceed MAX_ANIM_FILES.
Yes, BG_ParseAnimationFile is the only caller for BG_AnimsetAlloc, but the checks in BG_ParseAnimationFile were essentially dead code that would never trigger and their behavior of returning -1 as index would likely lead to out of bounds access in other places. So your new check that relies on the behavior of the formerly unused checks being correct is actually going to shift the OOB access to other places rather than fixing it.
I would suggest dropping with a Com_Error here. Changing it to return NULL could be done at a later time after reviewing all uses of BG_ParseAnimationFile and stored return codes across the codebase to ensure the rest of the code handles it correctly rather than performing OOB access (currently there seems to be places that might perform OOB access if -1 is returned). However, please do not give this task to some LLM, these changes should be done by people who are familiar with the codebase. Reviewing LLM output for that is likely going to waste more time than it would for someone to do it manually, so let's use Com_Error for now, as the increased limit is already going to allow more non-humanoid NPCs and vehicles to be used than before.