Descent3
Descent3 copied to clipboard
[Runtime Bug/Crash] Linux: audio related crash when trying to play a mission
Build Version
1fd8876f60b06abcb0520d9f3bfff36543bfc3a8
Operating System Environment
- [ ] Microsoft Windows (32-bit)
- [ ] Microsoft Windows (64-bit)
- [ ] Mac OS X
- [X] Linux: Fedora 39
CPU Environment
- [ ] x86 (32-bit Intel/AMD)
- [X] x86_64 (64-bit Intel/AMD)
- [ ] ARM (32-bit)
- [ ] ARM64 (64-bit; sometimes called AArch64)
- [ ] Other (RISC V, PPC...)
Game Environment
Start a new campaign on any level for Descent 3: Retribution
Description
After starting a mission, you wait at the loading screen, then the game crashes
Regression Status
Present since first commit
Steps to Reproduce
Just start playing any level from Descent 3: Retribution, then it crashes on the loading screen before the actual gameplay starts.
- Game modes affected:
- [X] Single player
- [ ] Multiplayer competitive
- [ ] Anarchy
- [ ] Hyper-Anarchy
- [ ] Robo-Anarchy
- [ ] Team Anarchy
- [ ] Capture the Flag
- [ ] Bounty
- [ ] Entropy
- [ ] Hoard
- [ ] Monsterball
- [ ] Multiplayer cooperative
Here are more details provided by GDB:
Thread 1 "Descent3" received signal SIGSEGV, Segmentation fault. 0x000000000067747e in hlsSystem::ComputePlayInfo (this=0x1339f20 <Sound_system>, sound_obj_index=1, virtual_pos=0x7fffffffbd70, virtual_vel=0x7fffffffbd64, adjusted_volume=0x7fffffffbd60) at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:925 925 sound_seg = BOA_INDEX(sound_seg); Missing separate debuginfos, use: dnf debuginfo-install xorg-x11-drv-nvidia-libs-550.67-1.fc39.x86_64
@DanielGibson @kevinbentley
can you also get a backtrace from gdb? (enter bt
in the gdb commandline after the crash)
#0 0x000000000067747e in hlsSystem::ComputePlayInfo (this=0x1339f20 <Sound_system>, sound_obj_index=2, virtual_pos=0x7fffffffbd70, virtual_vel=0x7fffffffbd64,
adjusted_volume=0x7fffffffbd60) at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:925
#1 0x0000000000677f1d in hlsSystem::Emulate3dSound (this=0x1339f20 <Sound_system>, sound_obj_index=2)
at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:1024
#2 0x00000000006789b8 in hlsSystem::Play3dSound (this=0x1339f20 <Sound_system>, sound_index=126, cur_pos=0x7fffffffbe40, cur_obj=0x8fe8b0 <Objects+39312>,
priority=1, volume=1, flags=0, offset=0) at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:1157
#3 0x0000000000678269 in hlsSystem::Play3dSound (this=0x1339f20 <Sound_system>, sound_index=126, priority=1, cur_obj=0x8fe8b0 <Objects+39312>, volume=1, flags=0,
offset=0) at /home/bperris/base/dev/Descent3/sndlib/hlsoundlib.cpp:1065
#4 0x0000000000412429 in AIInit (obj=0x8fe8b0 <Objects+39312>, ai_class=0 '\000', ai_type=2 '\002', ai_movement=1 '\001')
at /home/bperris/base/dev/Descent3/Descent3/AImain.cpp:3741
#5 0x0000000000545c7d in ObjInitGeneric (objp=0x8fe8b0 <Objects+39312>, reinit=false) at /home/bperris/base/dev/Descent3/Descent3/ObjInit.cpp:810
#6 0x0000000000546b97 in ObjInitTypeSpecific (objp=0x8fe8b0 <Objects+39312>, reinitializing=false) at /home/bperris/base/dev/Descent3/Descent3/ObjInit.cpp:1117
#7 0x0000000000546ed3 in ObjInit (objp=0x8fe8b0 <Objects+39312>, type=2, id=102, handle=2087, pos=0x7fffffffc034, creation_time=0, parent_handle=-1)
at /home/bperris/base/dev/Descent3/Descent3/ObjInit.cpp:1206
#8 0x00000000004c21c6 in ReadObject (ifile=0x27de7e0, objp=0x8fe8b0 <Objects+39312>, handle=2087, fileversion=127)
at /home/bperris/base/dev/Descent3/Descent3/LoadLevel.cpp:1667
#9 0x00000000004c9b58 in LoadLevel (filename=0x169ad50 "Level2.d3l", cb_fn=0x0) at /home/bperris/base/dev/Descent3/Descent3/LoadLevel.cpp:3816
#10 0x00000000004dbd6f in LoadMissionLevel (level=2) at /home/bperris/base/dev/Descent3/Descent3/Mission.cpp:1296
#11 0x000000000049201d in LoadAndStartCurrentLevel () at /home/bperris/base/dev/Descent3/Descent3/gamesequence.cpp:1706
#12 0x0000000000491011 in GameSequencer () at /home/bperris/base/dev/Descent3/Descent3/gamesequence.cpp:1199
#13 0x0000000000472187 in PlayGame () at /home/bperris/base/dev/Descent3/Descent3/game.cpp:832
#14 0x0000000000464a1c in MainLoop () at /home/bperris/base/dev/Descent3/Descent3/descent.cpp:621
#15 0x00000000004648b8 in Descent3 () at /home/bperris/base/dev/Descent3/Descent3/descent.cpp:571
#16 0x00000000005e6df1 in oeD3LnxApp::run (this=0x7fffffffd4a0) at /home/bperris/base/dev/Descent3/Descent3/lnxmain.cpp:255
#17 0x00000000005e6d25 in main (argc=1, argv=0x7fffffffd6a8) at /home/bperris/base/dev/Descent3/Descent3/lnxmain.cpp:755
Same as #7, corruptions on loading level.
Same as #7, corruptions on loading level.
That could be the reason I couldn't send out the guidebot or open a door on level 2
This is also present on macOS targeting arm64.
Only Windows is being forced to build in 32bit. For Linux, at least, Currently CI builds 64-bit executables and d3-linux.hog. In order to build 32-bit you need add -DBITS=-m32 to cmake configure command.
. I'm not certain if it's similar for mac. I suggest trying a 32bit build and see if you get the same issue.
One problem with trying to get 32-bit binaries working on a modern version of macOS: It doesn't support 32-bit binaries, either AARCH32 or i386. Support for 32-bit binaries was dropped in 10.15, and Apple Silicon Macs were released with 11.0.
The problem here seems to be that in hlsSystem::ComputePlayInfo()
(where the crash happens), sound_seg
is -1
, which then causes the crash in sound_seg = BOA_INDEX(sound_seg);
BOA_INDEX() is the following macro:
#define BOA_INDEX(x) ((ROOMNUM_OUTSIDE(x) ? (TERRAIN_REGION(x) + Highest_room_index + 1) : x))
in the end, BOA_INDEX(sound_seg)
expands to:
(((((sound_seg) & 0x80000000) != 0) ? (((Terrain_seg[0x7FFFFFFF & sound_seg].flags & (32 + 64 + 128)) >> 5) + Highest_room_index + 1) : sound_seg))
0x7FFFFFFF & -1
should be 0x7FFFFFFF
(2'147'483'647), which is a way too high index for terrain_segment Terrain_seg[TERRAIN_WIDTH * TERRAIN_DEPTH]
= Terrain_seg[256 * 256]
= Terrain_seg[65536]
I just provoked the crash with starting a new game, specifically the "Pilot Training" mission.
In my case, sound_obj_index
was 0
, m_sound_objects[sound_obj_index].m_link_info.object_handle
was 8200
and the object
returned by ObjGet(m_sound_objects[sound_obj_index].m_link_info.object_handle);
had pos = { 2059.34961, -723.258789, 2469.10718 }
and roomnum = -1
, which is where sound_seg = -1
came from.
No idea why roomnum
is -1
.
Adding
if(sound_seg == -1)
return false;
right before the sound_seg = BOA_INDEX(sound_seg);
line prevents the crash, but I have no idea if this is the correct fix, or if the real bug is much earlier when -1
set as the object's roomnum
, wherever that happens, and should be fixed there
The problem here seems to be that in
hlsSystem::ComputePlayInfo()
(where the crash happens),sound_seg
is-1
, which then causes the crash insound_seg = BOA_INDEX(sound_seg);
BOA_INDEX() is the following macro:
#define BOA_INDEX(x) ((ROOMNUM_OUTSIDE(x) ? (TERRAIN_REGION(x) + Highest_room_index + 1) : x))
in the end,
BOA_INDEX(sound_seg)
expands to:(((((sound_seg) & 0x80000000) != 0) ? (((Terrain_seg[0x7FFFFFFF & sound_seg].flags & (32 + 64 + 128)) >> 5) + Highest_room_index + 1) : sound_seg))
0x7FFFFFFF & -1
should be0x7FFFFFFF
(2'147'483'647), which is a way too high index forterrain_segment Terrain_seg[TERRAIN_WIDTH * TERRAIN_DEPTH]
=Terrain_seg[256 * 256]
=Terrain_seg[65536]
I just provoked the crash with starting a new game, specifically the "Pilot Training" mission. In my case,
sound_obj_index
was0
,m_sound_objects[sound_obj_index].m_link_info.object_handle
was8200
and theobject
returned byObjGet(m_sound_objects[sound_obj_index].m_link_info.object_handle);
hadpos = { 2059.34961, -723.258789, 2469.10718 }
androomnum = -1
, which is wheresound_seg = -1
came from.No idea why
roomnum
is-1
.Adding
if(sound_seg == -1) return false;
right before the
sound_seg = BOA_INDEX(sound_seg);
line prevents the crash, but I have no idea if this is the correct fix, or if the real bug is much earlier when-1
set as the object'sroomnum
, wherever that happens, and should be fixed there
I was looking into this earlier tonight, seems I was on the right track after all. It gets initialized at https://github.com/DescentDevelopers/Descent3/blob/86a6c139ee9b71ffa2a043a1778765367ee0652e/Descent3/ObjInit.cpp#L1187
Along with that are a few other inits with val -1
, maybe they need to be changed as well.
But probably it should be set to a proper value elsewhere?
Otherwise, why does it not crash on Win32? The calculations in the macro should be the same, so the index would be far behind the end of the array..
On the other hand, on 32bit Terrain_seg[0x7FFFFFFF]
would overflow the address space - sizeof(terrain_segment)
is 20 so that would result in the address (char*)&Terrain_seg[0] + 0x7FFFFFFF * 20
; that multiplication should overflow and be truncated to 0xFFFFFFEC
(if I'm not mistaken), so we'd basically get (char*)&Terrain_seg[0] + 0xFFFFFFEC
. That addition would overflow again, and we'd probably end up 20 bytes before the start of the Terrain_seg
array - with some luck at least some valid data is there (possibly VisibleTerrainZ
or some other global variable), so accessing it doesn't outright crash.
On 64bit, neither the multiplication nor the addition will overflow, because the address space is big enough - so we'll very likely end up at an address that's currently not valid.
Either way, the solution definitely is not to initialize roomnum
to a different value in the constructor or something.
Either the code only worked by accident on Win32 (by overflowing to an address that's valid) and adding if(sound_seg == -1) return false;
is the correct solution, or roomnum should've been set to another value while loading the level or something, and for some reason wasn't.
Well, it's read in at https://github.com/DescentDevelopers/Descent3/blob/86a6c139ee9b71ffa2a043a1778765367ee0652e/Descent3/LoadLevel.cpp#L1671 where objp->roomnum = roomnum
.
Yeah, but is that done for all objects, or is this specific object left out for some reason?
Or did cf_ReadInt(ifile);
return -1
?
By the way, I wrote a hacky testprogram (that's not really valid C, the compiler might optimize void* p2 = &blas[0x7FFFFFFF];
to whatever because it's clearly not within the array, but at least when building without optimizations it worked on my PC so it's good enough for demonstration):
#include <stdio.h>
struct bla { char data[20]; };
struct bla blas[256*256];
int main()
{
void* p = &blas[0];
void* p2 = &blas[0x7FFFFFFF];
printf("p = %p p2 = %p diff = %td\n", p, p2, p2-p);
return 0;
}
When compiled for 32bit (gcc -o test -m32 test.c
), it prints something like:
p = 0x5947f040 p2 = 0x5947f02c diff = -20
for 64bit (gcc -o test test.c
) it prints:
p = 0x6151a5ee7040 p2 = 0x615ba5ee702c diff = 42949672940
So my theory that on 32bit platforms one lands 20 bytes before the array seems to hold (didn't test this on Windows with MSVC though - I would expect it to behave the same, but I always forget if some platforms treat pointers as signed or some stupid shit like that)
Created a pull request with the fix mentioned above: #158 Needs reviewing from someone who actually understands how the Descent3 source is supposed to work (i.e. @kevinbentley).
I did some digging around and we are loading objects with negative room numbers.
From the training mission:
Loaded Object 20 type 6 in room -2147483648
(type 6 is a OBJ_VIEWER
)
Retribution level 1 has lots of them:
Loaded Object 2 type 7 in room -2147442039
Loaded Object 16 type 7 in room -2147441782
Loaded Object 12 type 7 in room -2147441526
Loaded Object 12 type 7 in room -2147439982
(type 7 is a OBJ_POWERUP
)
Use this patch along with #157 (once it lands) to see these outputs:
index 6434fbd..31361b7 100644
--- a/Descent3/LoadLevel.cpp
+++ b/Descent3/LoadLevel.cpp
@@ -1,5 +1,5 @@
/*
-* Descent 3
+* Descent 3
* Copyright (C) 2024 Parallax Software
*
* This program is free software: you can redistribute it and/or modify
@@ -1572,7 +1572,7 @@ void ConvertObject(int *type, int *id) {
// returns 1 if read ok
int ReadObject(CFILE *ifile, object *objp, int handle, int fileversion) {
int type, id, old_id, i;
- int roomnum;
+ int roomnum = 0;
float door_shields;
char tempname[OBJ_NAME_LEN + 1] = "";
@@ -1650,6 +1650,7 @@ int ReadObject(CFILE *ifile, object *objp, int handle, int fileversion) {
// Get the room number
roomnum = cf_ReadInt(ifile);
+ mprintf((0, "Loaded Object %d type %d in room %d\n", id, type, roomnum));
// For old files, check if this object is on terrain
if (fileversion < 49) {
I think negative room numbers are ok, that encodes if it's an (indoor) room or an (outdoor) cell, as far as I can tell, see ROOMNUM_OUTSIDE(), ROOMNUM_CELLNUM_FLAG (which is 0x80000000 = -2147483648 aka INT32_MIN) etc in Descent3/object_external.h
But -1 specifically apparently means "uninitialized", and -1 & ROOMNUM_CELLNUM_MASK
= -1 & 0x7fffffff
= 0x7fffffff
= 2147483647
, which is the highest possible (signed 32bit) index.
On the other hand, if for example you have room -2147442039
, CELLNUM(-2147442039)
= -2147442039 & ROOMNUM_CELLNUM_MASK
= -2147442039 & 0x7fffffff
= 41609
which is a reasonably small number (remember, we have 256*256 = 65536 terrain segments, which I guess are cells)
This is the problem I was investigating last weekend, and yes inside rooms are positive, and negative are outside except for the case of -1 which means the object has no room assigned, unlinked. I couldn't figure out why in a 64-bit build, unlinked objects are being passed around, so all I could do it make the sanity check to prevent crash when unlinking an object that has already been unlinked. I cannot also play level 2, as the first door just doesn't open for me, it is a sign things are not properly setup. This happens when loading a demo file, and I know the demo file hasn't changed in a long time, so it confirms this strange bug causes chain of strange things to happen.
https://pastebin.com/p8qVPSJ9
Seems we could do with both a sanity check for unlinked objects and some additional logging for objects as they're loaded, which will help us diagnose problems with incorrect room assignments.
Doing some more digging and the issue is that ObjInit
(eventually) calls hlsSystem::Play3dSound
(AImain.cpp:3741 in AIInit
) but the roomnum
isn't set until after ObjInit
is called because all of the init code assumes roomnum
is -1 during init. Moving the code to set roomnum
earlier causes a whole other set of problems.
Commenting out the Play3dSound
call seems to fix the problem but I don't know why it's there.
bt:
* frame #0: 0x000000010033fbd4 Descent3`Debug_ErrorBox(type=2, topstring="Descent 3 Assertion Failed", title="Assertion failed: (0)\n\nFile /Users/jcoby/src/Descent3/sndlib/hlsoundlib.cpp at line 921.", bottomstring="Continue?") at lnxdebug.cpp:88:10
frame #1: 0x00000001002b46f8 Descent3`AssertionFailed(expstr="0", file="/Users/jcoby/src/Descent3/sndlib/hlsoundlib.cpp", line=921) at error.cpp:222:12
frame #2: 0x0000000100318640 Descent3`hlsSystem::ComputePlayInfo(this=0x0000000100f12668, sound_obj_index=0, virtual_pos=0x000000016fdfb5a4, virtual_vel=0x000000016fdfb598, adjusted_volume=0x000000016fdfb594) at hlsoundlib.cpp:921:7
frame #3: 0x0000000100317ff0 Descent3`hlsSystem::Emulate3dSound(this=0x0000000100f12668, sound_obj_index=0) at hlsoundlib.cpp:1033:15
frame #4: 0x0000000100319ef4 Descent3`hlsSystem::Play3dSound(this=0x0000000100f12668, sound_index=291, cur_pos=0x000000016fdfb670, cur_obj=0x0000000100550e50, priority=1, volume=1, flags=0, offset=0) at hlsoundlib.cpp:1166:17
frame #5: 0x00000001003197ac Descent3`hlsSystem::Play3dSound(this=0x0000000100f12668, sound_index=291, priority=1, cur_obj=0x0000000100550e50, volume=1, flags=0, offset=0) at hlsoundlib.cpp:1074:10
frame #6: 0x0000000100015d54 Descent3`AIInit(obj=0x0000000100550e50, ai_class='\0', ai_type='\x02', ai_movement='\x01') at AImain.cpp:3741:49
frame #7: 0x0000000100195c48 Descent3`ObjInitGeneric(objp=0x0000000100550e50, reinit=false) at ObjInit.cpp:810:5
frame #8: 0x0000000100196ef0 Descent3`ObjInitTypeSpecific(objp=0x0000000100550e50, reinitializing=false) at ObjInit.cpp:1117:12
frame #9: 0x0000000100197298 Descent3`ObjInit(objp=0x0000000100550e50, type=2, id=106, handle=8200, pos=0x000000016fdfb9d8, creation_time=0, parent_handle=-1) at ObjInit.cpp:1206:10
frame #10: 0x00000001000f62f0 Descent3`ReadObject(ifile=0x000060000071b270, objp=0x0000000100550e50, handle=8200, fileversion=127) at LoadLevel.cpp:1668:3
frame #11: 0x00000001000ffdc8 Descent3`LoadLevel(filename="trainingmission.d3l", cb_fn=0x0000000000000000) at LoadLevel.cpp:3817:11
patch:
index 09b16ec..0bdea95 100644
--- a/Descent3/AImain.cpp
+++ b/Descent3/AImain.cpp
@@ -1,5 +1,5 @@
/*
-* Descent 3
+* Descent 3
* Copyright (C) 2024 Parallax Software
*
* This program is free software: you can redistribute it and/or modify
@@ -3738,8 +3738,8 @@ bool AIInit(object *obj, ubyte ai_class, ubyte ai_type, ubyte ai_movement) {
// Accounts for sound loading
if ((obj->handle & HANDLE_COUNT_MASK) != 0) {
ai_info->last_played_sound_index = Object_info[obj->id].anim[ai_movement].elem[anim].anim_sound_index;
- ai_info->anim_sound_handle = Sound_system.Play3dSound(
- Object_info[obj->id].anim[ai_movement].elem[anim].anim_sound_index, SND_PRIORITY_LOW, obj);
+ // ai_info->anim_sound_handle = Sound_system.Play3dSound(
+ // Object_info[obj->id].anim[ai_movement].elem[anim].anim_sound_index, SND_PRIORITY_LOW, obj);
} else {
ai_info->last_played_sound_index = -1;
}
Seems we could do with both a sanity check for unlinked objects and some additional logging for objects as they're loaded, which will help us diagnose problems with incorrect room assignments.
Yeah probably be the best to have a modern logging system implemented, eg: debug, notice, warning, error, then use trace level messages on the whole level setup system to track every detail of what is really going on, then logs can be compared between windows and linux.
Seems we could do with both a sanity check for unlinked objects and some additional logging for objects as they're loaded, which will help us diagnose problems with incorrect room assignments.
Yeah probably be the best to have a modern logging system implemented, eg: debug, notice, warning, error, then use trace level messages on the whole level setup system to track every detail of what is really going on, then logs can be compared between windows and linux.
So...imgui?
Nothing related to graphics, just using a simple backend logging library like log4c, where you simply log some text with a chosen verbosity level, then there is usually a config file to control the loggers: logger for the command line, logger for file, etc.
Console logging is available. See pr #157. Or add -DMONO to your cmake config.
The problem doesn't appear to be incorrect room assignments either (see my comment above). It's that the sound system is trying to play a sound before the object is fully initialized and assigned its room number.
I think this is done to play looping sounds but I also can't figure out why this would work on windows and not linux/mac.
I do understand there is some built in logging to the console functionality, but it would better using a robust logging system where things are organized by log level and category. With all these level script going on, and level setup, be nice to turn on trace logs to see what in detail is happening like which source file and function is doing what, and timestamps too. When people file a bug ticket, they can give us better logging information. The built in logger only offers 2 levels and it only logs to console.
I think this is done to play looping sounds but I also can't figure out why this would work on windows and not linux/mac.
I have explained above why this "works" on (32bit!) Windows.
I think this is done to play looping sounds but I also can't figure out why this would work on windows and not linux/mac.
I have explained above why this "works" on (32bit!) Windows.
I totally missed that, thank you!
It sounds like this was never correct. Moving setting objp->roomnum
before the ObjInit
call feels like the correct solution but it causes even more issues with LoadLevel
. Barring that, ignoring playing anything when roomnum
is -1
is probably the second best solution since that object has not been fully loaded and shouldn't be used in the BOA.
Commenting out the Play3dSound call seems to fix the problem but I don't know why it's there.
Good point about this being called from ObjInit(), didn't even look that closely at the backtrace :flushed:
Maybe not calling Play3dSound() at all would be better?
OTOH, AIInit()
(which is indirectly called from ObjInit()
and calls Play3dSound()
) is also called from AIInitAll()
, which is called from StartLevel()
, which is called after LoadLevel()
, and probably in that case roomnum
is correctly set and calling Play3dSound()
is the right thing to do.
In that case my fix is better than not calling Play3dSound()
there at all, but OTOH, it might also be possible to add a bool initSound
argument to AIInit()
and set that to false when calling from ObjInitGeneric()
- but on the other other (other?) hand (and I'm all out of hands here), ObjInitGeneric()
is also called from multiple places and maybe sometimes it's correct for Play3dSound()
being called from it?
Probably my suggested fix (if(sound_seg == -1) return false;
in ComputePlayInfo()
) is least invasive and even if it means that some code is run unnecessarily, it's probably least likely to break anything..
@DanielGibson Do you think on the windows 32-bit builds that arrays are sometimes being access with -1 index value? My guess for windows 32-bit, since things obvious work and the game is playable means when roomnum = -1, it doesn't end up being an index to some array. It feels like some side-effect of the true problem that is going on for the 64-bit environment.
Do you think on the windows 32-bit builds that arrays are sometimes being access with -1 index value?
Yes and no.
I've explained that in length above, but in short: due to multiple overflows when doing the pointer calculations on 32bit, Terrain_seg[0x7FFFFFFF & sound_seg]
(with sound_seg == -1) in BOA_INDEX(sound_seg)
accidentally ends up being equivalent to Terrain_seg[-1]
(which is also invalid, but likely to hit some valid memory at least and thus not segfault).
On 64bit, due to pointers being 64bit, those overflows don't happen and Terrain_seg[0x7FFFFFFF & sound_seg]
(with sound_seg == -1) behaves like Terrain_seg[0x7FFFFFFF]
which is about 41GB behind the end of that array.