libTAS
libTAS copied to clipboard
Savestate Loading doesn't work in Celeste64, causes SIGSEGV in `ThreadManager.cpp`
When I start the game without gdb, save a state and then load it, one of messages gets printed:
[f:70 t:17123M] ERROR (checkpoint/Checkpoint.cpp:702): mprotect(saved_area.addr, saved_area.size, saved_area.prot | PROT_WRITE) == 0 failed in readAnArea with error Cannot allocate memory
free(): corrupted unsorted chunks
recv() returns 0 -> socket closed
or
[f:154 t:17453M] Saved state 1 of size 116621852 in 0.481466 seconds
malloc(): mismatching next->prev_size (unsorted)
or
[f:240 t:17673M] Saved state 1 of size 130795131 in 0.497735 seconds
malloc(): invalid size (unsorted)
or
[f:114 t:18044M] Saved state 2 of size 115862809 in 0.485308 seconds
malloc(): unsorted double linked list corrupted
or just
recv() returns 0 -> socket closed
Got wrong message after state loading
When I start the game in gdb, I get this backtrace
Thread 1 "Celeste64" received signal SIGSEGV, Segmentation fault.
libtas::ThreadManager::getThreadTid () at checkpoint/ThreadManager.cpp:106
106 return current_thread->tid;
(gdb) br
Breakpoint 1 at 0x7ffff7c47a2e: file checkpoint/ThreadManager.cpp, line 106.
(gdb) bt
#0 libtas::ThreadManager::getThreadTid () at checkpoint/ThreadManager.cpp:106
#1 0x00007ffff7c2a06a in libtas::debuglogfull (lcf=lcf@entry=1028, file=file@entry=0x7ffff7d6741b "checkpoint/Checkpoint.cpp", line=line@entry=672) at logging.cpp:109
#2 0x00007ffff7c447d9 in libtas::reallocateArea (current_area=0x7ffff72c3070, saved_area=0x7ffff72c34d0) at checkpoint/Checkpoint.cpp:672
#3 libtas::readAllAreas () at checkpoint/Checkpoint.cpp:482
#4 libtas::Checkpoint::handler (signum=<optimized out>, info=<optimized out>, ucontext=0x7ffff7300480) at checkpoint/Checkpoint.cpp:332
#5 <signal handler called>
#6 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=31, no_tid=no_tid@entry=0) at pthread_kill.c:44
#7 0x00007ffff76ac8a3 in __pthread_kill_internal (signo=31, threadid=<optimized out>) at pthread_kill.c:78
#8 0x00007ffff765c668 in __GI_raise (sig=31) at ../sysdeps/posix/raise.c:26
#9 0x00007ffff7c47086 in libtas::SaveStateManager::restore (slot=<optimized out>) at checkpoint/SaveStateManager.cpp:379
#10 0x00007ffff7c261df in libtas::receive_messages (hud=..., draw=...) at frame.cpp:822
#11 libtas::frameBoundary(std::function<void ()>, libtas::RenderHUD&) (draw=..., hud=...) at frame.cpp:457
#12 0x00007ffff7c8e9cd in libtas::SDL_GL_SwapWindow (window=<optimized out>) at sdl/sdlwindows.cpp:100
#13 0x00005554d74103ab in ?? ()
#14 0x00000000000003e8 in ?? ()
#15 0x0000555555f90a58 in ?? ()
#16 0xffffffffffffffff in ?? ()
#17 0x00005554d6b2ddf0 in ?? ()
#18 0x00007ffff7e1b5a0 in ?? () from /usr/local/bin/libtas.so
#19 0x00007fffffffc490 in ?? ()
#20 0x00005554d74103ab in ?? ()
#21 0x00007fffffffc530 in ?? ()
#22 0x00007fbfda039f10 in ?? ()
#23 0x00007fbfda0079b0 in ?? ()
#24 0x0000000000000000 in ?? ()
https://github.com/clementgallet/libTAS/blob/a3246feb4810b26df6e55a81a07565a5efb152fe/src/library/checkpoint/ThreadManager.cpp#L105-L106
There's a null check, so maybe this is related to the fact that current_thread is a thread_local?
One of the issues is present when enabling the checkpoint log, because appending the log line to ImGui log window may cause a malloc. However, no memory allocation must be performed during checkpoint code, nothing that can alter the process memory mapping.
But the main issue lies in the following line:
https://github.com/clementgallet/libTAS/blob/a3246feb4810b26df6e55a81a07565a5efb152fe/src/library/checkpoint/Checkpoint.cpp#L702
Knowing that in Celeste64, one of the memory segments has a size of 274 GB! Adding the write flag commits the allocated memory, hence the memory allocation error.
I could fix it already, I'm only calling the mprotect() function on each individual page when I know I will write to it, instead of on the whole memory segment. One issue is that it leads to more fragmented Virtual Memory Areas (as seen in /proc/PID/maps ). This is because, even when two areas share the same protection flags, they are not merged if one has the write flag once and was removed later (there seems to be a recent fix).
This is related to a more general issue that I couldn't really solve. I would really like to be able to see if a memory page is mapped to a physical page or not. There is the present flag in /proc/PID/pagemap, see doc that I'm checking, and is handled by the option Runtime > Savestates > Skip Unmapped Pages . But this flag does not seem to be perfectly robust:
When I implemented the above fix, the 274GB segment got fragmented multiple times, which should mean that some memory pages where committed because they are flagged as present. However, when running the game natively, the huge memory segment is not fragmented, which should mean that it was never written to?
We would have more understanding if we could look at /proc/PID/kpageflags file, to get a flag of the physical memory page, but this requires CAP_SYS_ADMIN (basically root) because of some vulnerabilities.
Interesting. After that change, I can now savestate (with skip unmapped pages at least, without that option I get "state invalid because new threads were created" 100% of the time).
When launching with GDB, I still get the ThreadManager::getThreadTid segfault consistently, but without gdb everything works as expected.
The state invalid because new threads were created is another issue: .NET threadpool is creating and destroying threads depending on tasks given to them.