AArch64 JIT block linking bug. Causes crash inside of `linuxgsm` setup.
When running linuxgsm through its setup stages, it crashes with a SIGSEGV after branching to address zero. This comes from some bug in our block linking code.
Following the instructions at https://linuxgsm.com/servers/tfcserver/, this can be reproduced by executing the following commands inside FEXBash:
wget -O linuxgsm.sh https://linuxgsm.sh && chmod +x linuxgsm.sh && bash linuxgsm.sh tfcserver
./tfcserver install
The second command is the one that crashes. It works on the x86-64 JIT.
A workaround is to change the ExitFunctionLink function (Arm64JITCore_ExitFunctionLink) so that it always loops back to the top of the dispatcher. That uncovers another issue, though: it slows code execution down so much that linuxgsm's curl instances time out before they can download their config files.
The crash occurs in Arm64Dispatcher.cpp, in the ExitFunctionLinkerAddress asm routine:
ldr(x3, STATE_PTR(CpuStateFrame, Pointers.Common.ExitFunctionLink));
This LDR manages to load a nullptr.
It almost feels like the CpuStateFrame is getting corrupted somehow.
Adding Stef to this bug since they have the experience of dealing with the block linking and might see the bug quicker.
Looks like this might be a clang-14 bug. I hardcoded the function address for ExitFunctionLinker rather than loading from context and it still returned to zero.
The hardcoded address is valid in the dispatcher, and LR is still returning to the dispatcher after the blr but it still jumped to zero, which seems to imply a tail call optimization breaking somewhere.
Disabling SRA works around this issue, which is odd.
Single instruction sized blocks also happen to work around the issue.
Disabling optimization passes also seems to work around this issue.
Tinkering around in const prop changes behaviour, but I don't know if it is actually affecting the core problem.
This might actually be a race condition somewhere. tasksetting the process to a single core seems to work around it.
Hmm, delinking is not synchronized. Is more than one thread involved?
That could also explain the random SMC test crashes we're seeing.
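For what it's worth, here is a minimal C++ sketch of what a synchronized link/delink path could look like. It is not FEX's actual code: `ExitSlot`, `BlockLinks`, `Link`, and `Delink` are hypothetical names, and a real JIT patches branch targets in emitted code rather than storing them in a struct. The idea is only that linking and delinking take the same lock and that the slot is written atomically, so a thread jumping through the slot sees either the linked block or the dispatcher address, never a partially updated value.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical model of a block exit: emitted code would branch to whatever
// address is stored in `target`. It starts out pointing at the dispatcher.
struct ExitSlot {
  std::atomic<std::uintptr_t> target;
  explicit ExitSlot(std::uintptr_t dispatcher) : target(dispatcher) {}
};

// Hypothetical bookkeeping for which exit slots point at which guest block.
class BlockLinks {
public:
  explicit BlockLinks(std::uintptr_t dispatcher) : dispatcher_(dispatcher) {}

  // Point `slot` straight at the host code for `guest_rip` and record the link
  // so it can be undone later.
  void Link(ExitSlot* slot, std::uint64_t guest_rip, std::uintptr_t host_code) {
    std::lock_guard<std::mutex> lock(mutex_);
    links_[guest_rip].push_back(slot);
    slot->target.store(host_code, std::memory_order_release);
  }

  // Redirect every slot that was linked to `guest_rip` back to the dispatcher.
  // Holding the same mutex as Link() keeps the two from interleaving.
  void Delink(std::uint64_t guest_rip) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = links_.find(guest_rip);
    if (it == links_.end()) {
      return; // nothing was linked to this block
    }
    for (ExitSlot* slot : it->second) {
      slot->target.store(dispatcher_, std::memory_order_release);
    }
    links_.erase(it);
  }

private:
  std::mutex mutex_;
  std::uintptr_t dispatcher_;
  std::unordered_map<std::uint64_t, std::vector<ExitSlot*>> links_;
};
```

This sketch ignores a thread that is already executing inside the block being invalidated, which a real fix would also have to handle.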
Definitely multiple processes. Not sure if there are multiple threads; it's just a bash script after all.
Hmm, there's also VFORK's lack of blocking, which can lead to corruption, and iirc bash uses VFORK.
Can't repro here
[LinuxGSM ASCII-art banner]
=================================
LinuxGSM_
by Daniel Gibbs
Version: v22.1.0
Game: Team Fortress Classic
Website: https://linuxgsm.com
Contributors: https://linuxgsm.com/contrib
Sponsor: https://linuxgsm.com/sponsor
=================================
Server Directory
=================================
Warning! A server is already installed here.
/home/skmp/projects/FEX/build
Continue? [Y/n] Y
Creating log directories
=================================
installing log dir: /home/skmp/projects/FEX/build/log...OK
installing LinuxGSM log dir: /home/skmp/projects/FEX/build/log/script...OK
creating LinuxGSM log: /home/skmp/projects/FEX/build/log/script/tfcserver-script.log...OK
installing console log dir: /home/skmp/projects/FEX/build/log/console...OK
creating console log: /home/skmp/projects/FEX/build/log/console/tfcserver-console.log...OK
installing game log dir: /home/skmp/projects/FEX/build/serverfiles/tfc/logs...OK
creating symlink to game log dir: /home/skmp/projects/FEX/build/log/server -> /home/skmp/projects/FEX/build/serverfiles/tfc/logs...OK
Checking Dependencies
=================================
bc
binutils
bsdmainutils
bzip2
ca-certificates
cpio
curl
distro-info
file
gzip
hostname
jq
lib32gcc-s1
lib32stdc++6
libsdl2-2.0-0:i386
netcat
python3
steamcmd
tar
tmux
unzip
util-linux
wget
xz-utils
Warning! Missing dependencies: bc bsdmainutils cpio distro-info jq lib32gcc-s1 lib32stdc++6 netcat steamcmd tmux wget
Warning! skmp does not have sudo access. Manually install dependencies.
sudo dpkg --add-architecture i386; sudo apt update; sudo apt install bc bsdmainutils cpio distro-info jq lib32gcc-s1 lib32stdc++6 netcat steamcmd tmux wget
Failure! Missing dependencies required to run SteamCMD.
FEXBash-skmp@ornio:~/projects/FEX/build>
This is the output when main is run on the orion.
GitHub bot is too eager; this still happens.