Current versions of VRChat crash shortly after startup
More recent version of VRChat crash shortly after startup. Somewhat older versions (I tried a few from around mid August) didn't crash, so it doesn't look like this is a regression in box64.
I haven't narrowed down the exact update this issue started at yet, but can do so if that might help.
The game starts to render and then crashes with a segfault after generally a few frames, I couldn't find anything helpful in the game logs so far (some end after Camera Stack has been initialized, some end a few seconds later after printing out volume settings). The Unity crash handler creates minidumps, but they haven't been very useful so far (minidump-stackwalk fails with The thread list stream was not found, and all I've gotten from winedbg is that it crashed with a segfault).
The box64 output, including last wrapped calls and BTs:
Last calls
4980|0x3f0002fa05: Calling pthread_mutex_lock(0x3F000E8860, 0x3F000E88A0, 0x2FFAE8C0, ...) => return 0x0
4980|0x3f0004c3e8: Calling pthread_sigmask(0x0, 0x3F000E88A0, 0x2FFAE740, ...) => return 0x0
4980|0x3f00072d7b: Calling pthread_getspecific(0x3, 0x3F000E88A0, 0x2FFAE740, ...) => return 0x7FF30000
4980|0x3f0004c2f0: Calling write(45, 0x2ffae940, 64) => return 0x40
4980|0x3f00072d7b: Calling pthread_getspecific(0x3, 0x40, 0x40, ...) => return 0x7FF30000
4980|0x3f0004bb8b: Calling read(41, 0x2ffae940, 64) => return 0x40[02 00 00 C0 00 00 00 00 00 00 ...]
4980|0x3f0004c402: Calling pthread_sigmask(0x2, 0x2FFAE740, 0x0, ...) => return 0x0
4980|0x3f0004c4e1: Calling pthread_mutex_unlock(0x3F000E8860, 0x2FFAE8C0, 0x0, ...) => return 0x0
4980|0x3f0002fb38: Calling pthread_sigmask(0x2, 0x2FFAE8C0, 0x0, ...) => return 0x0
4980|0x3f0004ed35: Calling pthread_sigmask(0x0, 0x3F000E88A0, 0x2FFAF490, ...) => return 0x0
4980|0x3f00072d7b: Calling pthread_getspecific(0x3, 0x2FFAF588, 0x2FFAF370, ...) => return 0x7FF30000
4980|0x3f0004c28d: Calling writev(0x2D, 0x2FFAF350, 0x3, ...) => return 0x70
4980|0x3f00072d7b: Calling pthread_getspecific(0x3, 0x40, 0x70, ...) => return 0x7FF30000
4980|0x3f0004bb8b: Calling read(41, 0x2ffaf510, 64) => return 0x40[00 00 00 00 00 00 00 00 00 00 ...]
4980|0x3f0004efda: Calling pthread_sigmask(0x2, 0x2FFAF490, 0x0, ...) => return 0x0
4980|0x3f0001ecfb: Calling malloc(0x28, 0x1000, 0x0, ...) => return 0x7F6C03A0B0
NativeBT: VRChat.exe() [0x34a56a94]
NativeBT: linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x7fa54867a0]
NativeBT: [0x7f920c6c30]
EmulatedBT: ??? [0x220cf156]
5056|SIGSEGV @0x7f920c6c30 (???(0x7f920c6c30)) (x64pc=0x220cf156/"???", rsp=0x5f65fe68, stack=0x5f560000:0x5f760000 own=(nil) fp=0x181c995d0), for accessing 0x59010000 (code=2/prot=1), db=0x7f9200eb40(0x7f920c6bd0:0x7f920c6d70/0x220cf110:0x220cf1b5/???:clean, hash:76fe66cc/76fe66cc) handler=0x3f000522a0
RAX:0x000000005900ac7c RCX:0x000000005900ff80 RDX:0x000000005900ff88 RBX:0x000000002bb2efd0
RSP:0x000000005f65fe68 RBP:0x0000000000000001 RSI:0x0000000000000001 RDI:0x00000001152458d8
R8:0xfffffffffffface0 R9:0xfffffffffffffffc R10:0x0000000021870000 R11:0x0000000000000080
R12:0x0000000000000000 R13:0x000000002afeb550 R14:0x00000001152458d0 R15:0x000000005e819de0
ES:0x002b CS:0x0033 SS:0x002b DS:0x002b FS:0x0043 GS:0x0053
RSP-0x20:0x000000004023d6b0 RSP-0x18:0x00000000329bd690 RSP-0x10:0x0000000000000250 RSP-0x08:0x00000001152458d8
RSP+0x00:0x000000002207f218 RSP+0x08:0x0000000000000000 RSP+0x10:0x0000000000000001 RSP+0x18:0x0000000000000001 x64opcode=C5 FD E7 89 80 00 00 00 (opcode=3d802161)
I also took a look with GDB, and the code it's crashing inside looks like a memcpy. It's crashing in a loop that copies 256 bytes at a time, but the last 8 bytes of the current block cannot be accessed. The register counting the number of bytes left had the value ffffffffffb84fe0, which to me looks like it underflowed and should've returned much earlier. I'll have to see if I can get more information on the surrounding code or the values it starts with (the code in question doesn't have any proper code before it in memory, and I also couldn't find anything proper further up the stack).
Box64 params used:
Apply RC params for VRChat.exe
Applying BOX64_ROLLING_LOG=1
Applying BOX64_NOSIGSEGV=1
Applying BOX64_SHOWSEGV=1
Applying BOX64_SHOWBT=1
Applying BOX64_JITGDB=2
Applying BOX64_DYNAREC_BIGBLOCK=0
Applying BOX64_DYNAREC_STRONGMEM=2
Applying BOX64_DYNAREC_X87DOUBLE=1
Applying BOX64_DYNAREC_FASTNAN=0
Applying BOX64_DYNAREC_FASTROUND=0
Applying BOX64_DYNAREC_SAFEFLAGS=2
Applying BOX64_DYNAREC_CALLRET=0
Not all of these were required for previous versions, but the additional ones also didn't help here.
With the dynarec disabled, it seems to work fine, so I'm currently doing a run with BOX64_DYNAREC_TEST to maybe find where this issue is coming from.
Interesting. Give it a try with BOX64_DYNAREC_NATIVEFLAGS=0 also. And maybe with BOX64_DYNAREC_SAFEFLAG=2, in case this is heavily encrypted.
Same result with those flags.
From the results of BOX64_DYNAREC_TEST, it looks like the dynarec is not jumping on some jne instructions, while the interpreter is.
Warning, difference between x64 Interpreter and Dynarec in 0x1806b5039 (75 e6 48 83 c4 30 5b c3)
=======================================
DIFF: Dynarec | Interpreter
----------------------
RIP: 00000001806b503b | 00000001806b5021
Warning, difference between x64 Interpreter and Dynarec in 0x1806b5302 (75 cc 4d 89 50 08 49 8b)
=======================================
DIFF: Dynarec | Interpreter
----------------------
RIP: 00000001806b5304 | 00000001806b52d0
Looking at the disassembly of the problematic code in gdb (I haven't yet checked if this is the same address as where these differences are happening, but I assume it is, as these (multiple times) are the last ones before the segv), it also looks like the CPU is not jumping when it should
0x7f6dd1c9cc: mov x2, #0x100 // #256
0x7f6dd1c9d0: str wzr, [x0, #1100]
0x7f6dd1c9d4: subs x5, x18, x2
0x7f6dd1c9d8: b.cc 0x7f6dd1ca00 // b.lo, b.ul, b.last
x18 seems to contain the number of bytes left to move, and the b.cc would jump out of the loop. If it doesn't jump when it should, x18 underflows and the memmove will eventually run into memory it can't access, which seems to be what's happening.
It might be a "inverted carry" flags that doesn't propage correctly?
You should generate a dump: use BOX64_DYNAREC_DUMP=2 to generate a colorfull dump of the generated code, and also redirct to a text file with BOX64_TRACE_FILE=dump.txt. (use less -R dump.txt to visualize, or use DUMP=1 to not generate color)
Because it's a windows app, you should use a profile in ~/.box64rc to only set those for VRChat.exe
The relevant part from the dump (the two adds at the beginning are just incrementing the source and destination address) is
New Instruction x64:0x220cf176, native:0x7fa08cd8f8
0x220cf176: 48 81 C1 00 01 00 00 ADD Ed, Id
0x7fa08cd8f8: 1 emitted opcodes, inst=16, barrier=0 state=3/0(0), set=3F/0, use=0, need=0/0, sm=0(0/0) NF:1 nf:f/0/0, pred=15 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
9104016b ADD xRCX, xRCX, 0x100
New Instruction x64:0x220cf17d, native:0x7fa08cd8fc
0x220cf17d: 48 81 C2 00 01 00 00 ADD Ed, Id
0x7fa08cd8fc: 1 emitted opcodes, inst=17, barrier=0 state=3/1(0), set=3F/0, use=0, need=0/0, sm=0(0/0) NF:1 nf:f/0/0, pred=16 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
9104018c ADD xRDX, xRDX, 0x100
New Instruction x64:0x220cf184, native:0x7fa08cd900
0x220cf184: 49 81 E8 00 01 00 00 SUB Ed, Id
0x7fa08cd900: 1 emitted opcodes, inst=18, barrier=0 state=3/1(0), set=3F/0, use=0, need=0/0, sm=0(0/0) NF:1 nf:f/0/0 gic, pred=17 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
d1040252 SUB xR8, xR8, 0x100
New Instruction x64:0x220cf18b, native:0x7fa08cd904
0x220cf18b: 49 81 F8 00 01 00 00 CMP Ed, Id
0x7fa08cd904: 3 emitted opcodes, inst=19, barrier=0 state=3/1(0), set=3F/1, use=0, need=0/1, sm=0(0/0) NF:1 nf:f/0/8 gic ic, pred=18 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
d2802002 MOVZ x2, 0x100
b9044c1f STR wZR, [xEmu, 0x44c]
eb020245 SUBS x5, xR8, x2
New Instruction x64:0x220cf192, native:0x7fa08cd910
0x220cf192: 0F 83 78 FF FF FF JNC Id
0x7fa08cd910: 10 emitted opcodes, inst=20, barrier=0 state=0/1(1), set=0/0, use=1, need=1/0, sm=0(0/0) nf:0/8/8 ic, pred=19, jmp=0 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
54000143 B.cCC #+10i ; 0x7fa08cd938
Cache Transform ---- ninst=20 -> 0
Purge locally SSE Cache ------
3d802801 STR Q1, [xEmu, 0xa0]
3d802c02 STR Q2, [xEmu, 0xb0]
3d803003 STR Q3, [xEmu, 0xc0]
3d803404 STR Q4, [xEmu, 0xd0]
3d806808 STR Q8, [xEmu, 0x1a0]
3d806c09 STR Q9, [xEmu, 0x1b0]
3d80700a STR Q10, [xEmu, 0x1c0]
3d80740b STR Q11, [xEmu, 0x1d0]
------ Purge SSE Cache
---- Cache Transform
17ffffd1 B #+-47i ; 0x7fa08cd878
(x86_64 asm:)
sub r8,0x100
cmp r8,0x100
jae 0xffffffffffffff7e
The generated code looks fine to me so far. If the subs produces an unsigned underflow, the carry flag is cleared and the code jumps out of the loop. Otherwise, it jumps back to the start of the loop. Just to make sure, I also manually verified this with gdb that it should work correctly.
I'll next do a run with trace enabled over this block.
The issue seems to be happening outside of that block, as it already started with a negative value. I'll need to expand the range I'm tracing on a bit (but as little as possible) to chase down where this originates from, as I currently have a trace that's about 4GB and looks like it might not even contain the problematic part.
You can trace only on an interval to limit the size of the trace (with BOX4_TRACE=0x40000-0x41000... But of course trace can rapidly be huge!
That's what I did, but the interval I had at first was too small, and the one I used afterwards produced a lot more logs and might still have been too small. I'll do some more digging tomorrow (probably with ghidra, to find the relevant range for this function)
I haven't gotten further yet, but I looked a bit through the differences DYNAREC_TEST printed out.
Warning, difference between x64 Interpreter and Dynarec in 0x1700185f0 (8b 13 89 d0 89 55 fc c1)
Warning, difference between x64 Interpreter and Dynarec in 0x170018668 (8b 13 89 d0 89 55 fc c1)
Warning, difference between x64 Interpreter and Dynarec in 0x17002185e (8b 43 08 85 c0 7f 3b 83)
Warning, difference between x64 Interpreter and Dynarec in 0x17002865d (0f 94 c3 84 db 74 cc 48)
Warning, difference between x64 Interpreter and Dynarec in 0x170030465 (41 8b 36 89 f3 c1 fb 10)
Warning, difference between x64 Interpreter and Dynarec in 0x1700366ad (87 03 31 c9 4c 89 e2 e8)
Warning, difference between x64 Interpreter and Dynarec in 0x170036796 (87 03 45 31 e4 e9 6f ff)
Warning, difference between x64 Interpreter and Dynarec in 0x170036919 (87 07 eb d3 0f 1f 00 48)
Warning, difference between x64 Interpreter and Dynarec in 0x170036a11 (87 02 48 83 c4 38 5b 5d)
Warning, difference between x64 Interpreter and Dynarec in 0x18001d1b7 (75 e7 48 8d 0d e0 1a 83)
Warning, difference between x64 Interpreter and Dynarec in 0x180356f46 (8b 51 08 4c 8b c9 bb 01)
Warning, difference between x64 Interpreter and Dynarec in 0x1803614cc (0f 94 c1 48 89 44 24 20)
Warning, difference between x64 Interpreter and Dynarec in 0x1804d0b97 (75 e7 ff cf 85 ff 0f 8f)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b4fb2 (75 ec 49 8b c0 48 83 c4)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b5039 (75 e6 48 83 c4 30 5b c3)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b5252 (75 ed 48 89 5f 40 48 8b)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b5302 (75 cc 4d 89 50 08 49 8b)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b7764 (8b b7 c0 00 00 00 8b 87)
Warning, difference between x64 Interpreter and Dynarec in 0x180d66bc7 (75 e7 44 89 87 c0 00 00)
Warning, difference between x64 Interpreter and Dynarec in 0x2207e454 (0f 94 c0 3c 01 75 d6 e9)
Warning, difference between x64 Interpreter and Dynarec in 0x22080b1d (0f 94 c0 48 89 54 24 08)
Warning, difference between x64 Interpreter and Dynarec in 0x2209f6f4 (c4 e2 6d 8c 01 c4 e2 7d)
Warning, difference between x64 Interpreter and Dynarec in 0x30110080 (cc 53 43 00 00 00 00 00)
Warning, difference between x64 Interpreter and Dynarec in 0x3f0002f90b (83 78 08 01 7e 15 4b 8b)
Warning, difference between x64 Interpreter and Dynarec in 0x3f0002fff0 (87 30 e9 d6 fd ff ff 66)
Warning, difference between x64 Interpreter and Dynarec in 0x3f0003023d (44 8b 32 80 bd 88 f4 ff)
Warning, difference between x64 Interpreter and Dynarec in 0x3f00031731 (8b 40 08 59 5e 83 f8 01)
Warning, difference between x64 Interpreter and Dynarec in 0x3f0003222e (8b 40 08 83 f8 01 74 5a)
Warning, difference between x64 Interpreter and Dynarec in 0x3f00066c32 (87 03 85 c0 0f 85 cc 00)
I checked some of them and couldn't find anything that shouldn't work when comparing the code the dynarec produced to the original instructions.
Did you tried to run VRChat with `BOX64_DYNAREC_NATIVEFLAGS=0`` ?
Just tried it again to be sure, didn't help.
I just did a quick test with 067cf75c, and it works on that version, so this seems to be a regression. The bad commit seems to be a450b9d12f99df13adcae4a570aeb780964d43b2. Reverting that commit fixes it.
@Coreforge is this still an issue with the latest Box64 and Wine/Proton?
Runs fine with on the current version of Box64 and VRChat. I didn't have the most up to date Proton (was on Proton 9), but I don't think that should affect it much. I also only tried desktop mode, but I don't expect VR to be an issue if desktop works correctly.
Just for completeness' sake, options I used were
BOX64_DYNAREC_STRONGMEM=1
BOX64_DYNAREC_BIGBLOCK=3
BOX64_DYNAREC_SAFEFLAGS=0
BOX64_DYNAREC_CALLRET=1
I'm also only running it in offline testing mode, as EAC does not work (which isn't surprising though).