box64 icon indicating copy to clipboard operation
box64 copied to clipboard

Current versions of VRChat crash shortly after startup

Open Coreforge opened this issue 1 year ago • 14 comments

More recent version of VRChat crash shortly after startup. Somewhat older versions (I tried a few from around mid August) didn't crash, so it doesn't look like this is a regression in box64.
I haven't narrowed down the exact update this issue started at yet, but can do so if that might help.

The game starts to render and then crashes with a segfault after generally a few frames, I couldn't find anything helpful in the game logs so far (some end after Camera Stack has been initialized, some end a few seconds later after printing out volume settings). The Unity crash handler creates minidumps, but they haven't been very useful so far (minidump-stackwalk fails with The thread list stream was not found, and all I've gotten from winedbg is that it crashed with a segfault).

The box64 output, including last wrapped calls and BTs:

Last calls
4980|0x3f0002fa05: Calling pthread_mutex_lock(0x3F000E8860, 0x3F000E88A0, 0x2FFAE8C0, ...) => return 0x0
4980|0x3f0004c3e8: Calling pthread_sigmask(0x0, 0x3F000E88A0, 0x2FFAE740, ...) => return 0x0
4980|0x3f00072d7b: Calling pthread_getspecific(0x3, 0x3F000E88A0, 0x2FFAE740, ...) => return 0x7FF30000
4980|0x3f0004c2f0: Calling write(45, 0x2ffae940, 64) => return 0x40
4980|0x3f00072d7b: Calling pthread_getspecific(0x3, 0x40, 0x40, ...) => return 0x7FF30000
4980|0x3f0004bb8b: Calling read(41, 0x2ffae940, 64) => return 0x40[02 00 00 C0 00 00 00 00 00 00 ...] 
4980|0x3f0004c402: Calling pthread_sigmask(0x2, 0x2FFAE740, 0x0, ...) => return 0x0
4980|0x3f0004c4e1: Calling pthread_mutex_unlock(0x3F000E8860, 0x2FFAE8C0, 0x0, ...) => return 0x0
4980|0x3f0002fb38: Calling pthread_sigmask(0x2, 0x2FFAE8C0, 0x0, ...) => return 0x0
4980|0x3f0004ed35: Calling pthread_sigmask(0x0, 0x3F000E88A0, 0x2FFAF490, ...) => return 0x0
4980|0x3f00072d7b: Calling pthread_getspecific(0x3, 0x2FFAF588, 0x2FFAF370, ...) => return 0x7FF30000
4980|0x3f0004c28d: Calling writev(0x2D, 0x2FFAF350, 0x3, ...) => return 0x70
4980|0x3f00072d7b: Calling pthread_getspecific(0x3, 0x40, 0x70, ...) => return 0x7FF30000
4980|0x3f0004bb8b: Calling read(41, 0x2ffaf510, 64) => return 0x40[00 00 00 00 00 00 00 00 00 00 ...] 
4980|0x3f0004efda: Calling pthread_sigmask(0x2, 0x2FFAF490, 0x0, ...) => return 0x0
4980|0x3f0001ecfb: Calling malloc(0x28, 0x1000, 0x0, ...) => return 0x7F6C03A0B0
NativeBT: VRChat.exe() [0x34a56a94]
NativeBT: linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x7fa54867a0]
NativeBT: [0x7f920c6c30]
EmulatedBT: ??? [0x220cf156]
5056|SIGSEGV @0x7f920c6c30 (???(0x7f920c6c30)) (x64pc=0x220cf156/"???", rsp=0x5f65fe68, stack=0x5f560000:0x5f760000 own=(nil) fp=0x181c995d0), for accessing 0x59010000 (code=2/prot=1), db=0x7f9200eb40(0x7f920c6bd0:0x7f920c6d70/0x220cf110:0x220cf1b5/???:clean, hash:76fe66cc/76fe66cc) handler=0x3f000522a0
RAX:0x000000005900ac7c RCX:0x000000005900ff80 RDX:0x000000005900ff88 RBX:0x000000002bb2efd0 
RSP:0x000000005f65fe68 RBP:0x0000000000000001 RSI:0x0000000000000001 RDI:0x00000001152458d8 
 R8:0xfffffffffffface0  R9:0xfffffffffffffffc R10:0x0000000021870000 R11:0x0000000000000080 
R12:0x0000000000000000 R13:0x000000002afeb550 R14:0x00000001152458d0 R15:0x000000005e819de0 
ES:0x002b CS:0x0033 SS:0x002b DS:0x002b FS:0x0043 GS:0x0053 
RSP-0x20:0x000000004023d6b0 RSP-0x18:0x00000000329bd690 RSP-0x10:0x0000000000000250 RSP-0x08:0x00000001152458d8
RSP+0x00:0x000000002207f218 RSP+0x08:0x0000000000000000 RSP+0x10:0x0000000000000001 RSP+0x18:0x0000000000000001 x64opcode=C5 FD E7 89 80 00 00 00 (opcode=3d802161)

I also took a look with GDB, and the code it's crashing inside looks like a memcpy. It's crashing in a loop that copies 256 bytes at a time, but the last 8 bytes of the current block cannot be accessed. The register counting the number of bytes left had the value ffffffffffb84fe0, which to me looks like it underflowed and should've returned much earlier. I'll have to see if I can get more information on the surrounding code or the values it starts with (the code in question doesn't have any proper code before it in memory, and I also couldn't find anything proper further up the stack).

Box64 params used:

Apply RC params for VRChat.exe
Applying BOX64_ROLLING_LOG=1
Applying BOX64_NOSIGSEGV=1
Applying BOX64_SHOWSEGV=1
Applying BOX64_SHOWBT=1
Applying BOX64_JITGDB=2
Applying BOX64_DYNAREC_BIGBLOCK=0
Applying BOX64_DYNAREC_STRONGMEM=2
Applying BOX64_DYNAREC_X87DOUBLE=1
Applying BOX64_DYNAREC_FASTNAN=0
Applying BOX64_DYNAREC_FASTROUND=0
Applying BOX64_DYNAREC_SAFEFLAGS=2
Applying BOX64_DYNAREC_CALLRET=0

Not all of these were required for previous versions, but the additional ones also didn't help here.

Coreforge avatar Nov 03 '24 21:11 Coreforge

With the dynarec disabled, it seems to work fine, so I'm currently doing a run with BOX64_DYNAREC_TEST to maybe find where this issue is coming from.

Coreforge avatar Nov 04 '24 15:11 Coreforge

Interesting. Give it a try with BOX64_DYNAREC_NATIVEFLAGS=0 also. And maybe with BOX64_DYNAREC_SAFEFLAG=2, in case this is heavily encrypted.

ptitSeb avatar Nov 04 '24 15:11 ptitSeb

Same result with those flags.

From the results of BOX64_DYNAREC_TEST, it looks like the dynarec is not jumping on some jne instructions, while the interpreter is.

Warning, difference between x64 Interpreter and Dynarec in 0x1806b5039 (75 e6 48 83 c4 30 5b c3)
=======================================
DIFF: Dynarec |  Interpreter
----------------------
RIP: 00000001806b503b | 00000001806b5021
Warning, difference between x64 Interpreter and Dynarec in 0x1806b5302 (75 cc 4d 89 50 08 49 8b)
=======================================
DIFF: Dynarec |  Interpreter
----------------------
RIP: 00000001806b5304 | 00000001806b52d0

Looking at the disassembly of the problematic code in gdb (I haven't yet checked if this is the same address as where these differences are happening, but I assume it is, as these (multiple times) are the last ones before the segv), it also looks like the CPU is not jumping when it should

   0x7f6dd1c9cc:	mov	x2, #0x100                 	// #256
   0x7f6dd1c9d0:	str	wzr, [x0, #1100]
   0x7f6dd1c9d4:	subs	x5, x18, x2
   0x7f6dd1c9d8:	b.cc	0x7f6dd1ca00  // b.lo, b.ul, b.last

x18 seems to contain the number of bytes left to move, and the b.cc would jump out of the loop. If it doesn't jump when it should, x18 underflows and the memmove will eventually run into memory it can't access, which seems to be what's happening.

Coreforge avatar Nov 04 '24 15:11 Coreforge

It might be a "inverted carry" flags that doesn't propage correctly? You should generate a dump: use BOX64_DYNAREC_DUMP=2 to generate a colorfull dump of the generated code, and also redirct to a text file with BOX64_TRACE_FILE=dump.txt. (use less -R dump.txt to visualize, or use DUMP=1 to not generate color) Because it's a windows app, you should use a profile in ~/.box64rc to only set those for VRChat.exe

ptitSeb avatar Nov 04 '24 16:11 ptitSeb

The relevant part from the dump (the two adds at the beginning are just incrementing the source and destination address) is

New Instruction x64:0x220cf176, native:0x7fa08cd8f8
0x220cf176: 48 81 C1 00 01 00 00  ADD Ed, Id
0x7fa08cd8f8: 1 emitted opcodes, inst=16, barrier=0 state=3/0(0), set=3F/0, use=0, need=0/0, sm=0(0/0) NF:1 nf:f/0/0, pred=15 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
        9104016b        ADD xRCX, xRCX, 0x100
New Instruction x64:0x220cf17d, native:0x7fa08cd8fc
0x220cf17d: 48 81 C2 00 01 00 00  ADD Ed, Id
0x7fa08cd8fc: 1 emitted opcodes, inst=17, barrier=0 state=3/1(0), set=3F/0, use=0, need=0/0, sm=0(0/0) NF:1 nf:f/0/0, pred=16 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
        9104018c        ADD xRDX, xRDX, 0x100
New Instruction x64:0x220cf184, native:0x7fa08cd900
0x220cf184: 49 81 E8 00 01 00 00  SUB Ed, Id
0x7fa08cd900: 1 emitted opcodes, inst=18, barrier=0 state=3/1(0), set=3F/0, use=0, need=0/0, sm=0(0/0) NF:1 nf:f/0/0 gic, pred=17 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
        d1040252        SUB xR8, xR8, 0x100
New Instruction x64:0x220cf18b, native:0x7fa08cd904
0x220cf18b: 49 81 F8 00 01 00 00  CMP Ed, Id
0x7fa08cd904: 3 emitted opcodes, inst=19, barrier=0 state=3/1(0), set=3F/1, use=0, need=0/1, sm=0(0/0) NF:1 nf:f/0/8 gic ic, pred=18 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
        d2802002        MOVZ x2, 0x100
        b9044c1f        STR wZR, [xEmu, 0x44c]
        eb020245        SUBS x5, xR8, x2
New Instruction x64:0x220cf192, native:0x7fa08cd910
0x220cf192: 0F 83 78 FF FF FF  JNC Id
0x7fa08cd910: 10 emitted opcodes, inst=20, barrier=0 state=0/1(1), set=0/0, use=1, need=1/0, sm=0(0/0) nf:0/8/8 ic, pred=19, jmp=0 Q1:XMM1 Q2:XMM2 Q3:XMM3 Q4:XMM4 Q8:YMM1 Q9:YMM2 Q10:YMM3 Q11:YMM4
        54000143        B.cCC #+10i     ; 0x7fa08cd938
        Cache Transform ---- ninst=20 -> 0
        Purge locally SSE Cache ------
        3d802801        STR Q1, [xEmu, 0xa0]
        3d802c02        STR Q2, [xEmu, 0xb0]
        3d803003        STR Q3, [xEmu, 0xc0]
        3d803404        STR Q4, [xEmu, 0xd0]
        3d806808        STR Q8, [xEmu, 0x1a0]
        3d806c09        STR Q9, [xEmu, 0x1b0]
        3d80700a        STR Q10, [xEmu, 0x1c0]
        3d80740b        STR Q11, [xEmu, 0x1d0]
        ------ Purge SSE Cache
        ---- Cache Transform
        17ffffd1        B #+-47i        ; 0x7fa08cd878

(x86_64 asm:)

sub    r8,0x100 
cmp    r8,0x100
jae    0xffffffffffffff7e 

The generated code looks fine to me so far. If the subs produces an unsigned underflow, the carry flag is cleared and the code jumps out of the loop. Otherwise, it jumps back to the start of the loop. Just to make sure, I also manually verified this with gdb that it should work correctly.

I'll next do a run with trace enabled over this block.

Coreforge avatar Nov 04 '24 17:11 Coreforge

The issue seems to be happening outside of that block, as it already started with a negative value. I'll need to expand the range I'm tracing on a bit (but as little as possible) to chase down where this originates from, as I currently have a trace that's about 4GB and looks like it might not even contain the problematic part.

Coreforge avatar Nov 04 '24 20:11 Coreforge

You can trace only on an interval to limit the size of the trace (with BOX4_TRACE=0x40000-0x41000... But of course trace can rapidly be huge!

ptitSeb avatar Nov 04 '24 20:11 ptitSeb

That's what I did, but the interval I had at first was too small, and the one I used afterwards produced a lot more logs and might still have been too small. I'll do some more digging tomorrow (probably with ghidra, to find the relevant range for this function)

Coreforge avatar Nov 04 '24 20:11 Coreforge

I haven't gotten further yet, but I looked a bit through the differences DYNAREC_TEST printed out.

Warning, difference between x64 Interpreter and Dynarec in 0x1700185f0 (8b 13 89 d0 89 55 fc c1)
Warning, difference between x64 Interpreter and Dynarec in 0x170018668 (8b 13 89 d0 89 55 fc c1)
Warning, difference between x64 Interpreter and Dynarec in 0x17002185e (8b 43 08 85 c0 7f 3b 83)
Warning, difference between x64 Interpreter and Dynarec in 0x17002865d (0f 94 c3 84 db 74 cc 48)
Warning, difference between x64 Interpreter and Dynarec in 0x170030465 (41 8b 36 89 f3 c1 fb 10)
Warning, difference between x64 Interpreter and Dynarec in 0x1700366ad (87 03 31 c9 4c 89 e2 e8)
Warning, difference between x64 Interpreter and Dynarec in 0x170036796 (87 03 45 31 e4 e9 6f ff)
Warning, difference between x64 Interpreter and Dynarec in 0x170036919 (87 07 eb d3 0f 1f 00 48)
Warning, difference between x64 Interpreter and Dynarec in 0x170036a11 (87 02 48 83 c4 38 5b 5d)
Warning, difference between x64 Interpreter and Dynarec in 0x18001d1b7 (75 e7 48 8d 0d e0 1a 83)
Warning, difference between x64 Interpreter and Dynarec in 0x180356f46 (8b 51 08 4c 8b c9 bb 01)
Warning, difference between x64 Interpreter and Dynarec in 0x1803614cc (0f 94 c1 48 89 44 24 20)
Warning, difference between x64 Interpreter and Dynarec in 0x1804d0b97 (75 e7 ff cf 85 ff 0f 8f)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b4fb2 (75 ec 49 8b c0 48 83 c4)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b5039 (75 e6 48 83 c4 30 5b c3)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b5252 (75 ed 48 89 5f 40 48 8b)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b5302 (75 cc 4d 89 50 08 49 8b)
Warning, difference between x64 Interpreter and Dynarec in 0x1806b7764 (8b b7 c0 00 00 00 8b 87)
Warning, difference between x64 Interpreter and Dynarec in 0x180d66bc7 (75 e7 44 89 87 c0 00 00)
Warning, difference between x64 Interpreter and Dynarec in 0x2207e454 (0f 94 c0 3c 01 75 d6 e9)
Warning, difference between x64 Interpreter and Dynarec in 0x22080b1d (0f 94 c0 48 89 54 24 08)
Warning, difference between x64 Interpreter and Dynarec in 0x2209f6f4 (c4 e2 6d 8c 01 c4 e2 7d)
Warning, difference between x64 Interpreter and Dynarec in 0x30110080 (cc 53 43 00 00 00 00 00)
Warning, difference between x64 Interpreter and Dynarec in 0x3f0002f90b (83 78 08 01 7e 15 4b 8b)
Warning, difference between x64 Interpreter and Dynarec in 0x3f0002fff0 (87 30 e9 d6 fd ff ff 66)
Warning, difference between x64 Interpreter and Dynarec in 0x3f0003023d (44 8b 32 80 bd 88 f4 ff)
Warning, difference between x64 Interpreter and Dynarec in 0x3f00031731 (8b 40 08 59 5e 83 f8 01)
Warning, difference between x64 Interpreter and Dynarec in 0x3f0003222e (8b 40 08 83 f8 01 74 5a)
Warning, difference between x64 Interpreter and Dynarec in 0x3f00066c32 (87 03 85 c0 0f 85 cc 00)

I checked some of them and couldn't find anything that shouldn't work when comparing the code the dynarec produced to the original instructions.

Coreforge avatar Nov 06 '24 17:11 Coreforge

Did you tried to run VRChat with `BOX64_DYNAREC_NATIVEFLAGS=0`` ?

ptitSeb avatar Nov 06 '24 18:11 ptitSeb

Just tried it again to be sure, didn't help.

Coreforge avatar Nov 06 '24 20:11 Coreforge

I just did a quick test with 067cf75c, and it works on that version, so this seems to be a regression. The bad commit seems to be a450b9d12f99df13adcae4a570aeb780964d43b2. Reverting that commit fixes it.

Coreforge avatar Nov 12 '24 19:11 Coreforge

@Coreforge is this still an issue with the latest Box64 and Wine/Proton?

LukeShortCloud avatar Nov 13 '25 04:11 LukeShortCloud

Runs fine with on the current version of Box64 and VRChat. I didn't have the most up to date Proton (was on Proton 9), but I don't think that should affect it much. I also only tried desktop mode, but I don't expect VR to be an issue if desktop works correctly.

Just for completeness' sake, options I used were

BOX64_DYNAREC_STRONGMEM=1
BOX64_DYNAREC_BIGBLOCK=3
BOX64_DYNAREC_SAFEFLAGS=0
BOX64_DYNAREC_CALLRET=1

I'm also only running it in offline testing mode, as EAC does not work (which isn't surprising though).

Coreforge avatar Nov 13 '25 16:11 Coreforge