WebKit
WebKit copied to clipboard
Pacify Valgrind (by fixing a somewhat theoretical bug).
I'm deliberately filing this here rather than in the upstream repo, to hopefully get it some eyeballs and possibly an explanation of why I'm wrong.
I think Valgrind just found a bug in JSC. JSC contains/generates this code:
Dump of assembler code for function op_call_slow_return_location:
0x0000000008b2c2ad <+0>: mov 0x10(%rbp),%rdx
0x0000000008b2c2b1 <+4>: mov 0x14(%rdx),%edx
0x0000000008b2c2b4 <+7>: shl $0x3,%rdx
0x0000000008b2c2b8 <+11>: mov %rbp,%rsp
0x0000000008b2c2bb <+14>: sub %rdx,%rsp
0x0000000008b2c2be <+17>: mov 0x24(%rbp),%r8d
0x0000000008b2c2c2 <+21>: movsbq 0x1(%r13,%r8,1),%rsi
0x0000000008b2c2c8 <+27>: mov %rax,0x0(%rbp,%rsi,8)
0x0000000008b2c2cd <+32>: movzbl 0x5(%r13,%r8,1),%edx
0x0000000008b2c2d3 <+38>: imul $0xfffffffffffffff0,%rdx,%rdx
0x0000000008b2c2d7 <+42>: mov %rax,-0x10(%r12,%rdx,1)
0x0000000008b2c2dc <+47>: add $0x7,%r8
0x0000000008b2c2e0 <+51>: movzbl 0x0(%r13,%r8,1),%eax
0x0000000008b2c2e6 <+57>: lea 0x2396ab3(%rip),%rsi # 0xaec2da0 <g_opcodeMap>
0x0000000008b2c2ed <+64>: jmp *(%rsi,%rax,8)
The intention of the first five instructions is to change the stack pointer to a known value. However, it does so by first moving the base pointer to the stack pointer, then subtracting the new reservation from the stack pointer:
0x0000000008b2c2b8 <+11>: mov %rbp,%rsp
0x0000000008b2c2bb <+14>: sub %rdx,%rsp
If a signal happens between those two instructions, the signal handler will wrongly assume that the stack space below the base pointer is no longer used, and overwrite the stack. This is a highly theoretical possibility, since arithmetic operations are unlikely to be interrupted, but Valgrind caught it nonetheless and assumed, correctly, that the values in the intersection of the old and the new stack reservation could no longer be relied upon.
So how could this happen, given that for an assembly programmer, this is a beginner's mistake? The problem here is that JSC uses its own language which is meant to be something like machine-independent assembler code. The source code in this case is:
subp cfr, t0, sp
(t0 is a temporary register, cfr is the base register, sp is the stack pointer, and the "p" prefix marks this as a pointer instruction)
That looks okay, but pre-APX x86 doesn't have a ternary subtraction instruction! It's translated into the unsafe mov + sub sequence above by Ruby code, which assumes it's performing arithmetic on GP registers which can safely hold temporary intermediate values.
I've fixed this locally by rewriting the "subp" instruction into a safe mov-sub sequence, which in turn gets translated into:
0x0000000008b2c318 <+11>: neg %rdx
0x0000000008b2c31b <+14>: add %rbp,%rdx
0x0000000008b2c31e <+17>: mov %rdx,%rsp
Crisis averted, and after a few Bun fixes, Valgrind no longer complains about uninitialized variables.
Again, this is a highly unlikely race condition to hit, it only affects x86_64 for us, requires a signal to be delivered at precisely the wrong moment, and on Linux there's still the Red Zone which might save us. It might be possible to hit in qemu, though.
It's worth pointing out that hardware interrupts usually do not cause userspace signals to be delivered, and bun receives relatively few signals in any case, so the focus here should be on fixing JSC so we can get clean Valgrind runs.
Here's the WebKit issue, which I decided to file after all:
https://bugs.webkit.org/show_bug.cgi?id=270415