node icon indicating copy to clipboard operation
node copied to clipboard

Maglev on x64 causes segmentation fault while running TypeScript

Open remcohaszing opened this issue 1 year ago • 31 comments

Version

v22.0.0

Platform

Linux vali 6.8.0-76060800daily20240311-generic #202403110203~1714077665~22.04~4c8e9a0 SMP PREEMPT_DYNAMIC Thu A x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

On Linux using Node.js 22:

git clone [email protected]:remcohaszing/typescript-bug-58369.git
cd typescript-bug-58369
npm ci
tsc

See also this failed GitHub action: https://github.com/remcohaszing/typescript-bug-58369/actions/runs/8899456400/job/24438867767

How often does it reproduce? Is there a required condition?

For this reproduction it’s reproduced consistently on Linux on both my machine and GitHub actions.

While troubleshooting by trimming down the content of node_modules/@types/mdast/index.d.ts, I got into a state where it seemed to happen randomly. The major factor is the 👉 emoji in a comment.

The error did not occur on macOS in the GitHub action, but it did happen consistently for @wooorm on their macbook.

The problem was not reproducible on Windows.

What is the expected behavior? Why is that the expected behavior?

No segmentation fault

What do you see instead?

Segmentation fault (core dumped)

Additional information

This was originally reported to TypeScript: https://github.com/microsoft/TypeScript/issues/58369. This issue contains more information.

This has coincidentally already been fixed for the upcoming TypeScript 5.5. Still, a segfault should not occur.

We were unable to make a smaller reproduction.

remcohaszing avatar May 02 '24 12:05 remcohaszing

Hi! Could you possibly provide some example code to reproduce this? Preferably code that has been compiled into plain JS.

avivkeller avatar May 02 '24 13:05 avivkeller

I would absolutely love to. Unfortunately the bug is reproduced by running tsc. Neither I nor the TypeScript team have been able to make a small reproduction.

It was caused by https://github.com/microsoft/TypeScript/pull/53081 and fixed by https://github.com/microsoft/TypeScript/pull/58339 (unreleased yet).

remcohaszing avatar May 02 '24 13:05 remcohaszing

AFAICT this doesn't seem like an issue with Node.js itself, but rather a compiler (such as tsc), but I'm no expert, and I'd love to get a second opinion.

avivkeller avatar May 02 '24 13:05 avivkeller

Sorry, I now see I forgot to provide a critical piece of information. This is a regression in Node.js 22.0.0. It wasn’t a problem before.

remcohaszing avatar May 02 '24 13:05 remcohaszing

Ahh, okay, thank you.

avivkeller avatar May 02 '24 13:05 avivkeller

@targos I have a feel we rushed the V8 upgrades.

cc @nodejs/v8 @RafaelGSS

mcollina avatar May 02 '24 13:05 mcollina

It's probably related to V8, but I'm not sure waiting would have changed anything? We released v22.0.0 with the version of V8 that's in current Chrome.

targos avatar May 02 '24 15:05 targos

Seems specific to Linux or x64 as I cannot reproduce on ARM64 macOS.

targos avatar May 02 '24 15:05 targos

We also don't know which version of V8 introduced the bug (assuming it's in V8).

targos avatar May 02 '24 15:05 targos

So, it's specific to x64. I can reproduce with node-v22.0.0-darwin-x64 on Rosetta.

targos avatar May 02 '24 15:05 targos

I'm going to compile a debug build on one of the Hetzner machines to get a meaningful stack trace.

targos avatar May 02 '24 15:05 targos

I’m on macOS and repro it consistently btw

wooorm avatar May 02 '24 15:05 wooorm

@woorm ARM or Intel?

targos avatar May 02 '24 15:05 targos

It's probably related to V8,

The code that started/stopped crashing in TS had do to with indexing into strings. The string that TS was looking into starts/stops crashing when there is/isn’t an emoji.

One TS maintainer potentially saw the crash appear/disappear when adding a console.log somewhere. So it may be related to some optimization routine

wooorm avatar May 02 '24 15:05 wooorm

I’m on an 2.6 GHz 6-Core Intel Core i7, Sonoma 14.4.1 (23E224). I have a bunch of the GNU utils tho. So perhaps there could be something that doesn’t happen on mac normally, but my machine looks more like Linux.

wooorm avatar May 02 '24 15:05 wooorm

Ignore the repro-exists tag, I didn't mean to add it, and it won't effect anything.

avivkeller avatar May 02 '24 15:05 avivkeller

Just to give a side by side using https://github.com/remcohaszing/typescript-bug-58369:

$ grep -A8 'function scanJSDocCommentTextToken' ./node_modules/typescript/lib/tsc.js
  function scanJSDocCommentTextToken(inBackticks) {
    fullStartPos = tokenStart = pos;
    tokenFlags = 0 /* None */;
    if (pos >= end) {
      return token = 1 /* EndOfFileToken */;
    }
    for (let ch = text.charCodeAt(pos); pos < end && (!isLineBreak(ch) && ch !== 96 /* backtick */); ch = codePointAt(text, ++pos)) {
      if (!inBackticks) {
        if (ch === 123 /* openBrace */) {
$ node ./node_modules/typescript/lib/tsc.js
[1]    1090384 segmentation fault  node ./node_modules/typescript/lib/tsc.js

Now, add debugger to the loop (console.log works too but is loud):

$ grep -A8 'function scanJSDocCommentTextToken' ./node_modules/typescript/lib/tsc.js
  function scanJSDocCommentTextToken(inBackticks) {
    fullStartPos = tokenStart = pos;
    tokenFlags = 0 /* None */;
    if (pos >= end) {
      return token = 1 /* EndOfFileToken */;
    }
    for (let ch = text.charCodeAt(pos); pos < end && (!isLineBreak(ch) && ch !== 96 /* backtick */); ch = codePointAt(text, ++pos)) {
      debugger; // ADDED
      if (!inBackticks) {
$ node ./node_modules/typescript/lib/tsc.js

I have not been able to extract out a test which just calls the parser via the public API, nor by extracting this code and giving it the same inputs.

jakebailey avatar May 02 '24 16:05 jakebailey

Weird, thanks for the information!

avivkeller avatar May 02 '24 16:05 avivkeller

(gdb) run node_modules/typescript/lib/tsc.js
Starting program: /home/iojs/tmp-targos/node/out/Debug/node node_modules/typescript/lib/tsc.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7a4f640 (LWP 18826)]
[New Thread 0x7ffff724e640 (LWP 18827)]
[New Thread 0x7ffff6a4d640 (LWP 18828)]
[New Thread 0x7ffff624c640 (LWP 18829)]
[New Thread 0x7ffff5a4b640 (LWP 18830)]
[New Thread 0x7ffff51a9640 (LWP 18831)]

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x00005554d878345d in ?? ()
(gdb) bt
#0  0x00005554d878345d in ?? ()
#1  0x00001967bd580c69 in ?? ()
#2  0x000000000000200e in ?? ()
#3  0x0000200e00000000 in ?? ()
#4  0x0000000000000000 in ?? ()

Perfect 🙃

targos avatar May 02 '24 16:05 targos

Just to note it, you can also add noop(); instead of debugger; if you want a debugger-statement free crasher.

jakebailey avatar May 02 '24 16:05 jakebailey

This is due to the Maglev compiler. I confirm that ./configure --v8-disable-maglev fixes it.

Maglev was enabled in https://github.com/nodejs/node/pull/51360

/cc @kvakil

targos avatar May 02 '24 18:05 targos

/cc @victorgomes

targos avatar May 02 '24 18:05 targos

I'm rebuilding with ASan and GCC stack protection flags to see if it helps to pinpoint the issue

targos avatar May 03 '24 08:05 targos

GCC failed to build so I switched to clang. Here's a bit more information:

(lldb) run node_modules/typescript/lib/tsc.js
Process 107011 launched: '/home/iojs/tmp-targos/node/out/Debug/node' (x86_64)
Process 107011 stopped
* thread #1, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0xd848)
    frame #0: 0x00005554dd0c345d
->  0x5554dd0c345d: movl   0xb(%rbx), %r14d
    0x5554dd0c3461: incl   %r12d
    0x5554dd0c3464: cmpl   %r14d, %r12d
    0x5554dd0c3467: jge    0x5554dd0c3497
(lldb) bt
* thread #1, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0xd848)
  * frame #0: 0x00005554dd0c345d
    frame #1: 0x00005554dd07990f
    frame #2: 0x00005554dd078120
    frame #3: 0x00005554dd08f431
    frame #4: 0x00005554dd08ede4
    frame #5: 0x00005554dd08ff1a
    frame #6: 0x00005554dd090015
    frame #7: 0x00005554dd090236
    frame #8: 0x00005554dd0905bf
    frame #9: 0x00005554dd08c36a
    frame #10: 0x00005554dd08538e
    frame #11: 0x00005554fcdd0d5e
    frame #12: 0x00005554dd08fa47
    frame #13: 0x00005554dd0860d4
    frame #14: 0x00005554dd087c7b
    frame #15: 0x00005554dd0670dd
    frame #16: 0x00005554dd08569f
    frame #17: 0x00005554fcdd0d5e
    frame #18: 0x00005554fcdd0d5e
    frame #19: 0x00005554fcdd0d5e
    frame #20: 0x00005554fcdd0d5e
    frame #21: 0x00005554fcdd0d5e
    frame #22: 0x00005554fcdd0d5e
    frame #23: 0x00005554fcdd0d5e
    frame #24: 0x00005554fcdd0d5e
    frame #25: 0x00005554fcdd0d5e
    frame #26: 0x00005554fcdd0d5e
    frame #27: 0x00005554fcdd0d5e
    frame #28: 0x00005554fcdd0d5e
    frame #29: 0x00005554fcdd0d5e
    frame #30: 0x00005554fcdd0d5e
    frame #31: 0x00005554dd04b3d3
    frame #32: 0x00005554fcdd0d5e
    frame #33: 0x00005554fcdd0d5e
    frame #34: 0x00005554fcdd0d5e
    frame #35: 0x00005554fcdd0d5e
    frame #36: 0x00005554fcdd0d5e
    frame #37: 0x00005554fcdd0d5e
    frame #38: 0x00005554fcdd0d5e
    frame #39: 0x00005554fcdd0d5e
    frame #40: 0x00005554fcdd0d5e
    frame #41: 0x00005554fcdd0d5e
    frame #42: 0x00005554fcdd0d5e
    frame #43: 0x00005554fcdce05c
    frame #44: 0x00005554fcdcdd83
    frame #45: 0x0000555559b6a5dc node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [inlined] v8::internal::GeneratedCode<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long**>::Call(this=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>) at simulator.h:178:12
    frame #46: 0x0000555559b6a5d9 node`v8::internal::(anonymous namespace)::Invoke(isolate=<unavailable>, params=<unavailable>)::InvokeParams const&) at execution.cc:418:22
    frame #47: 0x0000555559b68766 node`v8::internal::Execution::Call(isolate=<unavailable>, callable=<unavailable>, receiver=<unavailable>, argc=<unavailable>, argv=<unavailable>) at execution.cc:504:10
    frame #48: 0x00005555593dc7d7 node`v8::Function::Call(this=<unavailable>, context=<unavailable>, recv=<unavailable>, argc=<unavailable>, argv=<unavailable>) at api.cc:5485:7
    frame #49: 0x000055555857e144 node`node::builtins::BuiltinLoader::CompileAndCall(this=0x000061f0000027a0, context=Local<v8::Context> @ 0x00007fffffffc360, id="internal/main/run_main_module", argc=4, argv=0x0000603000060d30, optional_realm=0x0000618000000080) at node_builtins.cc:490:14
    frame #50: 0x000055555857db08 node`node::builtins::BuiltinLoader::CompileAndCall(this=0x000061f0000027a0, context=Local<v8::Context> @ 0x00007fffffffc620, id="internal/main/run_main_module", realm=0x0000618000000080) at node_builtins.cc:474:10
    frame #51: 0x0000555558a1d04a node`node::Realm::ExecuteBootstrapper(this=0x0000618000000080, id="internal/main/run_main_module") at node_realm.cc:161:32
    frame #52: 0x00005555584af033 node`node::StartExecution(env=0x000061f000001c80, main_script_id="internal/main/run_main_module") at node.cc:237:35
    frame #53: 0x00005555584aeb92 node`node::StartExecution(env=0x000061f000001c80, cb=node::StartExecutionCallback @ 0x00007fffffffd900)>) at node.cc:365:12
    frame #54: 0x0000555558083dd6 node`node::LoadEnvironment(env=0x000061f000001c80, cb=node::StartExecutionCallback @ 0x00007fffffffdb80, preload=node::EmbedderPreloadCallback @ 0x00007fffffffdbc0)>, std::function<void (node::Environment*, v8::Local<v8::Value>, v8::Local<v8::Value>)>) at environment.cc:551:10
    frame #55: 0x000055555886bd56 node`node::NodeMainInstance::Run(this=0x00007fffffffe170, exit_code=0x00007fffffffde80, env=0x000061f000001c80) at node_main_instance.cc:120:7
    frame #56: 0x000055555886adc0 node`node::NodeMainInstance::Run(this=0x00007fffffffe170) at node_main_instance.cc:100:3
    frame #57: 0x00005555584b60a9 node`node::StartInternal(argc=2, argv=0x000060b000002da0) at node.cc:1445:24
    frame #58: 0x00005555584b5766 node`node::Start(argc=2, argv=0x00007fffffffe608) at node.cc:1452:27
    frame #59: 0x000055555d969352 node`main(argc=2, argv=0x00007fffffffe608) at node_main.cc:97:10
    frame #60: 0x00007ffff7a75d90 libc.so.6`__libc_start_call_main(main=(node`main at node_main.cc:96), argc=2, argv=0x00007fffffffe608) at libc_start_call_main.h:58:16
    frame #61: 0x00007ffff7a75e40 libc.so.6`__libc_start_main_impl(main=(node`main at node_main.cc:96), argc=2, argv=0x00007fffffffe608, init=0x00007ffff7ffd040, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe5f8) at libc-start.c:392:3
    frame #62: 0x0000555557ed2a35 node`_start + 37
(lldb) register read
General Purpose Registers:
       rax = 0x00007ea6937000d9
       rbx = 0x000000000000d83d
       rcx = 0x000000000000000a
       rdx = 0x000000000000000d
       rdi = 0x00007ee68b891811
       rsi = 0x0000000000002028
       rbp = 0x00007fffffff94f0
       rsp = 0x00007fffffff94a8
        r8 = 0x0000000000000060
        r9 = 0x000000000000007b
       r10 = 0x0000000000000000
       r11 = 0x0000000000000040
       r12 = 0x000000000000200f
       r13 = 0x0000631000015080
       r14 = 0x000000000000d800
       r15 = 0x0000200e00000000
       rip = 0x00005554dd0c345d
    rflags = 0x0000000000010246
        cs = 0x0000000000000033
        fs = 0x0000000000000000
        gs = 0x0000000000000000
        ss = 0x000000000000002b
        ds = 0x0000000000000000
        es = 0x0000000000000000

I don't know what else to do at this point. Happy to run more commands if you have any idea.

targos avatar May 03 '24 09:05 targos

--no-maglev-inlining also fixes it.

targos avatar May 03 '24 09:05 targos

I also tried the repro with:

  • https://nodejs.org/download/v8-canary/v23.0.0-v8-canary202405011e593b294c/ (V8 12.6.144)
  • https://nodejs.org/download/nightly/v22.0.0-nightly2024042244f81a1b7d/ (V8 12.3.219.16)
  • https://nodejs.org/download/nightly/v22.0.0-nightly2024041907f481cfcf/ (V8 12.2.281.27)
  • https://nodejs.org/download/nightly/v22.0.0-nightly20240331021cf91208/ (V8 11.9.169.7)

They all segfault so I don't think it's a V8 regression, but really a Maglev inlining bug.

targos avatar May 03 '24 09:05 targos

I submitted a V8 bug report: https://issues.chromium.org/issues/338535750

targos avatar May 03 '24 10:05 targos

It looks very similar to https://issues.chromium.org/issues/42204637

targos avatar May 03 '24 11:05 targos

Well, there is a v8 option called --print-opt-source which can print source code of optimized and inlined functions. The problem occurs when v8 optimizes the isLineBreak js function in tsc.js. I changed the function to this and the problem is gone.

// tsc.js
function isLineBreak(ch) {
  return ch === 10 /* lineFeed */ || ch === 13 /* carriageReturn */ ;
}
gdb --args ~/tannalwork/projects/node/node_g --print-opt-source ./node_modules/typescript/lib/tsc.js

--- FUNCTION SOURCE (/home/tannal/tannalwork/projects/node/out/typescript-bug-58369/node_modules/.pnpm/[email protected]/node_modules/typescript/lib/tsc.js:isLineBreak) id{16,-1} start{744101} ---
(ch) {
  return ch === 10 /* lineFeed */ || ch === 13 /* carriageReturn */ || ch === 8232 /* lineSeparator */ || ch === 8233 /* paragraphSeparator */;
}
--- END ---

Thread 1 "node_g" received signal SIGSEGV, Segmentation fault.
0x00005554dbb1115d in ?? ()

tannal avatar May 05 '24 04:05 tannal

@targos Could you try to run the test with maglev and with pointer compression enabled?

victorgomes avatar May 06 '24 14:05 victorgomes