SIGSEGV when compiled with -O2 or -O3 on gcc 13.1.1
Version
20.1.0
Platform
Linux nodejs 6.2.14-300.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 1 00:55:28 UTC 2023 x86_64 GNU/Linux
Subsystem
v8
What steps will reproduce the bug?
- Compile Node.js with GCC 13.1.1 and -O2 or higher optimization*. I compiled Node with --shared for Fedora, hence why the backtrace below mentions libnode.so.115.
- Launch node.
- Try to do anything; within a few seconds, one of the V8 threads will crash.
From GDB:
Thread 3 "node" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff2ffe6c0 (LWP 1565850)]
0x00007ffff57db195 in v8::internal::compiler::SpecialRPONumberer::ComputeAndInsertSpecialRPO(v8::internal::compiler::BasicBlock*, v8::internal::compiler::BasicBlock*) () from /lib64/libnode.so.115
[*] I did some extensive narrowing down and was able to determine that -O1 -fipa-icf -fstrict-aliasing is what triggers it. Building with -O2 -fno-ipa-icf -fno-strict-aliasing avoids the segfault.
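For anyone repeating this kind of narrowing, a minimal sketch of how the candidate flags could be listed: GCC prints its optimizer table with `gcc -Q --help=optimizers -O<n>`, and diffing two levels gives the flags to try one at a time on top of the lower level. The helper names (`parse_optimizers`, `newly_enabled`, `query_level`) are made up for this illustration, and the parser assumes each relevant line looks like `-fflag [enabled]` or `-fflag [disabled]`; this is not necessarily how the narrowing above was done.

```python
import subprocess


def parse_optimizers(text):
    """Map each -f flag to True/False from `gcc -Q --help=optimizers` output."""
    flags = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines shaped like "-fsome-flag [enabled]" or "[disabled]";
        # headers and parameterised options are skipped.
        if (len(parts) == 2 and parts[0].startswith("-f")
                and parts[1] in ("[enabled]", "[disabled]")):
            flags[parts[0]] = parts[1] == "[enabled]"
    return flags


def newly_enabled(lower, higher):
    """Flags that are on in `higher` but not on in `lower`, sorted by name."""
    return sorted(f for f, on in higher.items() if on and not lower.get(f, False))


def query_level(level, gcc="gcc"):
    """Run gcc (assumed to be on PATH) for a level such as "-O1" or "-O2"."""
    out = subprocess.run([gcc, "-Q", "--help=optimizers", level],
                         capture_output=True, text=True, check=True).stdout
    return parse_optimizers(out)
```

Calling `newly_enabled(query_level("-O1"), query_level("-O2"))` would then give the list of flags to add to an -O1 build one by one until the crash appears.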
How often does it reproduce? Is there a required condition?
Reproduces every time.
- Compile Node.js with GCC 13.1.1 and -O2 or higher optimization*. I compiled Node with --shared for Fedora, hence why the backtrace below mentions libnode.so.115.
[*] See above.
What is the expected behavior? Why is that the expected behavior?
It shouldn't crash.
What do you see instead?
$ node-20
Welcome to Node.js v20.1.0.
Type ".help" for more information.
> req [1] 571761 segmentation fault (core dumped) node-20
Additional information
I'm not certain if the issue is due to a code-generation bug in GCC or a problem with Node.js.
Correction, I have narrowed it down to -fno-ipa-icf being sufficient at -O2 and -O3 to avoid the segfault. -fno-strict-aliasing was a mistake (flawed test).
Thanks for the report, Stephen.
It looks like a V8 issue and V8 isn't -fstrict-aliasing clean. I can't really think of a reason why -fipa-icf (identical code folding) would trigger a crash though. Could this be a gcc bug?
If you want, you can send a pull request that adds -fno-ipa-icf to cflags in configure.py. The logic would look something like this:
if gcc_version_ge((13,1,1)):
  o['cflags'] += ['-fno-ipa-icf']
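To make the version gate concrete outside of configure.py, here is a standalone sketch. It does not use configure.py's own helpers; `parse_gcc_version` and `extra_cflags` are hypothetical names, and the parser assumes the first line of `gcc --version` carries a dotted three-part version outside the parenthesised vendor string.

```python
import re


def parse_gcc_version(version_output):
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    first_line = version_output.splitlines()[0]
    # Drop the vendor string, e.g. "(Debian 13.1.0-3)", so its digits are ignored.
    without_vendor = re.sub(r"\([^)]*\)", "", first_line)
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", without_vendor)
    if match is None:
        raise ValueError("no version found in: %r" % first_line)
    return tuple(int(part) for part in match.groups())


def extra_cflags(version):
    """Return the workaround flag for GCC 13.1.1 and later, else nothing."""
    # Tuples compare element by element, so (13, 1, 0) < (13, 1, 1).
    return ['-fno-ipa-icf'] if version >= (13, 1, 1) else []
```

With this gate, a compiler reporting 13.1.0 would be left alone, while 13.1.1 and anything newer would get -fno-ipa-icf.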
I can't reproduce on Debian with gcc-13.1.0:
$ gcc-13 --version
gcc-13 (Debian 13.1.0-3) 13.1.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ CC=gcc-13 CXX=g++-13 ./configure --ninja --shared
$ make node
$ ldd out/Release/node
linux-vdso.so.1 (0x00007ffcca13b000)
libnode.so.115 => /node/out/Release/lib/libnode.so.115 (0x00007fd1d7a00000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd1d781f000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd1d75c9000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd1d74ea000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd1dd850000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd1dd882000)
$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Debian GLIBC 2.36-9) stable release version 2.36.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 12.2.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
$ CC=gcc-13 CXX=g++-13 make jstest
# mostly passes
Also can't reproduce with Arch and gcc-13.1.1:
$ gcc --version
gcc (GCC) 13.1.1 20230429
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./configure --ninja --shared
$ make node
$ ldd out/Release/node
linux-vdso.so.1 (0x00007ffc9a5a6000)
libnode.so.115 => /node/out/Release/lib/libnode.so.115 (0x00007fef7ce00000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fef7cb85000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fef82d84000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fef7cb60000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fef7c976000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fef82e81000)
$ /usr/lib/libc.so.6
GNU C Library (GNU libc) stable release version 2.37.
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 13.1.1 20230429.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 4.4.0
For bug reporting instructions, please see:
<https://bugs.archlinux.org/>.
$ uname -a
Linux 2a18fc7dfbd4 5.19.0-1024-aws #25~22.04.1-Ubuntu SMP Tue Apr 18 23:41:58 UTC 2023 x86_64 GNU/Linux
$ make jstest
# mostly passes
Whatever is failing here appears to be happening after several seconds in a non-main thread. My suspicion is that the tests aren't active long enough for whatever that other thread is to hit the bug. I can reproduce this after several seconds of running in the node shell environment (running node without arguments).
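If the crash needs a process that stays busy for several seconds, a small driver along these lines could keep node alive and report whether it died from a signal. This is an untested sketch: the JavaScript workload, the `node` binary name, and the function names are assumptions for illustration, and nothing here confirms that this particular workload triggers the segfault.

```python
import signal
import subprocess

# Hypothetical workload: call a small function repeatedly for about ten seconds
# so that background compiler threads have time to run.
HOT_LOOP_JS = """
function add(a, b) { return a + b; }
let total = 0;
const deadline = Date.now() + 10000;
while (Date.now() < deadline) {
  for (let i = 0; i < 1e6; i++) total = add(total, i) % 1000003;
}
"""


def describe_exit(returncode):
    """Turn a subprocess return code into a short description."""
    # On POSIX, subprocess reports death by signal N as return code -N.
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_hot_loop(node_binary="node", timeout_s=60):
    """Run the workload under the given node binary and describe how it ended."""
    result = subprocess.run([node_binary, "-e", HOT_LOOP_JS], timeout=timeout_s)
    return describe_exit(result.returncode)
```

Running `run_hot_loop("./node")` against an affected build and an unaffected one would show whether the two end differently.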
I tried to run ./node compiled on Fedora 38 with default configure options and it didn't fail.
I just rebuilt the Fedora RPM of 20.2.0 with -fipa-icf and reproduced the error again:
(gdb) bt full
#0 0x00007f32dfc9f545 in v8::internal::compiler::BasicBlock::SuccessorAt () at ../../deps/v8/src/compiler/schedule.h:85
No locals.
#1 v8::internal::compiler::SpecialRPONumberer::ComputeLoopInfo () at ../../deps/v8/src/compiler/scheduler.cc:990
No locals.
#2 v8::internal::compiler::SpecialRPONumberer::ComputeAndInsertSpecialRPO (this=0x7f32d001a218, entry=0x7f32d0035618, end=0x7f3200000000) at ../../deps/v8/src/compiler/scheduler.cc:832
loop = 0x7f32d001aa60
order = <optimized out>
stack_depth = <optimized out>
num_loops = <optimized out>
current_loop = 0x0
current_header = 0x1
loop_depth = -805196912
#3 0x00007f32dfcb4ef3 in v8::internal::compiler::Scheduler::ComputeSchedule (zone=0x55d624bef270, graph=0x55d624c1a688, flags=..., tick_counter=0x7f32d001ab90, profile_data=0x7f32d0037280) at ../../deps/v8/src/compiler/scheduler.cc:65
schedule_zone = <optimized out>
node_count_hint = <optimized out>
schedule = <optimized out>
scheduler = {zone_ = 0x55d624bef270, graph_ = 0x55d624c1a688, schedule_ = 0x7f32d00355a0, flags_ = {mask_ = 2}, scheduled_nodes_ = {zone_ = 0x55d624bef270, data_ = 0x7f32d0019e40, end_ = 0x7f32d001a1c0, capacity_ = 0x7f32d001a218}, schedule_root_nodes_ = {zone_ = 0x55d624bef270, data_ = 0x0, end_ = 0x0, capacity_ = 0x0}, schedule_queue_ = {<No data fields>}, node_data_ = {zone_ = 0x55d624bef270,
data_ = 0x7f32d0014ab8, end_ = 0x7f32d0016e68, capacity_ = 0x7f32d00171f8}, control_flow_builder_ = 0x7f32d0018408, special_rpo_ = 0x7f32d001a218, equivalence_ = 0x7f32d00171f8, tick_counter_ = 0x55d624bec048, profile_data_ = 0x0, common_dominator_cache_ = {<No data fields>}}
#4 0x00007f32dfc8d712 in v8::internal::compiler::ComputeSchedulePhase::Run () at ../../deps/v8/src/compiler/pipeline.cc:2342
No locals.
#5 v8::internal::compiler::PipelineImpl::Run<v8::internal::compiler::ComputeSchedulePhase> () at ../../deps/v8/src/compiler/pipeline.cc:1367
No locals.
#6 v8::internal::compiler::PipelineImpl::ComputeScheduledGraph (this=0x55d624bec070) at ../../deps/v8/src/compiler/pipeline.cc:3783
data = 0x55d624bec070
#7 0x00007f32dfc9405c in v8::internal::compiler::PipelineImpl::OptimizeGraph (this=0x55d624bec2f0, linkage=0x55d624c19568) at ../../deps/v8/src/compiler/pipeline.cc:3026
data = <optimized out>
#8 0x00007f32dfc9550f in v8::internal::compiler::PipelineCompilationJob::ExecuteJobImpl (this=0x55d624bebee0, stats=0x55d624bebfa8, local_isolate=0x55d624bec2f0) at ../../deps/v8/src/compiler/pipeline.cc:1299
scope = <optimized out>
local_isolate_scope = <optimized out>
#9 0x00007f32e0b9ffef in v8::internal::OptimizedCompilationJob::ExecuteJob () at ../../deps/v8/src/codegen/compiler.cc:496
No locals.
#10 0x00007f32e0bd3aa5 in v8::internal::OptimizingCompileDispatcher::CompileNext () at ../../deps/v8/src/compiler-dispatcher/optimizing-compile-dispatcher.cc:105
No locals.
#11 0x00007f32e0bd3e44 in v8::internal::OptimizingCompileDispatcher::CompileTask::RunInternal () at ../../deps/v8/src/compiler-dispatcher/optimizing-compile-dispatcher.cc:67
No locals.
#12 0x00007f32df5e7b4e in PlatformWorkerThread (data=0x55d624a6f200) at ../../src/node_platform.cc:43
task = <optimized out>
pending_worker_tasks = 0x55d624a6ea00
#13 0x00007f32de952907 in start_thread (arg=<optimized out>) at pthread_create.c:444
ret = <optimized out>
pd = <optimized out>
out = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139856459408912, 1118916026530479130, 139856408622784, -312, 2, 140733863298256, 1118916026543062042, 1118926824340536346}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#14 0x00007f32de9d8870 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
I'm attaching the full build logs as well. build-20.2.0-2.fc38.log.gz
Whatever is failing here appears to be happening after several seconds in a non-main thread. My suspicion is that the tests aren't active long enough for whatever that other thread is to hit the bug. I can reproduce this after several seconds of running in the node shell environment (running node without arguments).
yeah, I tried that & still couldn't reproduce.
I tried fedora on docker and still couldn't reproduce:
$ cd /node
$ dnf install -y gcc-c++ make python2 git
$ git rev-parse HEAD
ad2c05b671aff71259afbf23de32b6f177c3ba61
$ cat /etc/fedora-release
Fedora release 38 (Thirty Eight)
$ gcc --version
gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ /lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.37.
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 13.0.1 20230401 (Red Hat 13.0.1-0).
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<https://www.gnu.org/software/libc/bugs.html>.
$ ./configure && make -j`nproc`
$ ./node
# works fine, waited a couple of minutes, tried executing things, etc.
@sgallagher The full configure arguments including any CFLAGS etc exported in the environment are likely needed.
@sgallagher The full configure arguments including any CFLAGS etc exported in the environment are likely needed.
Which are all included in the logs I attached.
Sorry, I missed that!
I can definitely reproduce the issue.
Relevant flags
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -pipe -fno-stack-protector -O3 -fuse-ld=mold"
CHOST="x86_64-pc-linux-gnu"
CXXFLAGS="-march=native -pipe -fno-stack-protector -O3 -fuse-ld=mold"
LDFLAGS="-fuse-ld=mold -Wl,-O1 -Wl,--as-needed"
Configure flags
GYP_DEFINES="linux_use_gold_flags=0 linux_use_bundled_binutils=0 linux_use_bundled_gold=0" ./configure.py --dest-cpu=x64 --ninja --shared-brotli --shared-cares --shared-libuv --shared-nghttp2 --shared-zlib --enable-lto --with-intl=system-icu --without-inspector --without-npm --without-node-snapshot --shared-openssl --openssl-use-def-ca-store