rr
Ryzen 3900X test failures
Testing rr on a Ryzen 3900X (after applying the Ryzen workaround), I get the following failures:
The following tests FAILED:
565 - setuid-no-syscallbuf (Failed)
1102 - checksum_sanity_noclone (Failed)
The following tests FAILED:
1153 - record_replay-no-syscallbuf (Failed)
2394 - record_replay-32 (Failed)
I can sometimes reproduce setuid-no-syscallbuf with ctest -R, but I cannot reproduce the other failures.
I'm not sure whether the 3900X should be added to the list of supported Ryzen CPUs. Are these known bits of flakiness?
cc @glandium
Can you check what the .err files in /tmp/rr-test-* say?
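A minimal Python sketch for that check. It assumes the per-test directories match /tmp/rr-test-* and that the logs sit directly inside them as *.err files (as with the record.err and replay.err paths quoted in this thread). The helper name nonempty_err_files is hypothetical; it lists each non-empty .err file with its first line so the failing phases stand out.

```python
import glob
import os
from typing import List, Tuple

# Assumed layout, based on paths quoted in this thread:
#   /tmp/rr-test-<name>-<random>/{record,replay}.err
DEFAULT_PATTERN = "/tmp/rr-test-*/*.err"


def nonempty_err_files(pattern: str = DEFAULT_PATTERN) -> List[Tuple[str, str]]:
    """Return (path, first line) for every non-empty file matching pattern."""
    results = []
    for path in sorted(glob.glob(pattern)):
        # An empty .err file has nothing to report, so skip it.
        if os.path.getsize(path) == 0:
            continue
        # errors="replace" keeps stray non-UTF-8 bytes from aborting the scan.
        with open(path, errors="replace") as f:
            first_line = f.readline().rstrip("\n")
        results.append((path, first_line))
    return results


if __name__ == "__main__":
    for path, first_line in nonempty_err_files():
        print(f"{path}: {first_line}")
```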
> I can reproduce setuid-no-syscallbuf with ctest -R sometimes, I cannot reproduce the other failures.
You mean the setuid-no-syscallbuf test fails often, but the other tests fail very rarely?
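One way to put numbers on "often" versus "rarely" is to rerun a single test many times and count non-zero exit statuses. This is a minimal Python sketch, assuming ctest is run from the rr build directory and exits non-zero when the selected test fails; count_failures is a hypothetical helper, not part of rr's test suite.

```python
import subprocess
from typing import Sequence


def count_failures(cmd: Sequence[str], runs: int) -> int:
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        # Output is discarded; only the exit status is counted.
        result = subprocess.run(
            list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures
```

For example, count_failures(["ctest", "-R", "^setuid-no-syscallbuf$"], 20) gives the number of failing runs out of 20. Anchoring the regex with ^ and $ keeps -R from also selecting other tests whose names contain the same text; if the pattern matches nothing, ctest may still exit 0, so it is worth confirming that the pattern selects the intended test.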
On a Ryzen 3700X, after following the setup instructions, I get 12 failures out of 2487 tests, which is still a big improvement over before.
82 - clone_vfork_pidfd (Failed)
83 - clone_vfork_pidfd-no-syscallbuf (Failed)
920 - nested_detach_wait (Failed)
921 - nested_detach_wait-no-syscallbuf (Failed)
1140 - nested_detach (Failed)
1141 - nested_detach-no-syscallbuf (Failed)
1326 - clone_vfork_pidfd-32 (Failed)
1327 - clone_vfork_pidfd-32-no-syscallbuf (Failed)
2162 - nested_detach_wait-32 (Failed)
2163 - nested_detach_wait-32-no-syscallbuf (Failed)
2382 - nested_detach-32 (Failed)
2383 - nested_detach-32-no-syscallbuf (Failed)
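The twelve failures collapse to three tests once the variant suffixes are stripped. Judging only from the names in this list, each test appears to run in four variants: plain, -no-syscallbuf, -32, and -32-no-syscallbuf (an inference from the names, not from rr's CMake configuration). A small Python sketch that groups ctest's "(Failed)" summary lines by base name; base_name and group_failures are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Iterable

# Matches ctest summary lines such as "82 - clone_vfork_pidfd (Failed)".
FAILED_RE = re.compile(r"^\s*\d+\s+-\s+(\S+)\s+\(Failed\)\s*$")

# Variant suffixes inferred from the test names in this thread.
# "-no-syscallbuf" is stripped first so "x-32-no-syscallbuf" reduces to "x".
VARIANT_SUFFIXES = ("-no-syscallbuf", "-32")


def base_name(test: str) -> str:
    """Strip variant suffixes from a test name."""
    for suffix in VARIANT_SUFFIXES:
        if test.endswith(suffix):
            test = test[: -len(suffix)]
    return test


def group_failures(lines: Iterable[str]) -> Counter:
    """Count failed variants per base test name."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED_RE.match(line)
        if match:
            counts[base_name(match.group(1))] += 1
    return counts
```

Applied to the list above, it yields a count of 4 for each of clone_vfork_pidfd, nested_detach_wait and nested_detach, which suggests these failures do not depend on the 32-bit build or on the syscall buffer.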
These are the .err files of all the tests. I can provide the rest of the files, but a tarball of everything would be too big to upload at once: rr-tests.tar.gz
@v-lopez: Please file a separate issue for those. The clone_vfork_pidfd test has a problem similar to the one fixed in 17aa8239c0a9ffd0e66623fc3627f664b384bf1e, and nested_detach hits a different kind of assertion.
@Manishearth, did setuid fail with something like the following?
[FATAL .../rr/src/Registers.cc:405:compare_register_files()]
(task 911147 (rec:857972) at time 365)
-> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:128273/128273)
On my end, with a 3990X, the setuid-no-syscallbuf test fails intermittently (it is the main one I have seen fail across repeated runs, though it does not fail every time), and replay.err says this:
% cat /tmp/rr-test-setuid-TgOCW9j3p/replay.err
[ERROR /home/pnkfelix/Dev/Mozilla/rr.git/src/Registers.cc:295:maybe_print_reg_mismatch()] r10 0x55a993c2e95a != 0x55a993c2e958 (replaying vs. recorded)
process 317197 sent SIGURG
====== /proc/317197/status
Name: rr
Umask: 0002
State: S (sleeping)
Tgid: 317197
Ngid: 0
Pid: 317197
PPid: 317196
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 64
Groups: 4 27 1000
NStgid: 317197
NSpid: 317197
NSpgid: 317197
NSsid: 4178
VmPeak: 16508 kB
VmSize: 15468 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 10088 kB
VmRSS: 9680 kB
RssAnon: 1116 kB
RssFile: 8564 kB
RssShmem: 0 kB
VmData: 1156 kB
VmStk: 136 kB
VmExe: 5728 kB
VmLib: 1320 kB
VmPTE: 68 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 1/1030150
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180002000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: 00000100,00000000,00000000,00000000
Cpus_allowed_list: 104
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 2
nonvoluntary_ctxt_switches: 335
====== /proc/317197/stack
====== /proc/317198/status
Name: rr:setuid-TgOCW
Umask: 0002
State: t (tracing stop)
Tgid: 317198
Ngid: 0
Pid: 317198
PPid: 317197
TracerPid: 317197
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 1024
Groups: 4 27 1000
NStgid: 317198
NSpid: 317198
NSpgid: 317198
NSsid: 317198
VmPeak: 5212 kB
VmSize: 5088 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 2184 kB
VmRSS: 2184 kB
RssAnon: 380 kB
RssFile: 1804 kB
RssShmem: 0 kB
VmData: 2460 kB
VmStk: 0 kB
VmExe: 8 kB
VmLib: 1920 kB
VmPTE: 56 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 1/1030150
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000010000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 1
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: 00000100,00000000,00000000,00000000
Cpus_allowed_list: 104
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 332
nonvoluntary_ctxt_switches: 0
====== /proc/317198/stack
====== gdb -p 317197 -ex 'set confirm off' -ex 'set height 0' -ex 'thread apply all bt' -ex q </dev/null 2>&1
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 317197
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
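The mismatch line in replay.err names the register, then the value seen while replaying, then the recorded value. A short Python sketch that parses that line; the format is inferred from the log lines in this thread only, and rr may print other variants. parse_mismatch and Mismatch are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Format inferred from the replay.err lines in this thread:
#   "... r10 0x55a993c2e95a != 0x55a993c2e958 (replaying vs. recorded)"
MISMATCH_RE = re.compile(
    r"\b(\w+) (0x[0-9a-fA-F]+) != (0x[0-9a-fA-F]+) \(replaying vs\. recorded\)"
)


class Mismatch(NamedTuple):
    register: str
    replaying: int
    recorded: int

    @property
    def delta(self) -> int:
        """Replaying value minus recorded value."""
        return self.replaying - self.recorded


def parse_mismatch(line: str) -> Optional[Mismatch]:
    """Extract the register and both values, or None if the line doesn't match."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    # int(..., 16) accepts the "0x" prefix.
    return Mismatch(match.group(1), int(match.group(2), 16), int(match.group(3), 16))
```

For the r10 line above, the two values differ by 2, so replay had already diverged from the recording by the time rr compared the registers.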
I would expect this to be a duplicate of #2694. If you can pack and upload a trace, I can verify whether the tracee is using RDRAND.
I assume the trace you want packed is the one in the same /tmp/rr-test-setuid-XXX directory; I've put a tarball of that whole directory below.
~~rr-test-setuid.tar.gz~~ (this wasn't what you asked for; see below.)
Oh, sorry: you asked me to pack it, and I didn't realize that meant running rr pack on it as described in #2694. I'll do that now.
Unsupported instruction at 0x7f534449603f (opcode rdrand)
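RDRAND returns hardware-generated random values, and rr cannot reproduce them during replay. As a rough cross-check of whether a shared library (for example libnss_systemd.so.2) contains RDRAND at all, one can scan its bytes for the encoding 0F C7 followed by a ModRM byte in F0..F7 (mod=11, reg=6). This Python sketch is a heuristic, not a disassembler: matches inside data or in the middle of other instructions are possible, and offsets are file offsets rather than the runtime address rr printed. Disassembling with objdump -d and grepping for rdrand is the more reliable check.

```python
import re
import sys
from typing import List

# RDRAND is encoded as 0F C7 /6 with a register operand, i.e. a ModRM byte
# with mod=11 and reg=110 (0xF0..0xF7). A 0x66 or REX.W prefix may precede
# it; the pattern matches from the 0F byte onward.
RDRAND_RE = re.compile(rb"\x0f\xc7[\xf0-\xf7]")


def rdrand_candidates(data: bytes) -> List[int]:
    """Return byte offsets where an RDRAND encoding *might* start."""
    return [m.start() for m in RDRAND_RE.finditer(data)]


def scan_file(path: str) -> List[int]:
    """Read a whole file and return its candidate offsets."""
    with open(path, "rb") as f:
        return rdrand_candidates(f.read())


if __name__ == "__main__":
    # Usage: python scan_rdrand.py /lib/x86_64-linux-gnu/libnss_systemd.so.2
    for path in sys.argv[1:]:
        hits = scan_file(path)
        print(f"{path}: {len(hits)} candidate(s)", [hex(h) for h in hits[:10]])
```

Note that 0F C7 F8..FF (RDSEED, and RDPID with an F3 prefix) is deliberately excluded, as is 0F C7 with a memory operand (for example CMPXCHG8B).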
Can you replay the trace, set hbreak *0x7f534449603f in gdb, continue, and get a backtrace at that instruction?
% ./bin/rr replay /tmp/rr-test-setuid-TgOCW9j3p/latest-trace/
On Zen CPUs, rr will not work reliably unless you disable the hardware SpecLockMap optimization.
For instructions on how to do this, see https://github.com/mozilla/rr/wiki/Zen
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /tmp/rr-test-setuid-TgOCW9j3p/setuid-TgOCW9j3p-0/mmap_pack_5_setuid-TgOCW9j3p...
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Remote debugging using 127.0.0.1:50382
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
0x00007f534470d100 in ?? () from /lib64/ld-linux-x86-64.so.2
(rr) hbreak *0x7f534449603f
Hardware assisted breakpoint 1 at 0x7f534449603f
(rr) continue
Continuing.
Breakpoint 1, 0x00007f534449603f in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
(rr) bt
#0 0x00007f534449603f in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#1 0x00007f5344496273 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#2 0x00007f5344496541 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#3 0x00007f5344484b11 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#4 0x00007f534448ab1e in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#5 0x00007f534448b251 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#6 0x00007f5344498981 in _nss_systemd_getgrnam_r () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#7 0x00007f53445a967d in __getgrnam_r (name=name@entry=0x55a992626030 "nobody",
resbuf=resbuf@entry=0x7f53446b5020 <resbuf>, buffer=0x55a993c23fc0 "", buflen=buflen@entry=1024,
result=result@entry=0x7fff06e2c640) at ../nss/getXXbyYY_r.c:315
#8 0x00007f53445a892c in getgrnam (name=0x55a992626030 "nobody") at ../nss/getXXbyYY.c:134
#9 0x000055a99262557e in main (argc=1, argv=0x7fff06e2c7d8)
at /home/pnkfelix/Dev/Mozilla/rr.git/src/test/setuid.c:15
(rr)
So this confirms that my problem is a duplicate of issue #2694, since __getgrnam_r appears in the backtrace, right?
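The backtrace shows getgrnam("nobody") from setuid.c entering libnss_systemd.so.2 through glibc's NSS layer. Which modules a group lookup consults is set by the group: line in /etc/nsswitch.conf; with the default actions, a name that earlier sources such as files do not resolve falls through to later ones such as systemd. A minimal Python sketch that reads that line. nss_sources is a hypothetical helper, and it handles only the common "database: source [STATUS=action] source" syntax.

```python
import re
from typing import List, Optional

# Bracketed actions such as "[NOTFOUND=return]" are not sources.
ACTION_RE = re.compile(r"\[[^\]]*\]")


def nss_sources(database: str, conf_text: str) -> Optional[List[str]]:
    """Return the sources configured for `database` (e.g. "group"), or None."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        name, sep, rest = line.partition(":")
        if not sep or name.strip() != database:
            continue
        return ACTION_RE.sub(" ", rest).split()
    return None
```

For example, open("/etc/nsswitch.conf") and pass its contents with database "group"; a result that includes "systemd" means group lookups can reach libnss_systemd.so.2.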
Yup, it's the same issue in systemd, which is fixed upstream in systemd/systemd#17115.
Since this has been identified as both an upstream (systemd) issue and a duplicate, can we close it?