Ryzen 3900X test failures

Open Manishearth opened this issue 3 years ago • 15 comments

I'm testing rr on a Ryzen 3900X (after applying the Ryzen workaround) and I get the following failures:

The following tests FAILED:
	565 - setuid-no-syscallbuf (Failed)
	1102 - checksum_sanity_noclone (Failed)

The following tests FAILED:
	1153 - record_replay-no-syscallbuf (Failed)
	2394 - record_replay-32 (Failed)

I can sometimes reproduce setuid-no-syscallbuf with ctest -R; I cannot reproduce the other failures.
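
To put a number on "sometimes", one option is to rerun the test in a loop and count the failing runs. The following is a minimal sketch, not something from the rr test suite: `count_failures` is a hypothetical helper name, and the example invocation assumes it is run from the rr build directory where `ctest` can find the tests.

```shell
# count_failures N CMD [ARGS...]
# Hypothetical helper: runs CMD N times, discards its output,
# and prints how many of the runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (assumed to be run from the rr build directory):
#   count_failures 20 ctest -R setuid-no-syscallbuf
```

A failure count well above zero for one test and zero for the others would help separate a frequently failing test from one-off failures.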

I'm not sure whether the 3900X should be added to the list of supported Ryzen CPUs. Are these known sources of flakiness?

cc @glandium

Manishearth avatar Sep 18 '20 07:09 Manishearth

Can you check what the .err files in /tmp/rr-test-* say?
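
One way to collect those files is sketched below. It assumes the test harness leaves its output in directories matching `/tmp/rr-test-*`, as mentioned above; `print_err_files` is a hypothetical helper name, not part of rr.

```shell
# print_err_files DIR...
# Hypothetical helper: for each directory given, prints every
# non-empty *.err file, preceded by a header line with its path.
print_err_files() {
    for dir in "$@"; do
        for f in "$dir"/*.err; do
            # Skips empty files and the literal pattern left by an unmatched glob.
            [ -s "$f" ] || continue
            printf '== %s ==\n' "$f"
            cat "$f"
        done
    done
}

# Example:
#   print_err_files /tmp/rr-test-*
```

Skipping empty files keeps the output focused on the tests that actually logged an error.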

glandium avatar Sep 18 '20 07:09 glandium

I can sometimes reproduce setuid-no-syscallbuf with ctest -R; I cannot reproduce the other failures.

You mean the setuid-no-syscallbuf test fails often, but the other tests fail very rarely?

rocallahan avatar Sep 18 '20 07:09 rocallahan

On a Ryzen 3700X, after following the setup instructions, I get 12 failures out of 2487 tests, which is still great compared to before.

Summary

	 82 - clone_vfork_pidfd (Failed)
	 83 - clone_vfork_pidfd-no-syscallbuf (Failed)
	920 - nested_detach_wait (Failed)
	921 - nested_detach_wait-no-syscallbuf (Failed)
	1140 - nested_detach (Failed)
	1141 - nested_detach-no-syscallbuf (Failed)
	1326 - clone_vfork_pidfd-32 (Failed)
	1327 - clone_vfork_pidfd-32-no-syscallbuf (Failed)
	2162 - nested_detach_wait-32 (Failed)
	2163 - nested_detach_wait-32-no-syscallbuf (Failed)
	2382 - nested_detach-32 (Failed)
	2383 - nested_detach-32-no-syscallbuf (Failed)

full output.txt

These are the .err files of all the tests. I can provide the rest of the files, but the tarball would be too big to provide all at once: rr-tests.tar.gz

v-lopez avatar Sep 18 '20 07:09 v-lopez

@v-lopez: Please file a separate issue for those. The clone_vfork_pidfd failure looks similar to the problem that was fixed in 17aa8239c0a9ffd0e66623fc3627f664b384bf1e, and nested_detach hits a different kind of assertion.

glandium avatar Sep 18 '20 07:09 glandium

@Manishearth did setuid fail with something like the following?

[FATAL .../rr/src/Registers.cc:405:compare_register_files()] 
 (task 911147 (rec:857972) at time 365)
 -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:128273/128273)

glandium avatar Sep 18 '20 08:09 glandium

On my end, with a 3990X, the setuid-no-syscallbuf test is failing. It is the main one I think I've seen fail across repeated runs, though it doesn't fail every time. The record.err says this:

[ERROR /home/pnkfelix/Dev/Mozilla/rr.git/src/Registers.cc:295:maybe_print_reg_mismatch()] r10 0x55a993c2e95a != 0x55a993c2e958 (replaying vs. recorded)
process 317197 sent SIGURG
Full log:
% cat /tmp/rr-test-setuid-TgOCW9j3p/replay.err 
[ERROR /home/pnkfelix/Dev/Mozilla/rr.git/src/Registers.cc:295:maybe_print_reg_mismatch()] r10 0x55a993c2e95a != 0x55a993c2e958 (replaying vs. recorded)
process 317197 sent SIGURG
====== /proc/317197/status
Name:	rr
Umask:	0002
State:	S (sleeping)
Tgid:	317197
Ngid:	0
Pid:	317197
PPid:	317196
TracerPid:	0
Uid:	1000	1000	1000	1000
Gid:	1000	1000	1000	1000
FDSize:	64
Groups:	4 27 1000 
NStgid:	317197
NSpid:	317197
NSpgid:	317197
NSsid:	4178
VmPeak:	   16508 kB
VmSize:	   15468 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	   10088 kB
VmRSS:	    9680 kB
RssAnon:	    1116 kB
RssFile:	    8564 kB
RssShmem:	       0 kB
VmData:	    1156 kB
VmStk:	     136 kB
VmExe:	    5728 kB
VmLib:	    1320 kB
VmPTE:	      68 kB
VmSwap:	       0 kB
HugetlbPages:	       0 kB
CoreDumping:	0
THP_enabled:	1
Threads:	1
SigQ:	1/1030150
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000000000
SigCgt:	0000000180002000
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000
NoNewPrivs:	0
Seccomp:	0
Speculation_Store_Bypass:	thread vulnerable
Cpus_allowed:	00000100,00000000,00000000,00000000
Cpus_allowed_list:	104
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	2
nonvoluntary_ctxt_switches:	335
====== /proc/317197/stack
====== /proc/317198/status
Name:	rr:setuid-TgOCW
Umask:	0002
State:	t (tracing stop)
Tgid:	317198
Ngid:	0
Pid:	317198
PPid:	317197
TracerPid:	317197
Uid:	1000	1000	1000	1000
Gid:	1000	1000	1000	1000
FDSize:	1024
Groups:	4 27 1000 
NStgid:	317198
NSpid:	317198
NSpgid:	317198
NSsid:	317198
VmPeak:	    5212 kB
VmSize:	    5088 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	    2184 kB
VmRSS:	    2184 kB
RssAnon:	     380 kB
RssFile:	    1804 kB
RssShmem:	       0 kB
VmData:	    2460 kB
VmStk:	       0 kB
VmExe:	       8 kB
VmLib:	    1920 kB
VmPTE:	      56 kB
VmSwap:	       0 kB
HugetlbPages:	       0 kB
CoreDumping:	0
THP_enabled:	1
Threads:	1
SigQ:	1/1030150
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000010000
SigCgt:	0000000000000000
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000
NoNewPrivs:	1
Seccomp:	0
Speculation_Store_Bypass:	thread vulnerable
Cpus_allowed:	00000100,00000000,00000000,00000000
Cpus_allowed_list:	104
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	332
nonvoluntary_ctxt_switches:	0
====== /proc/317198/stack
====== gdb -p 317197 -ex 'set confirm off' -ex 'set height 0' -ex 'thread apply all bt' -ex q </dev/null 2>&1
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 317197
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.

pnkfelix avatar Sep 29 '20 23:09 pnkfelix

I would expect that to be a duplicate of #2694. If you can pack and upload a trace I can verify whether or not the tracee is using RDRAND.

khuey avatar Sep 30 '20 01:09 khuey

I assume the trace you want packed is the one in the same /tmp/rr-test-setuid-XXX directory; I've put a tarball of that whole directory below.

~~rr-test-setuid.tar.gz~~ (this wasn't what you asked for; see below.)

pnkfelix avatar Sep 30 '20 16:09 pnkfelix

Oh, I'm sorry: you asked me to pack it, and I didn't realize that meant running rr pack on it as described in #2694. I'll do that now.

pnkfelix avatar Sep 30 '20 16:09 pnkfelix

Okay, this tarball has the packed version of the directory.

rr-test-setuid.tar.gz

pnkfelix avatar Sep 30 '20 16:09 pnkfelix

Unsupported instruction at 0x7f534449603f (opcode rdrand)

Can you replay the trace, run hbreak *0x7f534449603f in gdb, continue, and get a backtrace at that instruction?

khuey avatar Sep 30 '20 16:09 khuey

Backtrace:
% ./bin/rr replay /tmp/rr-test-setuid-TgOCW9j3p/latest-trace/
On Zen CPUs, rr will not work reliably unless you disable the hardware SpecLockMap optimization.
For instructions on how to do this, see https://github.com/mozilla/rr/wiki/Zen
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /tmp/rr-test-setuid-TgOCW9j3p/setuid-TgOCW9j3p-0/mmap_pack_5_setuid-TgOCW9j3p...
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Remote debugging using 127.0.0.1:50382
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
0x00007f534470d100 in ?? () from /lib64/ld-linux-x86-64.so.2
(rr) hbreak *0x7f534449603f
Hardware assisted breakpoint 1 at 0x7f534449603f
(rr) continue
Continuing.

Breakpoint 1, 0x00007f534449603f in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
(rr) bt
#0  0x00007f534449603f in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#1  0x00007f5344496273 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#2  0x00007f5344496541 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#3  0x00007f5344484b11 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#4  0x00007f534448ab1e in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#5  0x00007f534448b251 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#6  0x00007f5344498981 in _nss_systemd_getgrnam_r () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#7  0x00007f53445a967d in __getgrnam_r (name=name@entry=0x55a992626030 "nobody", 
    resbuf=resbuf@entry=0x7f53446b5020 <resbuf>, buffer=0x55a993c23fc0 "", buflen=buflen@entry=1024, 
    result=result@entry=0x7fff06e2c640) at ../nss/getXXbyYY_r.c:315
#8  0x00007f53445a892c in getgrnam (name=0x55a992626030 "nobody") at ../nss/getXXbyYY.c:134
#9  0x000055a99262557e in main (argc=1, argv=0x7fff06e2c7d8)
    at /home/pnkfelix/Dev/Mozilla/rr.git/src/test/setuid.c:15
(rr) 

pnkfelix avatar Sep 30 '20 18:09 pnkfelix

So this confirms that my problem is a duplicate of issue #2694, since __getgrnam_r appears in the backtrace, right?

pnkfelix avatar Sep 30 '20 18:09 pnkfelix

Yup, it's the same thing in systemd (which is fixed upstream at systemd/systemd#17115).
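
For anyone wanting to check their own system, a rough sketch is to disassemble the library and count lines that mention `rdrand`. `count_rdrand` is a hypothetical helper, and the library path in the example is the one from the backtrace above, so it may differ on other distributions. A non-zero count only shows that the instruction is present in the binary, not that it is executed.

```shell
# count_rdrand
# Hypothetical helper: reads disassembly text on stdin and prints the
# number of lines that contain "rdrand" as a whole word.
count_rdrand() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that status.
    grep -cw 'rdrand' || true
}

# Example (path taken from the backtrace above; adjust as needed):
#   objdump -d /lib/x86_64-linux-gnu/libnss_systemd.so.2 | count_rdrand
```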

khuey avatar Sep 30 '20 18:09 khuey

Since this has been identified as both an upstream issue (systemd) and a duplicate, can we close this?

GitMensch avatar Jul 26 '21 08:07 GitMensch