`step` in `rr replay` sometimes "continues"

Open GitMensch opened this issue 2 years ago • 14 comments

packed recording to reproduce the issue: STR.tar.gz

to reproduce after unpacking:

$ rr replay -o -q -o -ex -o "break STR_:ENTRY_STR" -o -ex -o "continue" ./STR-1
(rr) step  # should go one step, goes to end
abC

Program received signal SIGKILL, Killed.
0x0000000070000002 in syscall_traced ()

(rr) reverse-continue  # goes to BP set above "original place"
(rr) next  # actually goes one "code line" down
(rr) next  # another
(rr) next  # another

STR.cob, also included in the recording directory

Sep 27 '23 19:09 GitMensch

That trace doesn't replay on my machine, not sure why. I'm not sure from the description exactly what the problem is. Are you sure it's not a gdb bug?

Oct 01 '23 03:10 rocallahan

> Are you sure it's not a gdb bug?

I don't know whether this is a GDB bug, but with "plain GDB" (debugging the program directly instead of replaying a recording) this works fine: a step is a step, and a next is a next.

I saw this issue a long time ago (and brought it up with Keno once), but as this is a completely new system (Rocky Linux 9) with an up-to-date GDB and every rr test passing, I gave it another deep try... only to find the issue is still there.

As this works completely fine outside of a recording I'd "blame" rr, not GDB.

> That trace doesn't replay on my machine, not sure why.

What do you mean exactly by "doesn't replay"? What can we do to enable you to replay this? Not sure if it would help, but maybe try under Rocky9/CentOS Stream?

Otherwise I guess we may debug the gdbserver communication. (Not sure how, was that something to put into RR_LOG?)

Oct 01 '23 03:10 GitMensch

> What do you mean exactly by "doesn't replay"?

Ticks mismatch.

Oct 01 '23 04:10 rocallahan

Is that a common problem when a recording was packed and is then replayed on another system? Can you debug that?

Oct 01 '23 05:10 GitMensch

I'm not sure. I don't really want to debug it right now :-)

Oct 01 '23 05:10 rocallahan

Now that the release work is done (congrats on that, btw!) and it is no longer the weekend: could you have a look at debugging that, or drop a note on what I can provide to help you with it?

Oct 03 '23 16:10 GitMensch

Rechecked with current master: on the most current Debian (newer Intel, but I don't think that matters, does it?) everything builds, all tests pass, and I cannot reproduce the issue (only tested with the system-distributed GDB 10).

On Rocky9 (a VM, but that shouldn't matter, should it?) with current master, everything builds fine and only the known pkey tests fail. I can still reproduce this "a step behaves like continue" issue there, both with the system-distributed GDB and with GDB 13.2, so the GDB version does not seem to matter (unless Debian added some important patch that isn't in GDB 13.2).

Should I build GDB 13.2 on Debian and recheck with that? Any other idea?

Oct 10 '23 17:10 GitMensch

When I try these STR I get

khuey@zhadum:~/dev/scratch/rr-3642$ ~/dev/rr/obj/bin/rr replay -o -q -o -ex -o "break STR_:ENTRY_STR" -o -ex -o "continue" ./STR-1
Trace XCR0 value 0x2e7 != our XCR0 value 0x602e7; Replay will probably fail because glibc dynamic loader examines XCR0

Reading symbols from /home/khuey/dev/scratch/rr-3642/STR-1/mmap_copy_4_STR...
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Really redefine built-in command "jump"? (y or n) [answered Y; input not from terminal]
Breakpoint 1 at 0x40131b: file /tmp/STR.cob, line 6.
Remote debugging using 127.0.0.1:47244
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /home/khuey/.cache/debuginfod_client/9718d3757f00d2366056830aae09698dbd35e32c/debuginfo...
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
0x00007feb2a94eba0 in process_dl_debug (state=0x0, dl_debug=0x0) at ./elf/rtld.c:2626
Download failed: Invalid argument.  Continuing without source file ./elf/./elf/rtld.c.
2626    ./elf/rtld.c: No such file or directory.
Continuing.

Breakpoint 1, STR_ (entry=0) at /tmp/STR.cob:6
warning: Source file is more recent than executable.
6                STRING 'a' 'b'  DELIMITED BY SIZE
(rr) step
Cannot access memory at address 0x7feb2a966b00
(rr)

Nov 08 '23 02:11 khuey

Hm, does `continue`, or setting the breakpoint with `b STR.cob:9` instead, work for you?

> Trace XCR0 value 0x2e7 != our XCR0 value 0x602e7; Replay will probably fail because glibc dynamic loader examines XCR0

Where does this come from? It looks as if the high part of the register value (the leading 0x6) was cut off; the rest, 0x2e7, is identical... Is there anything we can do about that?

... any idea what I can do myself to provide useful information (RR_LOG?) if you cannot reproduce it?

Nov 08 '23 07:11 GitMensch

I just reproduced that with a much simpler C program

Dec 07 '23 15:12 alehander92

> I just reproduced that with a much simpler C program

🚀 Can you please share the C program along with the environment used (kernel, gcc, binutils, compile options)?

Dec 07 '23 15:12 GitMensch

The program is

#include <stdio.h>
#include <stdlib.h>

void run() {
  printf("faith\n");
  int a = 0;
  printf("%d\n", a);
}

int main() {
  run();
  size_t a = 2;
  size_t k = 3 + a;
}

I use rr 5.7.0 on Ubuntu 20.04, set up with nix 2.19.2; gcc is 12.3.0 and gdb is 12.1:

nix-shell -p rr

and then if it's called loop_c.c, I build it inside this env with

gcc -O0 -g3 -o loop_c loop_c.c 

and record it with

rr record ./loop_c

when replaying, i do

rr replay
..
b run
c
# when on printf line
step 

instead of going inside or over, this seems to continue to the end of the program

(rr) b run
Breakpoint 1 at 0x40113e: file loop_c.c, line 5.
(rr) c
Continuing.

Breakpoint 1, run () at loop_c.c:5
5         printf("faith\n");
(rr) step
faith
0

Program received signal SIGKILL, Killed.
0x0000000070000002 in ?? ()

if i add a breakpoint on line 7, step continues to it

however, i can't reproduce the same with rr 5.5.0 installed from apt on the ubuntu system itself, outside of the nix env

Dec 07 '23 15:12 alehander92

Is the gdb version equal on ubuntu and within nix? fwiw, can't repro with gdb 14.0.91.20231009-git (some random local build I have), gcc 12.3.0, rr ~trunk

log
$ gdb --version
GNU gdb (GDB) 14.0.91.20231009-git
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ gcc --version
gcc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ rr --version
rr version 5.7.0

$ gcc -O0 -g3 -o loop_c loop_c.c && rr ./loop_c && rr replay -- -q
rr: Saving execution to trace directory `/home/dzaima/.local/share/rr/loop_c-4'.
faith
0
Reading symbols from /home/dzaima/.local/share/rr/loop_c-4/mmap_hardlink_4_loop_c...
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Really redefine built-in command "jump"? (y or n) [answered Y; input not from terminal]
Remote debugging using 127.0.0.1:5925
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
0x00007fdf8b2e3290 in ?? () from /lib64/ld-linux-x86-64.so.2
(rr) b run
Breakpoint 1 at 0x55bcd219f175: file loop_c.c, line 5.
(rr) c
Continuing.

Breakpoint 1, run () at loop_c.c:5
5   printf("faith\n");
(rr) step
faith
6   int a = 0;
(rr) disas
Dump of assembler code for function run:
   0x000055bcd219f169 <+0>: endbr64
   0x000055bcd219f16d <+4>: push   rbp
   0x000055bcd219f16e <+5>: mov    rbp,rsp
   0x000055bcd219f171 <+8>: sub    rsp,0x10
   0x000055bcd219f175 <+12>:  lea    rax,[rip+0xe88]        # 0x55bcd21a0004
   0x000055bcd219f17c <+19>:  mov    rdi,rax
   0x000055bcd219f17f <+22>:  call   0x55bcd219f060 <puts@plt>
=> 0x000055bcd219f184 <+27>:  mov    DWORD PTR [rbp-0x4],0x0
   0x000055bcd219f18b <+34>:  mov    eax,DWORD PTR [rbp-0x4]
   0x000055bcd219f18e <+37>:  mov    esi,eax
   0x000055bcd219f190 <+39>:  lea    rax,[rip+0xe73]        # 0x55bcd21a000a
   0x000055bcd219f197 <+46>:  mov    rdi,rax
   0x000055bcd219f19a <+49>:  mov    eax,0x0
   0x000055bcd219f19f <+54>:  call   0x55bcd219f070 <printf@plt>
   0x000055bcd219f1a4 <+59>:  nop
   0x000055bcd219f1a5 <+60>:  leave
   0x000055bcd219f1a6 <+61>:  ret
End of assembler dump.

edit note: installing apt's gdb (GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1) leads to `step` stepping into `puts`, which is unhelpful; with `next` instead of `step` it runs correctly. This might depend on the C stdlib in use and/or whether it has debuginfo available?

that
$ rr replay -d /usr/bin/gdb
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/dzaima/.local/share/rr/loop_c-6/mmap_hardlink_4_loop_c...
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Really redefine built-in command "jump"? (y or n) [answered Y; input not from terminal]
Remote debugging using 127.0.0.1:6982
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/97/18d3757f00d2366056830aae09698dbd35e32c.debug...
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
0x00007fd96329a290 in _start () from /lib64/ld-linux-x86-64.so.2
(rr) b run
Breakpoint 1 at 0x56007ca23175: file loop_c.c, line 5.
(rr) c
Continuing.

Breakpoint 1, run () at loop_c.c:5
5	  printf("faith\n");
(rr) step
__GI__IO_puts (str=0x56007ca24004 "faith") at ./libio/ioputs.c:33
33	./libio/ioputs.c: No such file or directory.
(rr)
quit
Detaching from program: /home/dzaima/.local/share/rr/loop_c-6/mmap_hardlink_4_loop_c, process 6959
[Inferior 1 (process 6959) detached]


$ rr replay -d /usr/bin/gdb
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/dzaima/.local/share/rr/loop_c-6/mmap_hardlink_4_loop_c...
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Really redefine built-in command "jump"? (y or n) [answered Y; input not from terminal]
Remote debugging using 127.0.0.1:7319
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/97/18d3757f00d2366056830aae09698dbd35e32c.debug...
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
0x00007fd96329a290 in _start () from /lib64/ld-linux-x86-64.so.2
(rr) b run
Breakpoint 1 at 0x56007ca23175: file loop_c.c, line 5.
(rr) c
Continuing.

Breakpoint 1, run () at loop_c.c:5
5	  printf("faith\n");
(rr) next
faith
6	  int a = 0;

Dec 07 '23 15:12 dzaima

@dzaima my system gdb is 9.2, my nix one is 12.1

So it does seem related to something in my nix env

Dec 07 '23 16:12 alehander92