rr icon indicating copy to clipboard operation
rr copied to clipboard

reverse-stepi stepping more than one instruction

Open bennofs opened this issue 8 years ago • 16 comments

If I run reverse-stepi after running to a certain event with rr replay -g, then I've got a situation where reverse-stepi steps more than one instruction:

(rr) disas
Dump of assembler code for function core::slice::from_raw_parts<u8>:
   0x000056197ad5aab0 <+0>:	push   %rbp
   0x000056197ad5aab1 <+1>:	mov    %rsp,%rbp
   0x000056197ad5aab4 <+4>:	mov    %rdi,-0x40(%rbp)
   0x000056197ad5aab8 <+8>:	mov    %rsi,-0x38(%rbp)
   0x000056197ad5aabc <+12>:	mov    -0x40(%rbp),%rax
   0x000056197ad5aac0 <+16>:	mov    %rax,-0x30(%rbp)
   0x000056197ad5aac4 <+20>:	mov    -0x38(%rbp),%rax
   0x000056197ad5aac8 <+24>:	mov    %rax,-0x28(%rbp)
   0x000056197ad5aacc <+28>:	mov    -0x30(%rbp),%rax
   0x000056197ad5aad0 <+32>:	mov    -0x28(%rbp),%rcx
   0x000056197ad5aad4 <+36>:	mov    %rax,-0x20(%rbp)
   0x000056197ad5aad8 <+40>:	mov    %rcx,-0x18(%rbp)
   0x000056197ad5aadc <+44>:	mov    -0x20(%rbp),%rax
=> 0x000056197ad5aae0 <+48>:	mov    -0x18(%rbp),%rcx
   0x000056197ad5aae4 <+52>:	mov    %rax,-0x10(%rbp)
   0x000056197ad5aae8 <+56>:	mov    %rcx,-0x8(%rbp)
   0x000056197ad5aaec <+60>:	mov    -0x10(%rbp),%rax
   0x000056197ad5aaf0 <+64>:	mov    -0x8(%rbp),%rdx
   0x000056197ad5aaf4 <+68>:	mov    %rax,-0x48(%rbp)
   0x000056197ad5aaf8 <+72>:	mov    %rdx,-0x50(%rbp)
   0x000056197ad5aafc <+76>:	mov    -0x48(%rbp),%rax
   0x000056197ad5ab00 <+80>:	mov    -0x50(%rbp),%rdx
   0x000056197ad5ab04 <+84>:	pop    %rbp
   0x000056197ad5ab05 <+85>:	retq   
End of assembler dump.
(rr) reverse-stepi

Thread 1 received signal SIGINT, Interrupt.
core::slice::{{impl}}::index<u8> (self=..., slice=...) at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs:708
708	in /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs
(rr) disas
Dump of assembler code for function core::slice::{{impl}}::index<u8>:
   0x000056197ad5b040 <+0>:	push   %rbp
   0x000056197ad5b041 <+1>:	mov    %rsp,%rbp
   0x000056197ad5b044 <+4>:	sub    $0x90,%rsp
   0x000056197ad5b04b <+11>:	mov    (%rdi),%rax
   0x000056197ad5b04e <+14>:	mov    %rax,-0x58(%rbp)
   0x000056197ad5b052 <+18>:	mov    0x8(%rdi),%rax
   0x000056197ad5b056 <+22>:	mov    %rax,-0x50(%rbp)
   0x000056197ad5b05a <+26>:	mov    %rsi,-0x48(%rbp)
   0x000056197ad5b05e <+30>:	mov    %rdx,-0x40(%rbp)
   0x000056197ad5b062 <+34>:	mov    -0x58(%rbp),%rax
   0x000056197ad5b066 <+38>:	mov    -0x50(%rbp),%rcx
   0x000056197ad5b06a <+42>:	mov    %rax,-0x38(%rbp)
   0x000056197ad5b06e <+46>:	mov    %rcx,-0x30(%rbp)
   0x000056197ad5b072 <+50>:	mov    -0x48(%rbp),%rax
   0x000056197ad5b076 <+54>:	mov    -0x40(%rbp),%rcx
   0x000056197ad5b07a <+58>:	mov    %rax,-0x28(%rbp)
   0x000056197ad5b07e <+62>:	mov    %rcx,-0x20(%rbp)
   0x000056197ad5b082 <+66>:	mov    -0x38(%rbp),%rax
   0x000056197ad5b086 <+70>:	cmp    -0x30(%rbp),%rax
   0x000056197ad5b08a <+74>:	jbe    0x56197ad5b099 <core::slice::{{impl}}::index<u8>+89>
   0x000056197ad5b08c <+76>:	mov    -0x38(%rbp),%rdi
   0x000056197ad5b090 <+80>:	mov    -0x30(%rbp),%rsi
   0x000056197ad5b094 <+84>:	callq  0x56197af5c380 <core::slice::slice_index_order_fail>
=> 0x000056197ad5b099 <+89>:	mov    -0x30(%rbp),%rax
   0x000056197ad5b09d <+93>:	mov    -0x28(%rbp),%rdi
   0x000056197ad5b0a1 <+97>:	mov    -0x20(%rbp),%rsi
   0x000056197ad5b0a5 <+101>:	mov    %rax,-0x60(%rbp)
---Type <return> to continue, or q <return> to quit---q
Quit
(rr) stepi

Thread 1 received signal SIGINT, Interrupt.
0x000056197ad5aae0 in core::slice::from_raw_parts<u8> (p=0x56197af69140 <str.V> "raw_capacity overflowraw_cap overflow\000", len=0) at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs:2161
2161	in /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs
(rr) 

Thread 1 received signal SIGKILL, Killed.
0x000056197ad5aae0 in core::slice::from_raw_parts<u8> (p=0x56197af69140 <str.V> "raw_capacity overflowraw_cap overflow\000", len=0) at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs:2161
2161	in /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs
(rr) 
[Inferior 1 (process 26877) exited normally]
(rr) 

If you need more information, please tell me what is necessary.

bennofs avatar Apr 06 '17 22:04 bennofs

Can you dump the portion of rr dump -b -m -p <tracedir> around the event number you're passing to -g? Also, in your rr session before and after doing reverse-stepi, run when and give the result?

rocallahan avatar Apr 06 '17 22:04 rocallahan

Before reverse-stepi:

(rr) when
Current event: 4368545

After reverse-stepi (this is the event number I passed to -g!)

(rr) when
Current event: 4368543

The rr dump output (from 4368540 to 4368550):

{
  real_time:42380.114630 global_time:4368540, event:`SYSCALL: write' (state:ENTERING_SYSCALL) tid:26877, ticks:8063044799
 rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x11 rsi:0x56197af64b70 rdi:0x2 rbp:0x2 rsp:0x681ffe10 r8:0x5 r9:0x5 r10:0x7ffd21231be8 r11:0x246 r12:0x0 r13:0x56197af64b70 r14:0x11 r15:0x1 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7f73ce61b840 gs_base:0x0
}
{
  real_time:42380.114644 global_time:4368541, event:`SYSCALL: write' (state:EXITING_SYSCALL) tid:26877, ticks:8063044799
 rax:0x11 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x11 rsi:0x56197af64b70 rdi:0x2 rbp:0x2 rsp:0x681ffe10 r8:0x5 r9:0x5 r10:0x7ffd21231be8 r11:0x246 r12:0x0 r13:0x56197af64b70 r14:0x11 r15:0x1 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7f73ce61b840 gs_base:0x0
}
{
  real_time:42432.296747 global_time:4368542, event:`SYSCALLBUF_FLUSH' tid:26877, ticks:16848366754
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x3c68, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x149e, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'write', ret:0x0, size:0x10 }
  { syscall:'clock_gettime', ret:0x0, size:0x20 }
}
{
  real_time:42432.296750 global_time:4368543, event:`SIGNAL: SIGINT(async)' tid:26877, ticks:16848366754
 rax:0x56197af69140 rbx:0x0 rcx:0x0 rdx:0x0 rsi:0x0 rdi:0x56197af69140 rbp:0x7ffd212269f0 rsp:0x7ffd212269f0 r8:0x0 r9:0x0 r10:0x7ffd2122eb04 r11:0x7ffd21226578 r12:0x2 r13:0x7ffd212406fb r14:0x7ffd2123e730 r15:0x7ffd2123e778 rip:0x56197ad5aae0 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffff02 fs_base:0x7f73ce61b840 gs_base:0x0
}
{
  real_time:42432.296753 global_time:4368544, event:`SYSCALLBUF_RESET' tid:26877, ticks:16848366754
}
{
  real_time:42432.296793 global_time:4368545, event:`SIGNAL_DELIVERY: SIGINT(async)' tid:26877, ticks:16848366754
 rax:0x56197af69140 rbx:0x0 rcx:0x0 rdx:0x0 rsi:0x0 rdi:0x56197af69140 rbp:0x7ffd212269f0 rsp:0x7ffd212269f0 r8:0x0 r9:0x0 r10:0x7ffd2122eb04 r11:0x7ffd21226578 r12:0x2 r13:0x7ffd212406fb r14:0x7ffd2123e730 r15:0x7ffd2123e778 rip:0x56197ad5aae0 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffff02 fs_base:0x7f73ce61b840 gs_base:0x0
  { tid:26877, addr:0x7ffd212269f0, length:(nil) }
}
{
  real_time:42432.296874 global_time:4368546, event:`UNSTABLE_EXIT' tid:26903, ticks:40446004
}
{
  real_time:42432.296913 global_time:4368547, event:`UNSTABLE_EXIT' tid:26902, ticks:274877915
}
{
  real_time:42432.296928 global_time:4368548, event:`UNSTABLE_EXIT' tid:26901, ticks:277435926
}
{
  real_time:42432.296944 global_time:4368549, event:`UNSTABLE_EXIT' tid:26900, ticks:272275054
}
{
  real_time:42432.296959 global_time:4368550, event:`UNSTABLE_EXIT' tid:26899, ticks:274397870
}

bennofs avatar Apr 06 '17 22:04 bennofs

I terminated the program with Ctrl+C because it hung when recording. Perhaps the issue is that I accidently used an event number that lies after this event?

bennofs avatar Apr 06 '17 22:04 bennofs

We should be able to step through a SIGINT, or any other signal, successfully. We have tests for this. For some reason it's behaving strangely in your case.

rocallahan avatar Apr 07 '17 02:04 rocallahan

I can't reproduce this writing simple testcases.

You could try rerunning with RR_LOG=GdbConnection,GdbServer,ReplayTimeline and what we get.

rocallahan avatar Apr 07 '17 02:04 rocallahan

@rocallahan re-run as in rr replay, or create a new recording?

bennofs avatar Apr 23 '17 11:04 bennofs

rr replay

rocallahan avatar Apr 23 '17 20:04 rocallahan

I've pasted the rr replay session with those environment settings: https://gist.github.com/bennofs/4c61ea7c11372a3d855e2f1d1e7e3cf3

bennofs avatar May 10 '17 17:05 bennofs

Thanks!

I've just pushed https://github.com/mozilla/rr/commit/70fa1d86789dc254b5755d743cc0c129950850b0 to add some more useful logging. Could you retry your test with RR_LOG=GdbConnection,ReplayTimeline,GdbServer RR_LOG_BUFFER=100000?

rocallahan avatar May 11 '17 00:05 rocallahan

Unfortunately, it looks like I deleted the original replay :flushed:

I have a few different replays from the same program, they all seem to end with:

Program received signal SIGKILL, Killed.
[FATAL /home/.cache/pacaur/rr-git/src/rr/src/Task.cc:2114:read_bytes_helper() errno: EIO] 
 (task 13342 (rec:24530) at time 0)
 -> Assertion `false' failed to hold. Should have read 15 bytes from 0x681ffe01, but only read 0
Launch gdb with
  gdb '-l' '10000' '-ex' 'target extended-remote :13342' /home/.local/share/rr/nix-index-4//mmap_clone_3_nix-index

Is that normal behaviour? Or may it be caused be the same bug?

bennofs avatar Jun 15 '17 09:06 bennofs

The SIGKILL is normal. The assertion failure is not. Can you attach gdb to the rr process and get a stacktrace of rr there?

rocallahan avatar Jun 15 '17 09:06 rocallahan

@rocallahan the stacktrace at after that message was printed is:

gdb-peda$ bt
#0  0x00007f7ffc72fad0 in __accept_nocancel () from /usr/lib/libpthread.so.0
#1  0x0000000000461477 in rr::GdbConnection::await_debugger(rr::ScopedFd&) ()
#2  0x0000000000474bfe in ?? ()
#3  0x000000000047eea5 in rr::GdbServer::emergency_debug(rr::Task*) ()
#4  0x000000000048a7e1 in rr::EmergencyDebugOstream::~EmergencyDebugOstream() ()
#5  0x000000000051333b in rr::Task::read_bytes_helper(rr::remote_ptr<void>, long, void*, bool*) ()
#6  0x0000000000446093 in rr::AutoRestoreMem::init(void const*, long) ()
#7  0x0000000000511986 in rr::Task::open_mem_fd() ()
#8  0x0000000000519958 in rr::Task::clone(rr::Task::CloneReason, int, rr::remote_ptr<void>, rr::remote_ptr<void>, rr::remote_ptr<int>, int, int, unsigned int, rr::Session*) ()
#9  0x000000000051903a in rr::Task::os_clone(rr::Task::CloneReason, rr::Task*, rr::Session*, rr::AutoRemoteSyscalls&, int, unsigned int, unsigned int, rr::remote_ptr<void>, rr::remote_ptr<int>, rr::remote_ptr<void>, rr::remote_ptr<int>) ()
#10 0x0000000000519127 in rr::Task::os_fork_into(rr::Session*) ()
#11 0x000000000050bcf7 in rr::Session::copy_state_to(rr::Session&, rr::EmuFs&, rr::EmuFs&) ()
#12 0x00000000004e54a5 in rr::ReplaySession::clone_diversion() ()
#13 0x000000000047b20c in rr::GdbServer::divert(rr::ReplaySession&) ()
#14 0x000000000047bc61 in rr::GdbServer::process_debugger_requests(rr::GdbServer::ReportState) ()
#15 0x000000000047d51a in rr::GdbServer::debug_one_step(rr::GdbRequest&) ()
#16 0x000000000047e9f0 in rr::GdbServer::serve_replay(rr::GdbServer::ConnectionFlags const&) ()
#17 0x00000000004dbb26 in ?? ()
#18 0x00000000004dcd63 in rr::ReplayCommand::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) ()
#19 0x000000000043383f in main ()
#20 0x00007f7ffc39a43a in __libc_start_main () from /usr/lib/libc.so.6
#21 0x000000000043390a in _start ()

Or is there a way to get the stacktrace of the assert itself?

Btw, are these traces portable? Perhaps I can just upload them somewhere so you can reproduce the issue?

bennofs avatar Jun 15 '17 15:06 bennofs

Alright, I think I've managed to reproduce the original problem:

(rr) x/10i $pc
=> 0x55dd0eb56f30 <core::ptr::{{impl}}::offset<u8>+48>:	mov    -0x30(%rbp),%rax
(rr) reverse-stepi
Program received signal SIGINT, Interrupt.
core::ptr::{{impl}}::offset<u8> (self=0x7f0b28636c40 "\azA\016\375\326y\200\346s\000", count=0) at /checkout/src/libcore/ptr.rs:619
619	in /checkout/src/libcore/ptr.rs
(rr) 
0x000055dd0eb56f2c	618	in /checkout/src/libcore/ptr.rs
(rr) 
0x000055dd0eb56f28	618	in /checkout/src/libcore/ptr.rs
(rr) x/3i $pc
=> 0x55dd0eb56f28 <core::ptr::{{impl}}::offset<u8>+40>:	mov    -0x8(%rbp),%rax
   0x55dd0eb56f2c <core::ptr::{{impl}}::offset<u8>+44>:	mov    %rax,-0x30(%rbp)
   0x55dd0eb56f30 <core::ptr::{{impl}}::offset<u8>+48>:	mov    -0x30(%rbp),%rax

It's not as extreme as in the original example, but it still stepped 2 instead of 1 instructions backwards.

I will try to run it with the new logging.

EDIT: Oops, false alarm. Looks like i pressed the return key two times while waiting for reverse-stepi to finish, so I actually ran reverse-stepi three times

bennofs avatar Jun 15 '17 16:06 bennofs

Btw, are these traces portable? Perhaps I can just upload them somewhere so you can reproduce the issue?

No, not yet.

Can you run with RR_LOG=GdbConnection? It looks like your assertion failure is caused by trying to execute some application function after the SIGKILL.

rocallahan avatar Jun 16 '17 02:06 rocallahan

@bennofs It's been a while - and as I have a similar issue #3618, can you please recheck if that still happens for you on master? If not then we may close this issue.

GitMensch avatar Sep 27 '23 18:09 GitMensch

I don't have the trace anymore, and don't remember the details for this issue. Feel free to close

bennofs avatar Sep 30 '23 10:09 bennofs