reverse-stepi stepping more than one instruction
If I run reverse-stepi after running to a certain event with rr replay -g, then I've got a situation where reverse-stepi steps more than one instruction:
(rr) disas
Dump of assembler code for function core::slice::from_raw_parts<u8>:
0x000056197ad5aab0 <+0>: push %rbp
0x000056197ad5aab1 <+1>: mov %rsp,%rbp
0x000056197ad5aab4 <+4>: mov %rdi,-0x40(%rbp)
0x000056197ad5aab8 <+8>: mov %rsi,-0x38(%rbp)
0x000056197ad5aabc <+12>: mov -0x40(%rbp),%rax
0x000056197ad5aac0 <+16>: mov %rax,-0x30(%rbp)
0x000056197ad5aac4 <+20>: mov -0x38(%rbp),%rax
0x000056197ad5aac8 <+24>: mov %rax,-0x28(%rbp)
0x000056197ad5aacc <+28>: mov -0x30(%rbp),%rax
0x000056197ad5aad0 <+32>: mov -0x28(%rbp),%rcx
0x000056197ad5aad4 <+36>: mov %rax,-0x20(%rbp)
0x000056197ad5aad8 <+40>: mov %rcx,-0x18(%rbp)
0x000056197ad5aadc <+44>: mov -0x20(%rbp),%rax
=> 0x000056197ad5aae0 <+48>: mov -0x18(%rbp),%rcx
0x000056197ad5aae4 <+52>: mov %rax,-0x10(%rbp)
0x000056197ad5aae8 <+56>: mov %rcx,-0x8(%rbp)
0x000056197ad5aaec <+60>: mov -0x10(%rbp),%rax
0x000056197ad5aaf0 <+64>: mov -0x8(%rbp),%rdx
0x000056197ad5aaf4 <+68>: mov %rax,-0x48(%rbp)
0x000056197ad5aaf8 <+72>: mov %rdx,-0x50(%rbp)
0x000056197ad5aafc <+76>: mov -0x48(%rbp),%rax
0x000056197ad5ab00 <+80>: mov -0x50(%rbp),%rdx
0x000056197ad5ab04 <+84>: pop %rbp
0x000056197ad5ab05 <+85>: retq
End of assembler dump.
(rr) reverse-stepi
Thread 1 received signal SIGINT, Interrupt.
core::slice::{{impl}}::index<u8> (self=..., slice=...) at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs:708
708 in /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs
(rr) disas
Dump of assembler code for function core::slice::{{impl}}::index<u8>:
0x000056197ad5b040 <+0>: push %rbp
0x000056197ad5b041 <+1>: mov %rsp,%rbp
0x000056197ad5b044 <+4>: sub $0x90,%rsp
0x000056197ad5b04b <+11>: mov (%rdi),%rax
0x000056197ad5b04e <+14>: mov %rax,-0x58(%rbp)
0x000056197ad5b052 <+18>: mov 0x8(%rdi),%rax
0x000056197ad5b056 <+22>: mov %rax,-0x50(%rbp)
0x000056197ad5b05a <+26>: mov %rsi,-0x48(%rbp)
0x000056197ad5b05e <+30>: mov %rdx,-0x40(%rbp)
0x000056197ad5b062 <+34>: mov -0x58(%rbp),%rax
0x000056197ad5b066 <+38>: mov -0x50(%rbp),%rcx
0x000056197ad5b06a <+42>: mov %rax,-0x38(%rbp)
0x000056197ad5b06e <+46>: mov %rcx,-0x30(%rbp)
0x000056197ad5b072 <+50>: mov -0x48(%rbp),%rax
0x000056197ad5b076 <+54>: mov -0x40(%rbp),%rcx
0x000056197ad5b07a <+58>: mov %rax,-0x28(%rbp)
0x000056197ad5b07e <+62>: mov %rcx,-0x20(%rbp)
0x000056197ad5b082 <+66>: mov -0x38(%rbp),%rax
0x000056197ad5b086 <+70>: cmp -0x30(%rbp),%rax
0x000056197ad5b08a <+74>: jbe 0x56197ad5b099 <core::slice::{{impl}}::index<u8>+89>
0x000056197ad5b08c <+76>: mov -0x38(%rbp),%rdi
0x000056197ad5b090 <+80>: mov -0x30(%rbp),%rsi
0x000056197ad5b094 <+84>: callq 0x56197af5c380 <core::slice::slice_index_order_fail>
=> 0x000056197ad5b099 <+89>: mov -0x30(%rbp),%rax
0x000056197ad5b09d <+93>: mov -0x28(%rbp),%rdi
0x000056197ad5b0a1 <+97>: mov -0x20(%rbp),%rsi
0x000056197ad5b0a5 <+101>: mov %rax,-0x60(%rbp)
---Type <return> to continue, or q <return> to quit---q
Quit
(rr) stepi
Thread 1 received signal SIGINT, Interrupt.
0x000056197ad5aae0 in core::slice::from_raw_parts<u8> (p=0x56197af69140 <str.V> "raw_capacity overflowraw_cap overflow\000", len=0) at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs:2161
2161 in /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs
(rr)
Thread 1 received signal SIGKILL, Killed.
0x000056197ad5aae0 in core::slice::from_raw_parts<u8> (p=0x56197af69140 <str.V> "raw_capacity overflowraw_cap overflow\000", len=0) at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs:2161
2161 in /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/slice.rs
(rr)
[Inferior 1 (process 26877) exited normally]
(rr)
If you need more information, please tell me what is necessary.
Can you dump the portion of rr dump -b -m -p <tracedir> around the event number you're passing to -g? Also, in your rr session before and after doing reverse-stepi, run when and give the result?
Before reverse-stepi:
(rr) when
Current event: 4368545
After reverse-stepi (this is the event number I passed to -g!)
(rr) when
Current event: 4368543
The rr dump output (from 4368540 to 4368550):
{
real_time:42380.114630 global_time:4368540, event:`SYSCALL: write' (state:ENTERING_SYSCALL) tid:26877, ticks:8063044799
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x11 rsi:0x56197af64b70 rdi:0x2 rbp:0x2 rsp:0x681ffe10 r8:0x5 r9:0x5 r10:0x7ffd21231be8 r11:0x246 r12:0x0 r13:0x56197af64b70 r14:0x11 r15:0x1 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7f73ce61b840 gs_base:0x0
}
{
real_time:42380.114644 global_time:4368541, event:`SYSCALL: write' (state:EXITING_SYSCALL) tid:26877, ticks:8063044799
rax:0x11 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x11 rsi:0x56197af64b70 rdi:0x2 rbp:0x2 rsp:0x681ffe10 r8:0x5 r9:0x5 r10:0x7ffd21231be8 r11:0x246 r12:0x0 r13:0x56197af64b70 r14:0x11 r15:0x1 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x1 fs_base:0x7f73ce61b840 gs_base:0x0
}
{
real_time:42432.296747 global_time:4368542, event:`SYSCALLBUF_FLUSH' tid:26877, ticks:16848366754
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x3c68, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x149e, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'write', ret:0x0, size:0x10 }
{ syscall:'clock_gettime', ret:0x0, size:0x20 }
}
{
real_time:42432.296750 global_time:4368543, event:`SIGNAL: SIGINT(async)' tid:26877, ticks:16848366754
rax:0x56197af69140 rbx:0x0 rcx:0x0 rdx:0x0 rsi:0x0 rdi:0x56197af69140 rbp:0x7ffd212269f0 rsp:0x7ffd212269f0 r8:0x0 r9:0x0 r10:0x7ffd2122eb04 r11:0x7ffd21226578 r12:0x2 r13:0x7ffd212406fb r14:0x7ffd2123e730 r15:0x7ffd2123e778 rip:0x56197ad5aae0 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffff02 fs_base:0x7f73ce61b840 gs_base:0x0
}
{
real_time:42432.296753 global_time:4368544, event:`SYSCALLBUF_RESET' tid:26877, ticks:16848366754
}
{
real_time:42432.296793 global_time:4368545, event:`SIGNAL_DELIVERY: SIGINT(async)' tid:26877, ticks:16848366754
rax:0x56197af69140 rbx:0x0 rcx:0x0 rdx:0x0 rsi:0x0 rdi:0x56197af69140 rbp:0x7ffd212269f0 rsp:0x7ffd212269f0 r8:0x0 r9:0x0 r10:0x7ffd2122eb04 r11:0x7ffd21226578 r12:0x2 r13:0x7ffd212406fb r14:0x7ffd2123e730 r15:0x7ffd2123e778 rip:0x56197ad5aae0 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffff02 fs_base:0x7f73ce61b840 gs_base:0x0
{ tid:26877, addr:0x7ffd212269f0, length:(nil) }
}
{
real_time:42432.296874 global_time:4368546, event:`UNSTABLE_EXIT' tid:26903, ticks:40446004
}
{
real_time:42432.296913 global_time:4368547, event:`UNSTABLE_EXIT' tid:26902, ticks:274877915
}
{
real_time:42432.296928 global_time:4368548, event:`UNSTABLE_EXIT' tid:26901, ticks:277435926
}
{
real_time:42432.296944 global_time:4368549, event:`UNSTABLE_EXIT' tid:26900, ticks:272275054
}
{
real_time:42432.296959 global_time:4368550, event:`UNSTABLE_EXIT' tid:26899, ticks:274397870
}
I terminated the program with Ctrl+C because it hung when recording. Perhaps the issue is that I accidently used an event number that lies after this event?
We should be able to step through a SIGINT, or any other signal, successfully. We have tests for this. For some reason it's behaving strangely in your case.
I can't reproduce this writing simple testcases.
You could try rerunning with RR_LOG=GdbConnection,GdbServer,ReplayTimeline and what we get.
@rocallahan re-run as in rr replay, or create a new recording?
rr replay
I've pasted the rr replay session with those environment settings: https://gist.github.com/bennofs/4c61ea7c11372a3d855e2f1d1e7e3cf3
Thanks!
I've just pushed https://github.com/mozilla/rr/commit/70fa1d86789dc254b5755d743cc0c129950850b0 to add some more useful logging. Could you retry your test with RR_LOG=GdbConnection,ReplayTimeline,GdbServer RR_LOG_BUFFER=100000?
Unfortunately, it looks like I deleted the original replay :flushed:
I have a few different replays from the same program, they all seem to end with:
Program received signal SIGKILL, Killed.
[FATAL /home/.cache/pacaur/rr-git/src/rr/src/Task.cc:2114:read_bytes_helper() errno: EIO]
(task 13342 (rec:24530) at time 0)
-> Assertion `false' failed to hold. Should have read 15 bytes from 0x681ffe01, but only read 0
Launch gdb with
gdb '-l' '10000' '-ex' 'target extended-remote :13342' /home/.local/share/rr/nix-index-4//mmap_clone_3_nix-index
Is that normal behaviour? Or may it be caused be the same bug?
The SIGKILL is normal. The assertion failure is not. Can you attach gdb to the rr process and get a stacktrace of rr there?
@rocallahan the stacktrace at after that message was printed is:
gdb-peda$ bt
#0 0x00007f7ffc72fad0 in __accept_nocancel () from /usr/lib/libpthread.so.0
#1 0x0000000000461477 in rr::GdbConnection::await_debugger(rr::ScopedFd&) ()
#2 0x0000000000474bfe in ?? ()
#3 0x000000000047eea5 in rr::GdbServer::emergency_debug(rr::Task*) ()
#4 0x000000000048a7e1 in rr::EmergencyDebugOstream::~EmergencyDebugOstream() ()
#5 0x000000000051333b in rr::Task::read_bytes_helper(rr::remote_ptr<void>, long, void*, bool*) ()
#6 0x0000000000446093 in rr::AutoRestoreMem::init(void const*, long) ()
#7 0x0000000000511986 in rr::Task::open_mem_fd() ()
#8 0x0000000000519958 in rr::Task::clone(rr::Task::CloneReason, int, rr::remote_ptr<void>, rr::remote_ptr<void>, rr::remote_ptr<int>, int, int, unsigned int, rr::Session*) ()
#9 0x000000000051903a in rr::Task::os_clone(rr::Task::CloneReason, rr::Task*, rr::Session*, rr::AutoRemoteSyscalls&, int, unsigned int, unsigned int, rr::remote_ptr<void>, rr::remote_ptr<int>, rr::remote_ptr<void>, rr::remote_ptr<int>) ()
#10 0x0000000000519127 in rr::Task::os_fork_into(rr::Session*) ()
#11 0x000000000050bcf7 in rr::Session::copy_state_to(rr::Session&, rr::EmuFs&, rr::EmuFs&) ()
#12 0x00000000004e54a5 in rr::ReplaySession::clone_diversion() ()
#13 0x000000000047b20c in rr::GdbServer::divert(rr::ReplaySession&) ()
#14 0x000000000047bc61 in rr::GdbServer::process_debugger_requests(rr::GdbServer::ReportState) ()
#15 0x000000000047d51a in rr::GdbServer::debug_one_step(rr::GdbRequest&) ()
#16 0x000000000047e9f0 in rr::GdbServer::serve_replay(rr::GdbServer::ConnectionFlags const&) ()
#17 0x00000000004dbb26 in ?? ()
#18 0x00000000004dcd63 in rr::ReplayCommand::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) ()
#19 0x000000000043383f in main ()
#20 0x00007f7ffc39a43a in __libc_start_main () from /usr/lib/libc.so.6
#21 0x000000000043390a in _start ()
Or is there a way to get the stacktrace of the assert itself?
Btw, are these traces portable? Perhaps I can just upload them somewhere so you can reproduce the issue?
Alright, I think I've managed to reproduce the original problem:
(rr) x/10i $pc
=> 0x55dd0eb56f30 <core::ptr::{{impl}}::offset<u8>+48>: mov -0x30(%rbp),%rax
(rr) reverse-stepi
Program received signal SIGINT, Interrupt.
core::ptr::{{impl}}::offset<u8> (self=0x7f0b28636c40 "\azA\016\375\326y\200\346s\000", count=0) at /checkout/src/libcore/ptr.rs:619
619 in /checkout/src/libcore/ptr.rs
(rr)
0x000055dd0eb56f2c 618 in /checkout/src/libcore/ptr.rs
(rr)
0x000055dd0eb56f28 618 in /checkout/src/libcore/ptr.rs
(rr) x/3i $pc
=> 0x55dd0eb56f28 <core::ptr::{{impl}}::offset<u8>+40>: mov -0x8(%rbp),%rax
0x55dd0eb56f2c <core::ptr::{{impl}}::offset<u8>+44>: mov %rax,-0x30(%rbp)
0x55dd0eb56f30 <core::ptr::{{impl}}::offset<u8>+48>: mov -0x30(%rbp),%rax
It's not as extreme as in the original example, but it still stepped 2 instead of 1 instructions backwards.
I will try to run it with the new logging.
EDIT: Oops, false alarm. Looks like i pressed the return key two times while waiting for reverse-stepi to finish, so I actually ran reverse-stepi three times
Btw, are these traces portable? Perhaps I can just upload them somewhere so you can reproduce the issue?
No, not yet.
Can you run with RR_LOG=GdbConnection? It looks like your assertion failure is caused by trying to execute some application function after the SIGKILL.
@bennofs It's been a while - and as I have a similar issue #3618, can you please recheck if that still happens for you on master? If not then we may close this issue.
I don't have the trace anymore, and don't remember the details for this issue. Feel free to close