rr icon indicating copy to clipboard operation
rr copied to clipboard

Replay divergence

Open Keno opened this issue 5 years ago • 31 comments

The processes in questions do a bunch of complicated stuff, including user namespaces. However, an earlier version of this software recorded fine under rr (never tried this exact version though, so can't say if this is a regression). The value 0xffff000012345678 is something we pwrite to /proc/self/mem to test whether that mechanism is suitable for JITing, so that seems potentially related (though something more must be going on here, since we usually catch these writes).

[ERROR /home/keno/rr/src/Registers.cc:256:maybe_print_reg_mismatch()] rdx 0 != 0xffff000012345678 (replaying vs. recorded)
[FATAL /home/keno/rr/src/Registers.cc:366:compare_register_files()]
 (task 241681 (rec:241560) at time 10668)
 -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:10900973/10900973)
Tail of trace dump:
{
  real_time:13731775.587633 global_time:10648, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:241560, ticks:10641061
rax:0xffffffffffffffda rbx:0x7f52a83dcd38 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dcd38 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.587661 global_time:10649, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:241560, ticks:10641061
rax:0x0 rbx:0x7f52a83dcd38 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dcd38 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.587789 global_time:10650, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:241560, ticks:10641630
rax:0xffffffffffffffda rbx:0x7f52a83dc278 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dc278 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.587818 global_time:10651, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:241560, ticks:10641630
rax:0x0 rbx:0x7f52a83dc278 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dc278 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.587954 global_time:10652, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:241560, ticks:10642189
rax:0xffffffffffffffda rbx:0x7f52a83dbdc0 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dbdc0 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.587983 global_time:10653, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:241560, ticks:10642189
rax:0x0 rbx:0x7f52a83dbdc0 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dbdc0 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.588148 global_time:10654, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:241560, ticks:10642811
rax:0xffffffffffffffda rbx:0x7f52a83dbe98 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dbe98 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.588177 global_time:10655, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:241560, ticks:10642811
rax:0x0 rbx:0x7f52a83dbe98 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dbe98 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.588307 global_time:10656, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:241560, ticks:10643480
rax:0xffffffffffffffda rbx:0x7f52a83dd258 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dd258 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.588336 global_time:10657, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:241560, ticks:10643480
rax:0x0 rbx:0x7f52a83dd258 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dd258 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa00 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa10 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.588464 global_time:10658, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:241560, ticks:10644057
rax:0xffffffffffffffda rbx:0x7f52a83dc0d8 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dc0d8 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa10 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa20 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.588493 global_time:10659, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:241560, ticks:10644057
rax:0x0 rbx:0x7f52a83dc0d8 rcx:0xffffffffffffffff rdx:0x7fffffff rsi:0x81 rdi:0x7f52a83dc0d8 rbp:0x7f52a5482dd0 rsp:0x7ffe3cf4aa10 r8:0x0 r9:0x19550a0 r10:0x0 r11:0x246 r12:0x7ffe3cf4aa20 r13:0x7ffe3cf4b550 r14:0x7ffe3cf4b520 r15:0x0 rip:0x7f52a87a0310 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.589378 global_time:10660, event:`SYSCALL: uname' (state:ENTERING_SYSCALL) tid:241560, ticks:10899954
rax:0xffffffffffffffda rbx:0x7ffe3cf4a7b0 rcx:0xffffffffffffffff rdx:0x1ac0d98 rsi:0x7f52a839d308 rdi:0x7ffe3cf4a7b0 rbp:0x7ffe3cf4a960 rsp:0x7ffe3cf4a798 r8:0x1 r9:0x3 r10:0x803 r11:0x246 r12:0x1a80e10 r13:0x1ac00c0 r14:0x1abf658 r15:0x1abe548 rip:0x7f52a849f4c7 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3f fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.589419 global_time:10661, event:`SYSCALL: uname' (state:EXITING_SYSCALL) tid:241560, ticks:10899954
rax:0x0 rbx:0x7ffe3cf4a7b0 rcx:0xffffffffffffffff rdx:0x1ac0d98 rsi:0x7f52a839d308 rdi:0x7ffe3cf4a7b0 rbp:0x7ffe3cf4a960 rsp:0x7ffe3cf4a798 r8:0x1 r9:0x3 r10:0x803 r11:0x246 r12:0x1a80e10 r13:0x1ac00c0 r14:0x1abf658 r15:0x1abe548 rip:0x7f52a849f4c7 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3f fs_base:0x7f52a974ec80 gs_base:0x0
  { tid:241560, addr:0x7ffe3cf4a7b0, length:0x186 }
}
{
  real_time:13731775.589565 global_time:10662, event:`SYSCALL: openat' (state:ENTERING_SYSCALL) tid:241560, ticks:10900566
rax:0xffffffffffffffda rbx:0x181002 rcx:0xffffffffffffffff rdx:0x181002 rsi:0x7f52a8f80908 rdi:0xffffffffffffff9c rbp:0x7f52a8f80908 rsp:0x7ffe3cf4a720 r8:0x0 r9:0x2e r10:0x0 r11:0x246 r12:0x1a80e10 r13:0x1ac00c0 r14:0x1abf658 r15:0x1abe548 rip:0x7f52a87a2641 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.589633 global_time:10663, event:`SYSCALL: openat' (state:EXITING_SYSCALL) tid:241560, ticks:10900566
rax:0xe rbx:0x181002 rcx:0xffffffffffffffff rdx:0x181002 rsi:0x7f52a8f80908 rdi:0xffffffffffffff9c rbp:0x7f52a8f80908 rsp:0x7ffe3cf4a720 r8:0x0 r9:0x2e r10:0x0 r11:0x246 r12:0x1a80e10 r13:0x1ac00c0 r14:0x1abf658 r15:0x1abe548 rip:0x7f52a87a2641 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.589761 global_time:10664, event:`SYSCALL: mmap' (state:ENTERING_SYSCALL) tid:241560, ticks:10900575
rax:0xffffffffffffffda rbx:0x5 rcx:0xffffffffffffffff rdx:0x5 rsi:0x1000 rdi:0x0 rbp:0xffffffffffffffff rsp:0x7ffe3cf4a768 r8:0xffffffffffffffff r9:0x0 r10:0x22 r11:0x246 r12:0x0 r13:0x0 r14:0x1000 r15:0x22 rip:0x7f52a84cb093 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x9 fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.589829 global_time:10665, event:`SYSCALL: mmap' (state:EXITING_SYSCALL) tid:241560, ticks:10900575
rax:0x7f52a9755000 rbx:0x5 rcx:0xffffffffffffffff rdx:0x5 rsi:0x1000 rdi:0x0 rbp:0xffffffffffffffff rsp:0x7ffe3cf4a768 r8:0xffffffffffffffff r9:0x0 r10:0x22 r11:0x246 r12:0x0 r13:0x0 r14:0x1000 r15:0x22 rip:0x7f52a84cb093 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x9 fs_base:0x7f52a974ec80 gs_base:0x0
  { map_file:"<ZERO>", addr:0x7f52a9755000, length:0x1000, prot_flags:"r-xp", file_offset:0x0, device:0, inode:0, data_file:"", data_offset:0x0, file_size:0x1000 }
}
{
  real_time:13731775.589960 global_time:10666, event:`SYSCALL: pwrite64' (state:ENTERING_SYSCALL) tid:241560, ticks:10900760
rax:0xffffffffffffffda rbx:0x7f52a9755000 rcx:0xffffffffffffffff rdx:0x8 rsi:0x7ffe3cf4a7a8 rdi:0xe rbp:0x7ffe3cf4a960 rsp:0x7ffe3cf4a790 r8:0xffffffffffffffff r9:0x0 r10:0x7f52a9755000 r11:0x246 r12:0xe r13:0x7f52a9261c70 r14:0x1abf658 r15:0x1abe548 rip:0x7f52a87a27a3 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x12 fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.589997 global_time:10667, event:`SYSCALL: pwrite64' (state:EXITING_SYSCALL) tid:241560, ticks:10900760
rax:0x8 rbx:0x7f52a9755000 rcx:0xffffffffffffffff rdx:0x8 rsi:0x7ffe3cf4a7a8 rdi:0xe rbp:0x7ffe3cf4a960 rsp:0x7ffe3cf4a790 r8:0xffffffffffffffff r9:0x0 r10:0x7f52a9755000 r11:0x246 r12:0xe r13:0x7f52a9261c70 r14:0x1abf658 r15:0x1abe548 rip:0x7f52a87a27a3 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x12 fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.590127 global_time:10668, event:`SYSCALL: munmap' (state:ENTERING_SYSCALL) tid:241560, ticks:10900973
rax:0xffffffffffffffda rbx:0x7f52a9755000 rcx:0xffffffffffffffff rdx:0xffff000012345678 rsi:0x1000 rdi:0x7f52a9755000 rbp:0x7ffe3cf4a960 rsp:0x7ffe3cf4a798 r8:0xffffffffffffffff r9:0x0 r10:0x574 r11:0x246 r12:0xe r13:0x7f52a9261c70 r14:0x1abf658 r15:0x1abe548 rip:0x7f52a84cb157 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xb fs_base:0x7f52a974ec80 gs_base:0x0
}
{
  real_time:13731775.590169 global_time:10669, event:`SYSCALL: munmap' (state:EXITING_SYSCALL) tid:241560, ticks:10900973
rax:0x0 rbx:0x7f52a9755000 rcx:0xffffffffffffffff rdx:0xffff000012345678 rsi:0x1000 rdi:0x7f52a9755000 rbp:0x7ffe3cf4a960 rsp:0x7ffe3cf4a798 r8:0xffffffffffffffff r9:0x0 r10:0x574 r11:0x246 r12:0xe r13:0x7f52a9261c70 r14:0x1abf658 r15:0x1abe548 rip:0x7f52a84cb157 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xb fs_base:0x7f52a974ec80 gs_base:0x0
}
=== Start rr backtrace:
/home/keno/rr-build/bin/rr(_ZN2rr13dump_rr_stackEv+0x44)[0x557bfe88c4f9]
/home/keno/rr-build/bin/rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x1a2)[0x557bfe72809e]
/home/keno/rr-build/bin/rr(+0x2d3e4f)[0x557bfe74ce4f]
/home/keno/rr-build/bin/rr(_ZN2rr21EmergencyDebugOstreamD1Ev+0x62)[0x557bfe74d0aa]
/home/keno/rr-build/bin/rr(_ZN2rr9Registers22compare_register_filesEPNS_10ReplayTaskEPKcRKS0_S4_S6_NS_16MismatchBehaviorE+0x145)[0x557bfe7f48f7]
/home/keno/rr-build/bin/rr(_ZN2rr10ReplayTask13validate_regsEj+0x31a)[0x557bfe8145da]
/home/keno/rr-build/bin/rr(_ZN2rr13ReplaySession13enter_syscallEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0x764)[0x557bfe7fe210]
/home/keno/rr-build/bin/rr(_ZN2rr13ReplaySession18try_one_trace_stepEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0xf9)[0x557bfe801e25]
/home/keno/rr-build/bin/rr(_ZN2rr13ReplaySession11replay_stepERKNS0_15StepConstraintsE+0x15b)[0x557bfe80320b]
/home/keno/rr-build/bin/rr(_ZN2rr14ReplayTimeline19replay_step_forwardENS_10RunCommandEl+0x10e)[0x557bfe81e8da]
/home/keno/rr-build/bin/rr(_ZN2rr9GdbServer12serve_replayERKNS0_15ConnectionFlagsE+0x59)[0x557bfe726d5f]
/home/keno/rr-build/bin/rr(+0x38064d)[0x557bfe7f964d]
/home/keno/rr-build/bin/rr(_ZN2rr13ReplayCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x3a3)[0x557bfe7fa01b]
/home/keno/rr-build/bin/rr(main+0x1f7)[0x557bfe8a3de3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd7b0c801c1]
/home/keno/rr-build/bin/rr(_start+0x2a)[0x557bfe69fb1a]
=== End rr backtrace
Launch gdb with
  gdb '-l' '10000' '-ex' 'set sysroot /' '-ex' 'target extended-remote 127.0.0.1:45073' /home/keno/.local/share/rr/julia-109/mmap_copy_724_julia

Keno avatar Aug 01 '18 00:08 Keno

As, that's probably the comment here: https://github.com/mozilla/rr/blob/master/src/ProcMemMonitor.cc#L17-L18

Keno avatar Aug 01 '18 00:08 Keno

Not sure I'll have the time to get to this in the next few days (will try to get to it though), but here's how I figured this could be done:

  • For each RecordTask, record the pid namespace id in addition to that task's tid in the namespace
  • When opening a file, check if it ends in "/mem".
  • If so, t->open_fd the fd and use fstatfs to figure out if it's a procfs
  • If so, readlink /proc/<pidof(t)>/root and use that to strip the leading path components of the file name, then readlink, e.g. /proc/<pidof(t)>/root/<stripped filename path>/proc/1/ns/pid to figure out the pid namespace of the procfs that the child opened (pid 1 in the namespace is required to have the same pid namespace as the procfs itself).
  • Grab the tid in the corresponding namespace as usual from the file name
  • Search the record tasks by (<pid ns>,<tid in rec ns>) pairs
  • Record the found task's tuid in the trace (since we don't have enough information to reconstruct this during replay).

@rocallahan thoughts on the approach?

Keno avatar Aug 02 '18 19:08 Keno

Something like that should work I guess. I'd need to think pretty hard to decide if what you're suggesting is optimal.

Why do we need to store extra information in the trace? We record the data that was written via mem, with the rr-namespace tid of the target process.

rocallahan avatar Aug 02 '18 21:08 rocallahan

Why do we need to store extra information in the trace?

Right, the information to record is which thread corresponds to the n in /proc/n/mem, since it may have no relation to anything we recorded.

Keno avatar Aug 02 '18 21:08 Keno

But why do we need to know that during replay?

rocallahan avatar Aug 02 '18 21:08 rocallahan

But why do we need to know that during replay?

Oh, I see what you're saying. I suppose we don't.

Keno avatar Aug 02 '18 21:08 Keno

Actually I guess I'm not understanding something.

Right, the information to record is which thread corresponds to the n in /proc/n/mem, since it may have no relation to anything we recorded.

This is actually RawDataMetadata::rec_tid in the trace, isn't it?

rocallahan avatar Aug 02 '18 21:08 rocallahan

This is actually RawDataMetadata::rec_tid in the trace, isn't it?

So the folders in a /proc are relative to the the process that mounted the procfs, not the currently running process, so we need to figure out what the mounting process' pid namespace was (we can look at pid 1 in that namespace, which is guaranteed to actually be running in it - not a child pid namespace). Once we have that, we need to figure out if any of our tasks are known by that tgid in that pid namespace (for which I realize we may actually need to keep a stack of (<ns id>, <tgid>) pairs.

Keno avatar Aug 02 '18 21:08 Keno

Right, I understand that computing the tid of the target process in rr's pid namespace is a huge pain :-). I'm just saying that we're already storing that in the trace in RawDataMetadata::rec_tid so we don't need to add anything to the trace format.

rocallahan avatar Aug 02 '18 21:08 rocallahan

Yes, I think I agree, since the ProcMemMonitor doesn't actually do anything during replay, so it doesn't need to now what the target is.

Keno avatar Aug 02 '18 21:08 Keno

I think, in Linux 4.1+ at least, we can compute the (pid-namespace-of-process, pid-of-process-in-its-pid-namespace) pair for any process visible in any procfs by reading /proc/<pid>/ns/pid and using the NSpid field of /proc/<pid>/status and those should be immutable for the lifetime of a process. So rr can build a table of those for the tasks under its control, and given a path into any procfs we can look them up to find the rr process. I think that ends up being slightly simpler than what you suggested because we don't care about the pid namespace of the procfs mount. And I'm pretty comfortable with an approach that means you need 4.1+ to properly support procfs mounts that aren't using the rr pid namespace.

rocallahan avatar Aug 02 '18 21:08 rocallahan

Maybe, I'm misunderstanding something, but the pid in the namespace of the procfs mount, not the pid of the process in its own namespace. We may have never seen the namespace of the procfs mount (e.g. via nsenter).

Keno avatar Aug 02 '18 21:08 Keno

Maybe, I'm misunderstanding something, but the pid in the namespace of the procfs mount, not the pid of the process in its own namespace.

I don't understand what you're saying here...

rocallahan avatar Aug 02 '18 21:08 rocallahan

From man proc:

* NStgid : Thread group ID (i.e., PID) in each of the PID namespaces of which [pid] is a member.  The leftmost entry shows the value with respect
  to the PID namespace of the reading process, followed by the value in successively nested inner namespaces.  (Since Linux 4.1.)

* NSpid: Thread ID in each of the PID namespaces of which [pid] is a member.  The fields are ordered as for NStgid.  (Since Linux 4.1.)

rocallahan avatar Aug 02 '18 21:08 rocallahan

So the rightmost value in NSpid gives you the pid of the process in its own namespace, regardless of which process is doing the reading or how the procfs was mounted.

rocallahan avatar Aug 02 '18 21:08 rocallahan

Ok, I think I see what you're saying. You're saying that we can look at the procfs that the child accessed (though this itself might be nontrivial, and require going trough /proc/<tid>/root in the topmost namespace) and compute the id of the inner most pid namespace, and the tgid of the thread in that namespace from there, and then as long as we keep the id of the inner most namespace, we can find it. Sound right?

Keno avatar Aug 02 '18 21:08 Keno

(though this itself might be nontrivial, and require going trough /proc//root in the topmost namespace)

Yes. Given we can't get the information we need just from the file descriptor that the child already opened, I don't think there's any way around this sort of thing. (Well, I guess we could AutoRemoteSyscalls the child to open things for us, if that helped, at a small performance hit.) You'd have to do something very similar to open /proc/1/ns/pid in the procfs the tracee is using.

Sound right?

Yeah I think so. The key point is that "(pid-namespace-of-process, pid-of-process-in-its-pid-namespace)" is a unique key for each process that's immutable for the process's lifetime.

rocallahan avatar Aug 02 '18 23:08 rocallahan

Hang on. NStgid's leftmost entry is supposed to be the pid in the namespace of the reading process. So I think rr can just replace mem with status in the result of Task::file_name_of_fd and read the NStgid field.

rocallahan avatar Aug 02 '18 23:08 rocallahan

... prepending /proc/<tracee-tid>/root as needed.

rocallahan avatar Aug 02 '18 23:08 rocallahan

Hang on. NStgid's leftmost entry is supposed to be the pid in the namespace of the reading process.

Yeah, whoever wrote that man page didn't understand the code properly. I read the kernel code and it starts at the namespace of the process that mounted the procfs.

Keno avatar Aug 02 '18 23:08 Keno

Aha, oops!

rocallahan avatar Aug 02 '18 23:08 rocallahan

I'll submit a patch to man-pages to clear it up.

Keno avatar Aug 02 '18 23:08 Keno

Any update for this interesting issue? Possibly manpage updates and even more relevant a change to this issue, possibly with last release? If not, possibly close with a no-activity marker?

GitMensch avatar Jul 16 '21 19:07 GitMensch

I updated the man page in https://github.com/mkerrisk/man-pages/commit/0aae409212db2668945afec2da783e60c80e1b21. I believe the rr issue itself is still outstanding.

Keno avatar Jul 16 '21 20:07 Keno

I believe the rr issue itself is still outstanding.

That looks quite identical (using rr master from 2 days ago) and I guess it then is still outstanding (or is it something completely different because the first output references futex while this one references nanosleep?):

rr rerun
[ERROR /tmp/rr-master/src/Registers.cc:295:maybe_print_reg_mismatch()] r10 0x32ed != 0xfca5 (replaying vs. recorded)
[FATAL /tmp/rr-master/src/Registers.cc:405:compare_register_files()]
 (task 28998 (rec:14148) at time 3447)
 -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:8145314/8145314)
Tail of trace dump:
{
  real_time:700671.018539 global_time:3427, event:`SYSCALLBUF_RESET' tid:14148, ticks:8143242
}
{
  real_time:700671.018669 global_time:3428, event:`SYSCALL: nanosleep' (state:EXITING_SYSCALL) tid:14148, ticks:8143242
rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffc4beb85c0 rbp:0x23 rsp:0x681ffde0 r8:0x7f5c1ea7b430 r9:0x7f5c1ea673e0 r10:0xd2 r11:0x246 r12:0x4 r13:0x30 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f5c2c717740 gs_base:0x0
}
{
  real_time:700671.018868 global_time:3429, event:`SYSCALLBUF_FLUSH' tid:14148, ticks:8143356
  { syscall:'gettimeofday', ret:0x0, size:0x28 }
}
{
  real_time:700671.018872 global_time:3430, event:`INSTRUCTION_TRAP' tid:14148, ticks:8143356
rax:0x339df1b4 rbx:0x1ce6c30 rcx:0x4 rdx:0x7734e rsi:0x4 rdi:0x1ce6c30 rbp:0x1d0001c rsp:0x7ffc4beb8608 r8:0x7ffc4beb86ec r9:0x7f5c1ea67440 r10:0x1 r11:0x7f5c2b26b9b0 r12:0x4 r13:0x1cffa00 r14:0x4 r15:0xd rip:0x7f5c1e53b282 eflags:0x10202 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f5c2c717740 gs_base:0x0
}
{
  real_time:700671.018879 global_time:3431, event:`SYSCALLBUF_RESET' tid:14148, ticks:8143356
}
{
  real_time:700671.019030 global_time:3432, event:`SYSCALLBUF_FLUSH' tid:14148, ticks:8143807
  { syscall:'open', ret:0x8, size:0x10, desched:1 }
  { syscall:'readlinkat', ret:0xb, size:0x1b }
  { syscall:'read', ret:0x1, size:0x11, desched:1 }
  { syscall:'close', ret:0x0, size:0x10 }
}
{
  real_time:700671.019041 global_time:3433, event:`SYSCALL: nanosleep' (state:ENTERING_SYSCALL) tid:14148, ticks:8143807
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffc4beb85c0 rbp:0x23 rsp:0x681ffde0 r8:0x7f5c1ea7b430 r9:0x7f5c1ea673e0 r10:0xa4 r11:0x246 r12:0x4 r13:0x20 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f5c2c717740 gs_base:0x0
}
{
  real_time:700671.019048 global_time:3434, event:`SYSCALLBUF_RESET' tid:14148, ticks:8143807
}
{
  real_time:700671.019175 global_time:3435, event:`SYSCALL: nanosleep' (state:EXITING_SYSCALL) tid:14148, ticks:8143807
rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffc4beb85c0 rbp:0x23 rsp:0x681ffde0 r8:0x7f5c1ea7b430 r9:0x7f5c1ea673e0 r10:0xa4 r11:0x246 r12:0x4 r13:0x20 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f5c2c717740 gs_base:0x0
}
{
  real_time:700671.019359 global_time:3436, event:`SYSCALLBUF_FLUSH' tid:14148, ticks:8143923
  { syscall:'gettimeofday', ret:0x0, size:0x28 }
}
{
  real_time:700671.019363 global_time:3437, event:`INSTRUCTION_TRAP' tid:14148, ticks:8143923
rax:0x33b46a74 rbx:0x1ce6c30 rcx:0x4 rdx:0x7734e rsi:0x4 rdi:0x1ce6c30 rbp:0x1d0002c rsp:0x7ffc4beb8608 r8:0x7ffc4beb86ec r9:0x7f5c1ea67440 r10:0x2 r11:0x7f5c2b26b9b0 r12:0x4 r13:0x1cffa00 r14:0x4 r15:0xd rip:0x7f5c1e53b282 eflags:0x10202 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f5c2c717740 gs_base:0x0
}
{
  real_time:700671.019369 global_time:3438, event:`SYSCALLBUF_RESET' tid:14148, ticks:8143923
}
{
  real_time:700671.019507 global_time:3439, event:`SYSCALLBUF_FLUSH' tid:14148, ticks:8144383
  { syscall:'open', ret:0x8, size:0x10, desched:1 }
  { syscall:'readlinkat', ret:0xb, size:0x1b }
  { syscall:'read', ret:0x1, size:0x11, desched:1 }
  { syscall:'close', ret:0x0, size:0x10 }
}
{
  real_time:700671.019518 global_time:3440, event:`SYSCALL: nanosleep' (state:ENTERING_SYSCALL) tid:14148, ticks:8144383
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffc4beb85c0 rbp:0x23 rsp:0x681ffde0 r8:0x7f5c1ea7b430 r9:0x7f5c1ea673e0 r10:0x76 r11:0x246 r12:0x4 r13:0x10 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f5c2c717740 gs_base:0x0
}
{
  real_time:700671.019524 global_time:3441, event:`SYSCALLBUF_RESET' tid:14148, ticks:8144383
}
{
  real_time:700671.019650 global_time:3442, event:`SYSCALL: nanosleep' (state:EXITING_SYSCALL) tid:14148, ticks:8144383
rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffc4beb85c0 rbp:0x23 rsp:0x681ffde0 r8:0x7f5c1ea7b430 r9:0x7f5c1ea673e0 r10:0x76 r11:0x246 r12:0x4 r13:0x10 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f5c2c717740 gs_base:0x0
}
{
  real_time:700671.019834 global_time:3443, event:`SYSCALLBUF_FLUSH' tid:14148, ticks:8144499
  { syscall:'gettimeofday', ret:0x0, size:0x28 }
}
{
  real_time:700671.019837 global_time:3444, event:`INSTRUCTION_TRAP' tid:14148, ticks:8144499
rax:0x33ca1786 rbx:0x1ce6c30 rcx:0x4 rdx:0x7734e rsi:0x4 rdi:0x1ce6c30 rbp:0x1d0003c rsp:0x7ffc4beb8608 r8:0x7ffc4beb86ec r9:0x7f5c1ea67440 r10:0x3 r11:0x7f5c2b26b9b0 r12:0x4 r13:0x1cffa00 r14:0x4 r15:0xd rip:0x7f5c1e53b282 eflags:0x10202 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f5c2c717740 gs_base:0x0
}
{
  real_time:700671.019844 global_time:3445, event:`SYSCALLBUF_RESET' tid:14148, ticks:8144499
}
{
  real_time:700671.019991 global_time:3446, event:`SYSCALLBUF_FLUSH' tid:14148, ticks:8145314
  { syscall:'open', ret:0x8, size:0x10, desched:1 }
  { syscall:'readlinkat', ret:0xb, size:0x1b }
  { syscall:'read', ret:0x1, size:0x11, desched:1 }
  { syscall:'close', ret:0x0, size:0x10 }
}
{
  real_time:700671.020010 global_time:3447, event:`SYSCALL: nanosleep' (state:ENTERING_SYSCALL) tid:14148, ticks:8145314
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffc4beb85c0 rbp:0x23 rsp:0x681ffde0 r8:0x7f5c1ea7b430 r9:0x7f5c1ea673e0 r10:0xfca5 r11:0x246 r12:0x4 r13:0x40 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f5c2c717740 gs_base:0x0
}
{
  real_time:700671.020017 global_time:3448, event:`SYSCALLBUF_RESET' tid:14148, ticks:8145314
}
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x36)[0x969acf]
rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x168)[0x7bc696]
rr[0x7eafeb]
rr(_ZN2rr21EmergencyDebugOstreamD2Ev+0x5a)[0x7eb1d6]
rr(_ZN2rr9Registers22compare_register_filesEPNS_10ReplayTaskEPKcRKS0_S4_S6_NS_16MismatchBehaviorE+0x12a)[0x8b2d48]
rr(_ZN2rr10ReplayTask13validate_regsEj+0x1d1)[0x8d520f]
rr[0x8bd3cf]
rr(_ZN2rr13ReplaySession13enter_syscallEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0x562)[0x8bd934]
rr(_ZN2rr13ReplaySession18try_one_trace_stepEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0xd3)[0x8c1bff]
rr(_ZN2rr13ReplaySession11replay_stepERKNS0_15StepConstraintsE+0x1ca)[0x8c2fdc]
rr(_ZN2rr13ReplaySession11replay_stepENS_10RunCommandE+0x3c)[0x8b9a54]
rr[0x8efeed]
rr(_ZN2rr12RerunCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x170)[0x8f04d2]
rr(main+0x22a)[0x982e9e]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fc16cd00555]
rr[0x725f39]
=== End rr backtrace

Is there anything that I can do to help investigating this issue?

The backtrace in the emergency debugger is more or less as expected:

(gdb) bt 10
#0  0x0000000070000002 in syscall_traced ()
#1  0x00007f5c2c3014c4 in _raw_syscall () at /tmp/rr-master/src/preload/raw_syscall.S:120
#2  0x00007f5c2c2fd27e in traced_raw_syscall (call=call@entry=0x681fffa0)
    at /tmp/rr-master/src/preload/syscallbuf.c:278
#3  0x00007f5c2c3005e1 in syscall_hook_internal (call=0x681fffa0) at /tmp/rr-master/src/preload/syscallbuf.c:3414
#4  syscall_hook (call=0x681fffa0) at /tmp/rr-master/src/preload/syscallbuf.c:3448
#5  0x00007f5c2c2fd060 in _syscall_hook_trampoline () at /tmp/rr-master/src/preload/syscall_hook.S:313
#6  0x00007f5c2c2fd0bf in __morestack () at /tmp/rr-master/src/preload/syscall_hook.S:458
#7  0x00007f5c2c2fd0c6 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff ()
    at /tmp/rr-master/src/preload/syscall_hook.S:472
#8  0x00007f5c2b1a3846 in __nanosleep_nocancel () from /usr/lib64/libc.so.6
#9  0x00007f5c2b1d4134 in usleep () from /usr/lib64/libc.so.6
(More stack frames follow...)

GitMensch avatar Jul 26 '21 08:07 GitMensch

Rechecked with current master rr (today: from two days ago) and GDB 11.1, that is still the case.

To have a recent dump:

~> PATH=/opt/gdb/bin:$PATH LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/usr/local/lib64:/usr/lib64:$LD_LIBRARY_PATH /opt/rr/bin/rr rerun
[ERROR /home/user/install/rr/src/Registers.cc:295:maybe_print_reg_mismatch()] r10 0x76b4 != 0x9290 (replaying vs. recorded)
[FATAL /home/user/install/rr/src/Registers.cc:405:compare_register_files()]
 (task 8593 (rec:6696) at time 2179)
 -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:7089993/7089993)
Tail of trace dump:
{
  real_time:6041781.903039 global_time:2159, event:`SYSCALLBUF_RESET' tid:6696, ticks:7087908
}
{
  real_time:6041781.903177 global_time:2160, event:`SYSCALL: nanosleep' (state:EXITING_SYSCALL) tid:6696, ticks:7087908
rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffd49360420 rbp:0x23 rsp:0x681ffde0 r8:0x7f23af9c1430 r9:0x7f23af9ad3e0 r10:0xd2 r11:0x246 r12:0x4 r13:0x30 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f23bd466740 gs_base:0x0
}
{
  real_time:6041781.903397 global_time:2161, event:`SYSCALLBUF_FLUSH' tid:6696, ticks:7088023
  { syscall:'gettimeofday', ret:0x0, size:0x28 }
}
{
  real_time:6041781.903402 global_time:2162, event:`INSTRUCTION_TRAP' tid:6696, ticks:7088023
rax:0x6227d9c6 rbx:0x1c3bd50 rcx:0x4 rdx:0x403e3e rsi:0x4 rdi:0x1c3bd50 rbp:0x1c7b4fc rsp:0x7ffd49360468 r8:0x7ffd4936054c r9:0x7f23af9ad440 r10:0x1 r11:0x7f23bbfad9b0 r12:0x4 r13:0x1c7aee0 r14:0x4 r15:0xd rip:0x7f23af481282 eflags:0x10202 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f23bd466740 gs_base:0x0
}
{
  real_time:6041781.903410 global_time:2163, event:`SYSCALLBUF_RESET' tid:6696, ticks:7088023
}
{
  real_time:6041781.903560 global_time:2164, event:`SYSCALLBUF_FLUSH' tid:6696, ticks:7088483
  { syscall:'open', ret:0x8, size:0x10, desched:1 }
  { syscall:'readlinkat', ret:0xb, size:0x1b }
  { syscall:'read', ret:0x1, size:0x11, desched:1 }
  { syscall:'close', ret:0x0, size:0x10 }
}
{
  real_time:6041781.903572 global_time:2165, event:`SYSCALL: nanosleep' (state:ENTERING_SYSCALL) tid:6696, ticks:7088483
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffd49360420 rbp:0x23 rsp:0x681ffde0 r8:0x7f23af9c1430 r9:0x7f23af9ad3e0 r10:0xa4 r11:0x246 r12:0x4 r13:0x20 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f23bd466740 gs_base:0x0
}
{
  real_time:6041781.903579 global_time:2166, event:`SYSCALLBUF_RESET' tid:6696, ticks:7088483
}
{
  real_time:6041781.903730 global_time:2167, event:`SYSCALL: nanosleep' (state:EXITING_SYSCALL) tid:6696, ticks:7088483
rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffd49360420 rbp:0x23 rsp:0x681ffde0 r8:0x7f23af9c1430 r9:0x7f23af9ad3e0 r10:0xa4 r11:0x246 r12:0x4 r13:0x20 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f23bd466740 gs_base:0x0
}
{
  real_time:6041781.903993 global_time:2168, event:`SYSCALLBUF_FLUSH' tid:6696, ticks:7088604
  { syscall:'gettimeofday', ret:0x0, size:0x28 }
}
{
  real_time:6041781.904000 global_time:2169, event:`INSTRUCTION_TRAP' tid:6696, ticks:7088604
rax:0x624307b6 rbx:0x1c3bd50 rcx:0x4 rdx:0x403e3e rsi:0x4 rdi:0x1c3bd50 rbp:0x1c7b50c rsp:0x7ffd49360468 r8:0x7ffd4936054c r9:0x7f23af9ad440 r10:0x2 r11:0x7f23bbfad9b0 r12:0x4 r13:0x1c7aee0 r14:0x4 r15:0xd rip:0x7f23af481282 eflags:0x10202 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f23bd466740 gs_base:0x0
}
{
  real_time:6041781.904020 global_time:2170, event:`SYSCALLBUF_RESET' tid:6696, ticks:7088604
}
{
  real_time:6041781.904176 global_time:2171, event:`SYSCALLBUF_FLUSH' tid:6696, ticks:7089064
  { syscall:'open', ret:0x8, size:0x10, desched:1 }
  { syscall:'readlinkat', ret:0xb, size:0x1b }
  { syscall:'read', ret:0x1, size:0x11, desched:1 }
  { syscall:'close', ret:0x0, size:0x10 }
}
{
  real_time:6041781.904188 global_time:2172, event:`SYSCALL: nanosleep' (state:ENTERING_SYSCALL) tid:6696, ticks:7089064
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffd49360420 rbp:0x23 rsp:0x681ffde0 r8:0x7f23af9c1430 r9:0x7f23af9ad3e0 r10:0x76 r11:0x246 r12:0x4 r13:0x10 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f23bd466740 gs_base:0x0
}
{
  real_time:6041781.904195 global_time:2173, event:`SYSCALLBUF_RESET' tid:6696, ticks:7089064
}
{
  real_time:6041781.904328 global_time:2174, event:`SYSCALL: nanosleep' (state:EXITING_SYSCALL) tid:6696, ticks:7089064
rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffd49360420 rbp:0x23 rsp:0x681ffde0 r8:0x7f23af9c1430 r9:0x7f23af9ad3e0 r10:0x76 r11:0x246 r12:0x4 r13:0x10 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f23bd466740 gs_base:0x0
}
{
  real_time:6041781.904521 global_time:2175, event:`SYSCALLBUF_FLUSH' tid:6696, ticks:7089178
  { syscall:'gettimeofday', ret:0x0, size:0x28 }
}
{
  real_time:6041781.904526 global_time:2176, event:`INSTRUCTION_TRAP' tid:6696, ticks:7089178
rax:0x625b3c66 rbx:0x1c3bd50 rcx:0x4 rdx:0x403e3e rsi:0x4 rdi:0x1c3bd50 rbp:0x1c7b51c rsp:0x7ffd49360468 r8:0x7ffd4936054c r9:0x7f23af9ad440 r10:0x3 r11:0x7f23bbfad9b0 r12:0x4 r13:0x1c7aee0 r14:0x4 r15:0xd rip:0x7f23af481282 eflags:0x10202 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f23bd466740 gs_base:0x0
}
{
  real_time:6041781.904533 global_time:2177, event:`SYSCALLBUF_RESET' tid:6696, ticks:7089178
}
{
  real_time:6041781.904689 global_time:2178, event:`SYSCALLBUF_FLUSH' tid:6696, ticks:7089993
  { syscall:'open', ret:0x8, size:0x10, desched:1 }
  { syscall:'readlinkat', ret:0xb, size:0x1b }
  { syscall:'read', ret:0x1, size:0x11, desched:1 }
  { syscall:'close', ret:0x0, size:0x10 }
}
{
  real_time:6041781.904701 global_time:2179, event:`SYSCALL: nanosleep' (state:ENTERING_SYSCALL) tid:6696, ticks:7089993
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x7ffd49360420 rbp:0x23 rsp:0x681ffde0 r8:0x7f23af9c1430 r9:0x7f23af9ad3e0 r10:0x9290 r11:0x246 r12:0x4 r13:0x40 r14:0x4 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x23 fs_base:0x7f23bd466740 gs_base:0x0
}
{
  real_time:6041781.904711 global_time:2180, event:`SYSCALLBUF_RESET' tid:6696, ticks:7089993
}
=== Start rr backtrace:
/opt/rr/bin/rr(_ZN2rr13dump_rr_stackEv+0x36)[0x97b51f]
/opt/rr/bin/rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x168)[0x7c60b2]
/opt/rr/bin/rr[0x7f885f]
/opt/rr/bin/rr(_ZN2rr21EmergencyDebugOstreamD2Ev+0x5a)[0x7f8a4a]
/opt/rr/bin/rr(_ZN2rr9Registers22compare_register_filesEPNS_10ReplayTaskEPKcRKS0_S4_S6_NS_16MismatchBehaviorE+0x12a)[0x8c19c6]
/opt/rr/bin/rr(_ZN2rr10ReplayTask13validate_regsEj+0x1d1)[0x8e48c3]
/opt/rr/bin/rr[0x8cbff1]
/opt/rr/bin/rr(_ZN2rr13ReplaySession13enter_syscallEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0x562)[0x8cc556]
/opt/rr/bin/rr(_ZN2rr13ReplaySession18try_one_trace_stepEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0xd3)[0x8d0821]
/opt/rr/bin/rr(_ZN2rr13ReplaySession11replay_stepERKNS0_15StepConstraintsE+0x1ca)[0x8d1bfe]
/opt/rr/bin/rr(_ZN2rr13ReplaySession11replay_stepENS_10RunCommandE+0x3c)[0x8c86d2]
/opt/rr/bin/rr[0x8ff75e]
/opt/rr/bin/rr(_ZN2rr12RerunCommand12run_internalERNS_20CommandForCheckpointE+0x190)[0x8ffed0]
/opt/rr/bin/rr(_ZN2rr12RerunCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x72)[0x8fffba]
/opt/rr/bin/rr(main+0x28a)[0x99485f]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ff040d63555]
/opt/rr/bin/rr[0x72ee59]
=== End rr backtrace

And emergency debugger:

Reading symbols from /opt/progs/bin/prog...
Remote debugging using 127.0.0.1:8593
Reading symbols from /opt/rr/bin/../lib64/rr/librrpreload.so...
Reading symbols from /opt/progs/lib/liblib.so.3...
Reading symbols from /usr/lib64/libm.so.6...
(No debugging symbols found in /usr/lib64/libm.so.6)
Reading symbols from /usr/lib64/libgmp.so.10...
Reading symbols from .gnu_debugdata for /usr/lib64/libgmp.so.10...
(No debugging symbols found in .gnu_debugdata for /usr/lib64/libgmp.so.10)
Reading symbols from /usr/lib64/libncursesw.so.5...
Reading symbols from .gnu_debugdata for /usr/lib64/libncursesw.so.5...
(No debugging symbols found in .gnu_debugdata for /usr/lib64/libncursesw.so.5)
Reading symbols from /usr/lib64/libtinfo.so.5...
Reading symbols from .gnu_debugdata for /usr/lib64/libtinfo.so.5...
(No debugging symbols found in .gnu_debugdata for /usr/lib64/libtinfo.so.5)
Reading symbols from /usr/lib64/libdl.so.2...
(No debugging symbols found in /usr/lib64/libdl.so.2)
Reading symbols from /usr/lib64/libc.so.6...
(No debugging symbols found in /usr/lib64/libc.so.6)
Reading symbols from /usr/lib64/libpthread.so.0...
(No debugging symbols found in /usr/lib64/libpthread.so.0)
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
Reading symbols from /usr/lib64/libnsl.so.1...
(No debugging symbols found in /usr/lib64/libnsl.so.1)
Reading symbols from /usr/lib64/librt.so.1...
(No debugging symbols found in /usr/lib64/librt.so.1)
Reading symbols from /usr/lib64/libaio.so.1...
Reading symbols from .gnu_debugdata for /usr/lib64/libaio.so.1...
(No debugging symbols found in .gnu_debugdata for /usr/lib64/libaio.so.1)
Reading symbols from /usr/lib64/libresolv.so.2...
(No debugging symbols found in /usr/lib64/libresolv.so.2)
Reading symbols from /usr/local/lib64/libgcc_s.so.1...
Reading symbols from /usr/lib64/libxml2.so.2...
Reading symbols from .gnu_debugdata for /usr/lib64/libxml2.so.2...
(No debugging symbols found in .gnu_debugdata for /usr/lib64/libxml2.so.2)
Reading symbols from /usr/lib64/libz.so.1...
Reading symbols from .gnu_debugdata for /usr/lib64/libz.so.1...
(No debugging symbols found in .gnu_debugdata for /usr/lib64/libz.so.1)
Reading symbols from /usr/lib64/liblzma.so.5...
Reading symbols from .gnu_debugdata for /usr/lib64/liblzma.so.5...
(No debugging symbols found in .gnu_debugdata for /usr/lib64/liblzma.so.5)
Reading symbols from /opt/progs/lib/libmd5.so...
(No debugging symbols found in /opt/progs/lib/libmd5.so)
Reading symbols from /usr/lib64/libnss_files.so.2...
(No debugging symbols found in /usr/lib64/libnss_files.so.2)
Reading symbols from /usr/lib64/libnss_sss.so.2...
Reading symbols from .gnu_debugdata for /usr/lib64/libnss_sss.so.2...
(No debugging symbols found in .gnu_debugdata for /usr/lib64/libnss_sss.so.2)
... lots of symbols read ...
Reading symbols from .gnu_debugdata for /usr/lib64/libnuma.so.1...
(No debugging symbols found in .gnu_debugdata for /usr/lib64/libnuma.so.1)
Reading symbols from /usr/lib64/libnss_dns.so.2...
(No debugging symbols found in /usr/lib64/libnss_dns.so.2)
BFD: warning: system-supplied DSO at 0x6fffd000 has a section extending past end of file
0x0000000070000002 in syscall_traced ()
(gdb) bt 10
#0  0x0000000070000002 in syscall_traced ()
#1  0x00007f23bd0424bb in _raw_syscall () at /home/user/install/rr/src/preload/raw_syscall.S:120
#2  0x00007f23bd03e27e in traced_raw_syscall (call=call@entry=0x681fffa0) at /home/user/install/rr/src/preload/syscallbuf.c:278
#3  0x00007f23bd04161a in syscall_hook_internal (call=0x681fffa0) at /home/user/install/rr/src/preload/syscallbuf.c:3420
#4  syscall_hook (call=0x681fffa0) at /home/user/install/rr/src/preload/syscallbuf.c:3454
#5  0x00007f23bd03e060 in _syscall_hook_trampoline () at /home/user/install/rr/src/preload/syscall_hook.S:313
#6  0x00007f23bd03e0bf in __morestack () at /home/user/install/rr/src/preload/syscall_hook.S:458
#7  0x00007f23bd03e0c6 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /home/user/install/rr/src/preload/syscall_hook.S:472
#8  0x00007f23bbee5846 in __nanosleep_nocancel () from /usr/lib64/libc.so.6
#9  0x00007f23bbf16134 in usleep () from /usr/lib64/libc.so.6
(More stack frames follow...)

GitMensch avatar Dec 03 '21 15:12 GitMensch

I'm not surprised it's still broken since we haven't done anything here, but is there a test case for this somewhere?

khuey avatar Dec 05 '21 16:12 khuey

I'm not surprised it's still broken since we haven't done anything here

There was a commit from you some months ago which fixed another issue that had nanosleep in, I've hoped that this had also fixed this issue.

is there a test case for this somewhere?

Locally here: of course yes. nanosleep is called from the oracle client libraries (which do not even ship debug info [it is even worse with JDBC...]) and those are called from a bunch of software I can't share...

is there something I can do to help analyzing this further?

I'm still not even sure if this is the same issue as @Keno's or a different one.

GitMensch avatar Dec 05 '21 16:12 GitMensch

Keno's issue is a very specific problem, and this is a very general error message, so it would be best if you filed a new issue.

khuey avatar Dec 05 '21 16:12 khuey

Keno's issue is a very specific problem, and this is a very general error message, so it would be best if you filed a new issue.

Done - and hidden the big posts here that were found to be not matching the topic.

GitMensch avatar Dec 05 '21 16:12 GitMensch