rr icon indicating copy to clipboard operation
rr copied to clipboard

Assertion failure in process_syscall_entry() during record

Open nuudlman opened this issue 1 year ago • 5 comments

Attempting to debug LLVM opt, crash only happens with a Bazel development build (-O3 -g -UNDEBUG -fno-omit-frame-pointer -fsanitize=address,undefined, LLVM commit 44718311dee486f1823876e8af9100afcc50041b); regular release binaries seem to be fine.

Can provide executable if needed.

Log:

[FATAL src/RecordSession.cc:1944:process_syscall_entry()]
 (task 123227 (rec:123227) at time 72777)
 -> Assertion `t->desched_rec() || is_rrcall_notify_syscall_hook_exit_syscall( t->regs().original_syscallno(), t->arch()) || t->ip() == t->vm() ->privileged_traced_syscall_ip() .increment_by_syscall_insn_length(t->arch())' failed to hold. Stashed signal pending on syscall entry when it shouldn't be: {signo:SIGSTOP,errno:SUCCESS,code:SI_TKILL}; regs={ ip:0x7b9e505de9db args:(0xefffffffffffffff,0,0,0,0,0) orig_syscall: 24 syscallno: -38 }; last_execution_resume=0x7b9e505de9db; sig ip=0x7b9e505de9db
Tail of trace dump:
{
  real_time:10557.434084 global_time:72757, event:`SYSCALL: lseek' (state:EXITING_SYSCALL) tid:123228, ticks:1275
rax:0x0 rbx:0x1000 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0xffffffffffffffff r9:0x0 r10:0x22 r11:0x246 r12:0x7b9e44200ef0 r13:0x7b9e4ffbcf5c r14:0x7b9e44200e20 r15:0x0 rip:0x7b9e4ff60215 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x8 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434110 global_time:72758, event:`SYSCALL: getdents64' (state:ENTERING_SYSCALL) tid:123228, ticks:1277
rax:0xffffffffffffffda rbx:0x1000 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0xffffffffffffffff r9:0x0 r10:0x22 r11:0x246 r12:0x1 r13:0x2 r14:0x7b9e44200e20 r15:0x0 rip:0x7b9e4ff6026a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd9 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434136 global_time:72759, event:`SYSCALL: getdents64' (state:EXITING_SYSCALL) tid:123228, ticks:1277
rax:0x50 rbx:0x1000 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0xffffffffffffffff r9:0x0 r10:0x22 r11:0x246 r12:0x1 r13:0x2 r14:0x7b9e44200e20 r15:0x0 rip:0x7b9e4ff6026a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd9 fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e44270000, length:0x50 }
}
{
  real_time:10557.434167 global_time:72760, event:`SYSCALL: openat' (state:ENTERING_SYSCALL) tid:123228, ticks:1500
rax:0xffffffffffffffda rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x7b9e44200d20 rdi:0xffffffffffffff9c rbp:0x7b9e44200d20 rsp:0x7b9e44200c48 r8:0x1b0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x0 r13:0x0 r14:0x7b9e44200e48 r15:0x50 rip:0x7b9e4ff5f1e1 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434178 global_time:72761, event:`SYSCALL: sched_yield' (state:EXITING_SYSCALL) tid:123227, ticks:525773288
rax:0x0 rbx:0x7b9e5008bcc8 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0xefffffffffffffff rbp:0x7ffd2a7be590 rsp:0x7ffd2a7be4f8 r8:0x0 r9:0x0 r10:0x0 r11:0x246 r12:0x7ffd2a7be51c r13:0x7b9e44000000 r14:0x1 r15:0x201000 rip:0x7b9e505de9db eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x18 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434214 global_time:72762, event:`SYSCALL: sched_yield' (state:ENTERING_SYSCALL) tid:123227, ticks:525773289
rax:0xffffffffffffffda rbx:0x7b9e5008bcc8 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0xefffffffffffffff rbp:0x7ffd2a7be590 rsp:0x7ffd2a7be4f8 r8:0x0 r9:0x0 r10:0x0 r11:0x246 r12:0x7ffd2a7be51c r13:0x7b9e44000000 r14:0x1 r15:0x201000 rip:0x7b9e505de9db eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x18 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434259 global_time:72763, event:`SYSCALL: openat' (state:EXITING_SYSCALL) tid:123228, ticks:1500
rax:0x4 rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x7b9e44200d20 rdi:0xffffffffffffff9c rbp:0x7b9e44200d20 rsp:0x7b9e44200c48 r8:0x1b0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x0 r13:0x0 r14:0x7b9e44200e48 r15:0x50 rip:0x7b9e4ff5f1e1 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434308 global_time:72764, event:`PATCH_SYSCALL' tid:123228, ticks:1579
rax:0x0 rbx:0x0 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x4 rbp:0x7b9e44200cc0 rsp:0x7b9e44200c78 r8:0x0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x0 r13:0x10000000 r14:0x7b9e44200e48 r15:0x0 rip:0x7b9e4ff5f1fd eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e5066969c, length:0x5e }
  { tid:123228, addr:0x7b9e4ff5f1fd, length:0x8 }
}
{
  real_time:10557.434352 global_time:72765, event:`SYSCALLBUF_FLUSH' tid:123228, ticks:1659
  { syscall:'read', ret:0x50c, size:0x51c, desched:1 }
  { syscall:'read', ret:0x0, size:0x10, desched:1 }
}
{
  real_time:10557.434359 global_time:72766, event:`SYSCALL: close' (state:ENTERING_SYSCALL) tid:123228, ticks:1659
rax:0xffffffffffffffda rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x4 rbp:0x4 rsp:0x7b9e44200c98 r8:0x0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x1000 r13:0x10000000 r14:0x7b9e44200e48 r15:0x50c rip:0x7b9e4ff5f19e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434362 global_time:72767, event:`SYSCALLBUF_RESET' tid:123228, ticks:1659
}
{
  real_time:10557.434387 global_time:72768, event:`SYSCALL: close' (state:EXITING_SYSCALL) tid:123228, ticks:1659
rax:0x0 rbx:0x0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0x4 rbp:0x4 rsp:0x7b9e44200c98 r8:0x0 r9:0x7b9e44200d20 r10:0x1b0 r11:0x246 r12:0x1000 r13:0x10000000 r14:0x7b9e44200e48 r15:0x50c rip:0x7b9e4ff5f19e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3 fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123227, addr:0x7b9e505e350c, length:0x1 }
}
{
  real_time:10557.434412 global_time:72769, event:`SYSCALL: getdents64' (state:ENTERING_SYSCALL) tid:123228, ticks:4594
rax:0xffffffffffffffda rbx:0x1000 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0x7b9e44271000 r9:0x1 r10:0x4e r11:0x246 r12:0x0 r13:0x2 r14:0x7b9e44200e20 r15:0x50 rip:0x7b9e4ff6026a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd9 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434442 global_time:72770, event:`SYSCALL: getdents64' (state:EXITING_SYSCALL) tid:123228, ticks:4594
rax:0x0 rbx:0x1000 rcx:0xffffffffffffffff rdx:0x1000 rsi:0x7b9e44270000 rdi:0x3 rbp:0x7b9e44200e40 rsp:0x7b9e44200da0 r8:0x7b9e44271000 r9:0x1 r10:0x4e r11:0x246 r12:0x0 r13:0x2 r14:0x7b9e44200e20 r15:0x50 rip:0x7b9e4ff6026a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd9 fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e44270000, length:(nil) }
}
{
  real_time:10557.434486 global_time:72771, event:`PATCH_SYSCALL' tid:123228, ticks:4600
rax:0x65 rbx:0x1e15b rcx:0xffffffffffffffff rdx:0x0 rsi:0x1e15b rdi:0x10 rbp:0x7b9e44200ef0 rsp:0x7b9e44200da8 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x7b9e44200ef0 r13:0x7b9e4426f008 r14:0x7b9e44200e40 r15:0x0 rip:0x7b9e4ff5f915 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e506696fa, length:0x5e }
  { tid:123228, addr:0x7b9e4ff5f915, length:0xb }
}
{
  real_time:10557.434533 global_time:72772, event:`SYSCALL: ptrace' (state:ENTERING_SYSCALL) tid:123228, ticks:4610
rax:0xffffffffffffffda rbx:0x7b9e43ffffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x1e15b rdi:0x10 rbp:0x7b9e43fffef0 rsp:0x7b9e43fffdc0 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x7b9e44200ef0 r13:0x7b9e4426f008 r14:0x7b9e44200e40 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434553 global_time:72773, event:`SYSCALL: ptrace' (state:EXITING_SYSCALL) tid:123228, ticks:4610
rax:0x0 rbx:0x7b9e43ffffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x1e15b rdi:0x10 rbp:0x7b9e43fffef0 rsp:0x7b9e43fffdc0 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x7b9e44200ef0 r13:0x7b9e4426f008 r14:0x7b9e44200e40 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434598 global_time:72774, event:`PATCH_SYSCALL' tid:123228, ticks:4616
rax:0x3d rbx:0x1e15b rcx:0xffffffffffffffff rdx:0x40000000 rsi:0x7b9e44200dbc rdi:0x1e15b rbp:0x7b9e44200ef0 rsp:0x7b9e44200da8 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x1e15b r13:0x7b9e44200dc0 r14:0x7b9e44200dbc r15:0x0 rip:0x7b9e4ff5f935 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7b9e505a6380 gs_base:0x0
  { tid:123228, addr:0x7b9e50669758, length:0x5e }
  { tid:123228, addr:0x7b9e4ff5f935, length:0xb }
}
{
  real_time:10557.434623 global_time:72775, event:`SYSCALL: wait4' (state:ENTERING_SYSCALL) tid:123228, ticks:4625
rax:0xffffffffffffffda rbx:0x7b9e43ffffa0 rcx:0xffffffffffffffff rdx:0x40000000 rsi:0x7b9e44200dbc rdi:0x1e15b rbp:0x7b9e43fffef0 rsp:0x7b9e43fffdc0 r8:0x0 r9:0x1 r10:0x0 r11:0x246 r12:0x1e15b r13:0x7b9e44200dc0 r14:0x7b9e44200dbc r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3d fs_base:0x7b9e505a6380 gs_base:0x0
}
{
  real_time:10557.434634 global_time:72776, event:`SYSCALL: sched_yield' (state:EXITING_SYSCALL) tid:123227, ticks:525773289
rax:0x0 rbx:0x7b9e5008bcc8 rcx:0xffffffffffffffff rdx:0x0 rsi:0x0 rdi:0xefffffffffffffff rbp:0x7ffd2a7be590 rsp:0x7ffd2a7be4f8 r8:0x0 r9:0x0 r10:0x0 r11:0x246 r12:0x7ffd2a7be51c r13:0x7b9e44000000 r14:0x1 r15:0x201000 rip:0x7b9e505de9db eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x18 fs_base:0x7b9e505a6380 gs_base:0x0
}
=== Start rr backtrace:
rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x159)[0x5f50bbfab279]
rr(+0xda941)[0x5f50bbfbe941]
rr(+0xdb4c6)[0x5f50bbfbf4c6]
rr(_ZN2rr13RecordSession21process_syscall_entryEPNS_10RecordTaskEPNS0_9StepStateEPNS0_12RecordResultENS_13SupportedArchE+0x3a4)[0x5f50bbff3174]
rr(_ZN2rr13RecordSession29handle_seccomp_traced_syscallEPNS_10RecordTaskEPNS0_9StepStateEPNS0_12RecordResultEPb+0x3a9)[0x5f50bbfecc39]
rr(_ZN2rr13RecordSession19handle_ptrace_eventEPPNS_10RecordTaskEPNS0_9StepStateEPNS0_12RecordResultEPb+0x782)[0x5f50bbfeda42]
rr(_ZN2rr13RecordSession11record_stepEv+0x2bc)[0x5f50bbff96ac]
rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0xe8e)[0x5f50bbfe9dce]
rr(main+0x191)[0x5f50bbf36161]
/usr/lib/libc.so.6(+0x26010)[0x70f853426010]
/usr/lib/libc.so.6(__libc_start_main+0x8a)[0x70f8534260ca]
rr(_start+0x25)[0x5f50bbf378a5]
=== End rr backtrace

nuudlman avatar Apr 19 '24 01:04 nuudlman

Looks like this is a bug in our PTRACE_ATTACH emulation. Event 72772 is a PTRACE_ATTACH to tid 123227, then event 72776 is that process executing a syscall exit. Seems like we're not expecting the tracee to be in a syscall?

khuey avatar Apr 19 '24 03:04 khuey

We do handle PTRACE_ATTACH to a tracee blocked in a syscall, in general. Here's a test for that: f4ff8304a967964887d262765f73adc28f8d281a

rocallahan avatar May 17 '24 03:05 rocallahan

Can provide executable if needed.

Sure. Steps to reproduce would be useful.

rocallahan avatar May 17 '24 03:05 rocallahan

Please be warned that the executable is 5.3GB, and compiling it from source will take ~50GB of disk space (and a lot of time). I'm waiting on it to upload but meanwhile to reproduce on it on your own: git clone --depth=1 https://git.sr.ht/~nuudlman/bazel479k and then bazel build //:opt --config=asan --//:llvm=19.

To get the bug rr record bazel-bin/external/llvm-19/llvm/opt --help. I'm running on Zen 3 system and compiling with GCC 13.2.1 and 14.1.0

nuudlman avatar May 17 '24 04:05 nuudlman

The executable link: https://utexas.box.com/shared/static/cupt90r924jwo9l40c13ik4hxobgfv1o

nuudlman avatar May 17 '24 04:05 nuudlman

I downloaded that binary and ran it with rr record ./cupt90r924jwo9l40c13ik4hxobgfv1o --help. It recorded and replayed successfully. This is Ubuntu 24 LTS, 6.8.0-1008-aws kernel.

rocallahan avatar May 18 '24 11:05 rocallahan

The assertion is similar to the one in #3745... which I also can't reproduce. If you re-reproduce this bug with master, the assertion should include the pid that we think sent SIGSTOP. If you can identify that process that would be helpful.

rocallahan avatar May 18 '24 11:05 rocallahan

Im also using Arch, just like that other bug. Ill build master and see if I can get you the PID.

nuudlman avatar May 18 '24 14:05 nuudlman

I can't reproduce it with master. The arch version that does have the issue is 5.7.0

nuudlman avatar May 18 '24 15:05 nuudlman

do you want to try bisecting to find the rr change that fixed this?

rocallahan avatar May 18 '24 20:05 rocallahan

That's a good idea, I'll try it.

nuudlman avatar May 18 '24 20:05 nuudlman

I can't reproduce with rr 5.7.0 built from source, only with the Arch package. What's even stranger is that when I copy the build config that Arch uses, I still can't reproduce the issue. Since building from source seems to work, I'll try to get a trace of it and upload it here.

nuudlman avatar May 18 '24 21:05 nuudlman

Since this seems like a bug with the Arch package, I'll close this for now unless you want to investigate further.

nuudlman avatar May 18 '24 21:05 nuudlman

Can you file a Arch bug about it?

rocallahan avatar May 20 '24 06:05 rocallahan

PS thanks for checking this out.

rocallahan avatar May 20 '24 06:05 rocallahan