
Support for "catch syscall" or other means of breaking on system call entry and exit

Open pkmoore opened this issue 8 years ago • 7 comments

Right now attempting to catch a syscall during replay results in an error indicating that the feature is not supported. This means that the best a user can do is break execution on the libc wrappers that exist for some system calls. This workaround isn't sufficient for breaking on all system call entries and exits.

pkmoore avatar Jul 11 '17 20:07 pkmoore

FWIW gdb 7.11 introduced the QCatchSyscalls packet for the remote protocol, plus a couple of related "stop reasons".
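For readers unfamiliar with the packet, here is a rough sketch of how a stub might decode it, assuming the form given in the gdb remote protocol documentation: `QCatchSyscalls:0` to disable, or `QCatchSyscalls:1` optionally followed by `;`-separated syscall numbers in hex. The function name `parse_qcatchsyscalls` and its return shape are made up for illustration; this is not rr or gdb code.

```python
def parse_qcatchsyscalls(payload):
    """Decode a QCatchSyscalls payload (without the $...#checksum framing).

    Returns (enabled, syscalls). syscalls is None when every syscall
    should be caught, otherwise a set of syscall numbers.
    Hypothetical helper for illustration only.
    """
    prefix = "QCatchSyscalls:"
    if not payload.startswith(prefix):
        raise ValueError("not a QCatchSyscalls packet")
    fields = payload[len(prefix):].split(";")
    if fields[0] == "0":
        # Catching is switched off; no syscall numbers apply.
        return False, set()
    if fields[0] != "1":
        raise ValueError("expected 0 or 1 after the colon")
    # Syscall numbers are hexadecimal; an empty list means "catch all".
    numbers = {int(field, 16) for field in fields[1:] if field}
    return True, (numbers or None)
```

A stub that accepts the packet would then presumably report matching stops using the `syscall_entry` and `syscall_return` stop reasons, each carrying the syscall number in hex.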

tromey avatar Oct 18 '17 17:10 tromey

Yes, the remote protocol has supported this for a while. The tricky part from rr's perspective is making this work for buffered syscalls (which don't trigger a ptrace/seccomp trap and don't have a corresponding entry in our trace log to stop at). It could be done but it's not a priority for roc or me.

khuey avatar Oct 18 '17 19:10 khuey

This would be useful functionality to have, for example to catch a write syscall and identify where a program is writing its output.

@khuey or someone else, can you clarify what a buffered syscall is? What functionality would be missing if buffered syscalls are not handled?

pwaller avatar Jul 27 '19 21:07 pwaller

rr records the effects of syscalls on the tracee, so that it can recreate those effects during replay. In order to achieve good performance we can't take a ptrace trap for every syscall. Consider something like gettid(2), where the syscall itself is trivial. The overhead of the context switches involved in a ptrace trap to the rr supervisor and then later resuming the tracee dwarfs the syscall itself.

To achieve good performance we buffer some syscalls inside the tracee. We have an LD_PRELOADed library that knows how to record the effects of certain syscalls in a memory buffer in the tracee, and from the rr supervisor we rewrite syscall instructions to instead call into this library. Then we can do some subset of syscalls without trapping to rr synchronously, instead deferring that until the memory buffer is full or until we end up in the supervisor for an unrelated reason.

This is a problem for implementing QCatchSyscalls because, for buffered syscalls, there is no corresponding trace event at the moment the syscall is executed. For unbuffered syscalls we have an event in the trace, rr plays forwards to that event, and instead of emulating the syscall and returning control to the tracee we can easily trap to the debugger. But for buffered syscalls we instead just refill the memory buffer and logic in the LD_PRELOADed library handles the replay of those syscalls from the buffer. There's no obvious point at which to stop and generate a syscall trap. This could be dealt with: it would involve setting breakpoints in this library to stop when we are taking one of these syscall replay paths.

If support for buffered syscalls were not present, then a large number of syscall invocations would be missed by QCatchSyscalls (including many write(2)s). Because misleading or incomplete information is often worse than no information when debugging, we wouldn't ship a QCatchSyscalls implementation that is broken in that way.
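A toy model may make the gap concrete. The sketch below is not rr's implementation: `ToyRecorder`, the `BUFFERABLE` set, and the event tuples are all invented for illustration. It only mimics the idea that buffered syscalls reach the trace as one batched flush rather than as individual events.

```python
# Hypothetical set of syscalls handled in the in-tracee buffer.
BUFFERABLE = {"gettid", "write"}


class ToyRecorder:
    """Toy stand-in for a recorder with an in-tracee syscall buffer."""

    def __init__(self, buffer_capacity=4):
        self.trace = []    # events the supervisor actually sees
        self.buffer = []   # syscalls recorded without trapping
        self.capacity = buffer_capacity

    def syscall(self, name):
        if name in BUFFERABLE:
            # No trap: just remember the effect in the buffer.
            self.buffer.append(name)
            if len(self.buffer) >= self.capacity:
                self.flush()
        else:
            # Trapping for any other reason also drains the buffer first.
            self.flush()
            self.trace.append(("syscall", name))

    def flush(self):
        if self.buffer:
            self.trace.append(("buffer_flush", tuple(self.buffer)))
            self.buffer = []


def stops_from_trace_events(trace, wanted):
    """Catchpoint that only looks at per-syscall trace events."""
    return [event for event in trace if event == ("syscall", wanted)]


recorder = ToyRecorder()
for name in ["write", "gettid", "write", "open", "write"]:
    recorder.syscall(name)
recorder.flush()

missed_writes = stops_from_trace_events(recorder.trace, "write")
caught_opens = stops_from_trace_events(recorder.trace, "open")
```

In this model the three writes only ever appear inside `buffer_flush` events, so a catchpoint driven purely by per-syscall trace events stops on `open` but never on `write`.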

khuey avatar Jul 27 '19 21:07 khuey

Thanks so much for indulging me with a great explanation and rationale! :)

pwaller avatar Jul 28 '19 19:07 pwaller

If you (or someone else) were motivated to do this I don't think it would be that difficult. We already have a mechanism for breaking in the syscallbuf code; the main work here would be wiring things up and then writing some tests to confirm that it works.

khuey avatar Jul 29 '19 18:07 khuey

My apologies. I did not find the duplicate because, well, https://github.com/rr-debugger/rr/issues?q=catch+is%3Aopen does not return it!

pspacek avatar Sep 20 '22 13:09 pspacek