rr
Support for "catch syscall" or other means of breaking on system call entry and exit
Right now, attempting to catch a syscall during replay fails with an error saying the feature is not supported. The best a user can do is break on the libc wrappers that exist for some system calls, and that workaround isn't sufficient for breaking on every system call entry and exit.
FWIW, gdb 7.11 introduced the QCatchSyscalls packet for the remote protocol, plus a couple of related "stop reasons".
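For reference, here is a sketch of what a debugger sends under that protocol. Per the GDB remote-protocol documentation, `QCatchSyscalls:1`, optionally followed by `;`-separated hexadecimal syscall numbers, enables catching (no numbers means every syscall), `QCatchSyscalls:0` disables it, and every packet is framed as `$payload#cc`, where `cc` is the modulo-256 sum of the payload bytes in two hex digits. The helper names are hypothetical and not taken from rr or gdb.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Modulo-256 sum of the payload bytes, as used by the GDB remote protocol. */
static unsigned rsp_checksum(const char *payload)
{
    unsigned sum = 0;
    for (const unsigned char *p = (const unsigned char *)payload; *p; p++)
        sum += *p;
    return sum & 0xff;
}

/* Hypothetical helper: builds "$QCatchSyscalls:1[;sysno]...#cc" into out.
 * An empty list (count == 0) asks the stub to catch every syscall.
 * Returns the packet length, or -1 if a buffer is too small. */
static int build_catch_syscalls(char *out, size_t out_len,
                                const unsigned long *sysnos, size_t count)
{
    char payload[256];
    size_t used = (size_t)snprintf(payload, sizeof payload, "QCatchSyscalls:1");

    assert(sysnos != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* Syscall numbers are sent in hex, without a 0x prefix. */
        int w = snprintf(payload + used, sizeof payload - used, ";%lx", sysnos[i]);
        if (w < 0 || (size_t)w >= sizeof payload - used)
            return -1; /* payload buffer too small */
        used += (size_t)w;
    }

    int len = snprintf(out, out_len, "$%s#%02x", payload, rsp_checksum(payload));
    if (len < 0 || (size_t)len >= out_len)
        return -1; /* output buffer too small */
    return len;
}
```

For example, asking to catch only `write` (syscall 1 on x86-64) with `build_catch_syscalls(buf, sizeof buf, (unsigned long[]){1}, 1)` produces `$QCatchSyscalls:1;1#59`.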
Yes, the remote protocol has supported this for a while. The tricky part from rr's perspective is making this work for buffered syscalls (which don't trigger a ptrace/seccomp trap and don't have a corresponding entry in our trace log to stop at). It could be done, but it's not a priority for roc or me.
This would be useful functionality to have: for example, catching a write syscall would identify where a program writes its output.
@khuey or someone else, can you clarify what a buffered syscall is? What functionality would be missing if buffered syscalls are not handled?
rr records the effects of syscalls on the tracee so that it can recreate those effects during replay. To get good performance we can't take a ptrace trap for every syscall. Consider something like gettid(2), where the syscall itself is trivial: the overhead of the context switches involved in a ptrace trap to the rr supervisor, and then later resuming the tracee, dwarfs the syscall itself.
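To put a rough number on the cheap side of that comparison, this sketch times a raw gettid(2) issued directly through syscall(2), with no tracer attached. It is Linux-only, `ns_per_gettid` is a hypothetical helper, and it deliberately does not measure the ptrace-trap path, which would need a separate tracer process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Average wall-clock nanoseconds per raw gettid(2) call, made directly with
 * syscall(2) so no libc wrapper logic is involved. */
static double ns_per_gettid(long iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        (void)syscall(SYS_gettid);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9
                   + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / (double)iterations;
}
```

Calling `printf("%.1f ns per gettid\n", ns_per_gettid(1000000));` gives a machine-dependent baseline; every buffered syscall avoids adding a tracer round trip on top of that.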
To achieve good performance, we buffer some syscalls inside the tracee. We have an LD_PRELOADed library that knows how to record the effects of certain syscalls in a memory buffer in the tracee, and from the rr supervisor we rewrite syscall instructions to call into this library instead. This lets a subset of syscalls run without trapping to rr synchronously; the trap is deferred until the memory buffer is full or until we end up in the supervisor for an unrelated reason.
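A toy model of the record side of that idea: a wrapper performs the syscall itself, appends the result to an in-memory buffer, and only "traps to the supervisor" (here, a `flush` callback) when the buffer fills. All names (`struct buf_entry`, `BUF_CAPACITY`, `buffered_gettid`, `count_flush`) are hypothetical; rr's real syscallbuf layout, its seccomp filtering, and the syscall-instruction patching are not modeled.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical record layout; not rr's actual syscallbuf format. */
struct buf_entry {
    long sysno; /* syscall number */
    long ret;   /* result the tracee observed */
};

#define BUF_CAPACITY 4

struct syscall_buffer {
    struct buf_entry entries[BUF_CAPACITY];
    size_t used;
    /* Stand-in for the synchronous trap to the supervisor, which would
     * copy the entries into the trace. */
    void (*flush)(struct syscall_buffer *buf);
};

static int flush_count; /* how many "traps" to the supervisor we took */

static void count_flush(struct syscall_buffer *buf)
{
    (void)buf;
    flush_count++;
}

/* Runs the syscall directly (no ptrace stop) and logs its effect. */
static long buffered_gettid(struct syscall_buffer *buf)
{
    assert(buf->used < BUF_CAPACITY);
    long ret = syscall(SYS_gettid);
    buf->entries[buf->used].sysno = SYS_gettid;
    buf->entries[buf->used].ret = ret;
    if (++buf->used == BUF_CAPACITY) {
        buf->flush(buf); /* the only point where the supervisor is involved */
        buf->used = 0;
    }
    return ret;
}
```

With a capacity of 4, ten calls to `buffered_gettid` cause only two flushes, whereas trapping on every syscall would have cost ten supervisor round trips.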
This is a problem for implementing QCatchSyscalls because, for buffered syscalls, there is no corresponding trace event at the moment the syscall executes. For unbuffered syscalls we have an event in the trace: rr plays forward to that event, and instead of emulating the syscall and returning control to the tracee, we can easily trap to the debugger. For buffered syscalls, we instead just refill the memory buffer, and logic in the LD_PRELOADed library replays those syscalls from the buffer, so there's no obvious point at which to stop and generate a syscall trap. This could be dealt with by setting breakpoints in that library to stop whenever it takes one of these syscall replay paths.
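The replay side of the same toy model, showing where such a breakpoint could go: the replay path makes no kernel call, returns the recorded result, and passes through a hook function at syscall entry and exit. The names (`struct replay_buffer`, `replay_buffered_syscall`, `syscall_replay_hook`) are hypothetical, and the syscall numbers used are arbitrary; this only illustrates the idea of a breakpointable replay path, not rr's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout, as captured during recording. */
struct buf_entry {
    long sysno;
    long ret;
};

struct replay_buffer {
    const struct buf_entry *entries; /* refilled by the supervisor from the trace */
    size_t count;
    size_t next;
};

static int hook_hits; /* for illustration: how often the hook fired */

/* Hypothetical hook a supervisor could set an internal breakpoint on.
 * noinline keeps it a distinct function address to break at. */
static __attribute__((noinline)) void syscall_replay_hook(long sysno, int is_exit)
{
    (void)sysno;
    (void)is_exit;
    hook_hits++;
}

/* Replays one buffered syscall: no kernel call is made; the recorded
 * result is returned instead. */
static long replay_buffered_syscall(struct replay_buffer *buf, long expected_sysno)
{
    assert(buf->next < buf->count);
    const struct buf_entry *e = &buf->entries[buf->next++];
    assert(e->sysno == expected_sysno); /* replay must follow the recording */

    syscall_replay_hook(e->sysno, 0); /* candidate "syscall entry" stop */
    /* Copying any recorded output data back into the tracee would go here. */
    syscall_replay_hook(e->sysno, 1); /* candidate "syscall exit" stop */
    return e->ret;
}
```

A debugger-facing layer would translate a hit on `syscall_replay_hook` into a syscall entry or return stop, while letting the tracee continue untouched when catching is disabled.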
If support for buffered syscalls were not present, then a large number of syscall invocations would be missed by QCatchSyscalls (including many write(2)s). Because misleading or incomplete information is often worse than no information when debugging, we wouldn't ship a QCatchSyscalls implementation that is broken in that way.
Thanks so much for indulging me with a great explanation and rationale! :)
If you (or someone else) were motivated to do this, I don't think it would be that difficult. We already have a mechanism for breaking in the syscallbuf code; the main work would be wiring things up and then writing tests to check that it works.
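One piece of that wiring might look like the following sketch: decide whether a syscall is in the set requested by QCatchSyscalls (an empty list meaning "all"), and if so format a stop reply. Per the GDB remote-protocol documentation, a `T` stop reply can carry a `syscall_entry:NN` or `syscall_return:NN` pair, with the syscall number in hex; signal 05 (SIGTRAP) is assumed here. `struct catch_state`, `should_stop`, and `format_syscall_stop` are hypothetical names, not rr code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical catch state set by a QCatchSyscalls packet. */
struct catch_state {
    bool enabled;                /* QCatchSyscalls:1 vs QCatchSyscalls:0 */
    const unsigned long *sysnos; /* listed syscall numbers */
    size_t count;                /* 0 means "catch every syscall" */
};

static bool should_stop(const struct catch_state *cs, unsigned long sysno)
{
    if (!cs->enabled)
        return false;
    if (cs->count == 0)
        return true;
    for (size_t i = 0; i < cs->count; i++)
        if (cs->sysnos[i] == sysno)
            return true;
    return false;
}

/* Modulo-256 sum of the payload bytes, as used by the GDB remote protocol. */
static unsigned rsp_checksum(const char *payload)
{
    unsigned sum = 0;
    for (const unsigned char *p = (const unsigned char *)payload; *p; p++)
        sum += *p;
    return sum & 0xff;
}

/* Formats e.g. "$T05syscall_entry:1;#cc". Returns length or -1. */
static int format_syscall_stop(char *out, size_t out_len,
                               unsigned long sysno, bool is_exit)
{
    char payload[64];
    int n = snprintf(payload, sizeof payload, "T05%s:%lx;",
                     is_exit ? "syscall_return" : "syscall_entry", sysno);
    if (n < 0 || (size_t)n >= sizeof payload)
        return -1;
    int len = snprintf(out, out_len, "$%s#%02x", payload, rsp_checksum(payload));
    if (len < 0 || (size_t)len >= out_len)
        return -1;
    return len;
}
```

Keeping the filter separate from the formatter means the same check can guard both the unbuffered path (a trace event) and the buffered path (a hit on a breakpoint in the syscallbuf code).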
My apologies. I did not find the duplicate because, well, https://github.com/rr-debugger/rr/issues?q=catch+is%3Aopen does not return it!