Robert O'Callahan

Results: 250 comments by Robert O'Callahan

Let's move forward with this! Can you rebase it? I promise I'll review it :-)

There appear to be two bugs here:
* timeout during replay
* crash during recording with
```
[FATAL /cache/build/dockerized-amdci4-8/julialang/rr/src/AutoRemoteSyscalls.cc:567:infallible_send_fd()] -> Assertion `child_fd >= 0' failed to hold. Failed to send...
```

For the crash, the rr stack is
```
#5 0x000056331c79a517 in rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=0x7ffc8580e9e0, __in_chrg=) at /cache/build/dockerized-amdci4-8/julialang/rr/src/log.cc:451
#6 0x000056331c6f7dbe in rr::AutoRemoteSyscalls::infallible_send_fd (this=0x7ffc8580ead0, our_fd=...) at /cache/build/dockerized-amdci4-8/julialang/rr/src/AutoRemoteSyscalls.cc:567
#7 0x000056331c864523 in rr::RecordTask::init_buffers_arch (this=0x56331d340a10) at...
```

This stack occurs in both 194 and 197.

Actually no, 197 is a different but similar-looking stack:
```
#5 0x000056245d7d3517 in rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=0x7ffd32ffc6f0, __in_chrg=) at /cache/build/dockerized-amdci4-9/julialang/rr/src/log.cc:451
#6 0x000056245d730dbe in rr::AutoRemoteSyscalls::infallible_send_fd (this=0x7ffd32ffda70, our_fd=...) at /cache/build/dockerized-amdci4-9/julialang/rr/src/AutoRemoteSyscalls.cc:567
#7 0x000056245d902f6c in rr::Session::create_shared_mmap...
```

We want three kinds of `AutoRemoteSyscalls` functions:
* Infallible functions that abort on any error. These should only be called during replay.
* "Infallible" functions that abort on any error...

Actually I think the correct approach is to only have the latter two kinds of functions, but infallible functions abort if they get ESRCH during replay.

Functions like `send_fd` are a tricky case. I think we want them to return -ESRCH if the tracee died (and it's not a replay), but abort on any other error...

I'd argue that if we get -ESRCH the operation didn't really *fail*, instead it never happened at all :-).

Actually we have `AutoRemoteSyscalls::infallible_syscall_if_alive`. I like that name. We should make it abort if the tracee is dead and we're in replay, and incrementally migrate to it.
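The policy described in these comments can be sketched roughly as below. This is a minimal standalone C++ illustration, not rr's code: `Action`, `classify_send_fd_result`, and `send_fd_if_alive` are hypothetical names, and the `is_replay` flag is an assumed stand-in for however the real code distinguishes a replay session from a recording session. The sketch treats a non-negative value as a valid child fd, returns -ESRCH to the caller only when the tracee died during recording, and aborts in every other error case, including ESRCH during replay.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical: what the caller-facing wrapper should do with a raw result.
enum class Action { ReturnFd, ReturnEsrch, Abort };

// Classify the raw result of sending an fd to the tracee.
// `child_fd` is either a valid fd (>= 0) or a negated errno value.
// `is_replay` is an assumed flag; the real session check may differ.
Action classify_send_fd_result(long child_fd, bool is_replay) {
  if (child_fd >= 0) {
    return Action::ReturnFd;
  }
  // The tracee died while recording: the operation never happened,
  // so report that to the caller instead of aborting.
  if (child_fd == -ESRCH && !is_replay) {
    return Action::ReturnEsrch;
  }
  // Any other error, or ESRCH during replay, is treated as fatal.
  return Action::Abort;
}

// Hypothetical wrapper applying the policy above.
long send_fd_if_alive(long child_fd, bool is_replay) {
  switch (classify_send_fd_result(child_fd, is_replay)) {
    case Action::ReturnFd:
      return child_fd;
    case Action::ReturnEsrch:
      return -ESRCH;
    case Action::Abort:
      break;
  }
  std::abort();
}
```

Keeping the classification separate from the abort makes the policy testable without killing the process, which may help if call sites are migrated incrementally.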