rr: Running replay without perf counters enabled

Hey, we are trying to run rr replay on some cloud servers (to replay traces recorded as part of our debugger environment). I naively expected that only record needs special support for perf counters, but it seems that replay needs it as well. Is this a hard requirement, or might there be a workaround? (I guess you can't really control the execution during replay without it.)

May 20 '19 15:05 alehander92

There is no workaround for this in public rr.

Replaying without counters requires binary instrumentation to count the events. It's slower, but tolerable. We actually have a closed-source branch of rr that can do this. (It doesn't currently work with hardware data watchpoints in gdb, though that could probably be fixed.) If this is something you really really need and are willing to pay money for, we could probably work something out.

May 20 '19 20:05 rocallahan

Hello! We're also interested in running rr in a cloud environment, specifically to determine which syscalls in a program are violating AWS Lambda's sandbox security.

Is there any chance to incorporate binary instrumentation into an open-source branch?

Thank you! :)

Nov 18 '21 19:11 stefan-pdx

Hello! We're also interested in running rr in a cloud environment, specifically to determine which syscalls in a program are violating AWS Lambda's sandbox security.

Interesting. I hope you don't want to run rr in the Lambda sandbox since that probably can't work.

Is there any chance to incorporate binary instrumentation into an open-source branch?

To be clear, our rr remix branch only supports replay in machines without perf counters. Recording in a machine without perf counters is quite a different problem and is not really in scope for rr.

Also, rr with performance counters does work in some AWS instances, e.g. for Pernosco our workhorse instance is c5d.9xlarge but some others work too.

Nov 18 '21 21:11 rocallahan

To be clear, our rr remix branch only supports replay in machines without perf counters.

Ah, ok -- I think that ought to work. May I try out that branch to see if replay works in the Lambda environment?

I hope you don't want to run rr in the Lambda sandbox since that probably can't work.

We've been able to run rr record in a local Docker container (on a Linux host) to record the trace then run that trace in Lambda to identify which syscall is causing the issue.

Thanks!

Nov 18 '21 21:11 stefan-pdx

May I try out that branch to see if replay works in the Lambda environment?

No, this is not available right now, sorry.

We've been able to run rr record in a local Docker container (on a Linux host) to record the trace then run that trace in Lambda to identify which syscall is causing the issue.

I'm not sure what you mean. During rr replay, almost no tracee syscalls actually run, so it would be very possible to have a replay that works in a sandbox even though the original program would not have worked.

Nov 18 '21 21:11 rocallahan

If you just need information about syscalls you can probably get that from rr dump somehow. I don't think you should need to actually execute the replay.

Nov 18 '21 21:11 khuey

@khuey: we're not sure which syscall is causing the process to terminate early. We were hoping that rr would be able to replay (execute) the syscalls in order (logging some sort of trace to stdout) to determine which call is causing the issue.

@rocallahan:

During rr replay, almost no tracee syscalls actually run, so it would be very possible to have a replay that works in a sandbox even though the original program would not have worked.

Ah, ok! I was under the impression that syscalls are executed.

Thank you for taking the time to explain!

Nov 18 '21 23:11 stefan-pdx

Ah, ok I understand what you mean now.

You might consult https://github.com/firecracker-microvm/firecracker/blob/main/resources/seccomp/x86_64-unknown-linux-musl.json and the documentation at https://github.com/firecracker-microvm/firecracker/blob/main/docs/seccomp.md. That plus rr dump after rr record -n of your program (or just regular strace output to be honest) might get you there.

Nov 19 '21 02:11 khuey
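
For anyone landing here with the same question, the last suggestion (compare the syscalls your program makes against a seccomp allowlist) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not anything rr or Firecracker ships. It assumes the policy JSON stores syscall names as strings under a "syscall" key somewhere in its structure, and that the input is default-format strace output (no timestamp prefixes). The names `collect_allowed` and `disallowed_syscalls` are made up for illustration. Whether the linked Firecracker filter reflects what Lambda actually enforces is also an assumption you would need to check.

```python
import re

# Syscall name at the start of a default-format strace line, e.g.
#   openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
#   1234  read(3, "x", 1) = 1          (PID prefix when using -f with -o)
#   [pid  1234] read(3, "x", 1) = 1    (PID prefix when using -f)
# Assumption: no -t/-r timestamp prefixes are present.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def collect_allowed(node):
    """Recursively gather every string stored under a "syscall" key.

    Walking the whole document avoids depending on the exact nesting
    of the policy file (which is an assumption here).
    """
    allowed = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "syscall" and isinstance(value, str):
                allowed.add(value)
            else:
                allowed |= collect_allowed(value)
    elif isinstance(node, list):
        for item in node:
            allowed |= collect_allowed(item)
    return allowed


def disallowed_syscalls(strace_lines, allowed):
    """Return syscall names, in first-seen order, that are not in `allowed`."""
    flagged = []
    for line in strace_lines:
        match = SYSCALL_RE.match(line.strip())
        if match is None:
            continue  # signal lines, exit markers, "<... resumed>" fragments
        name = match.group(1)
        if name not in allowed and name not in flagged:
            flagged.append(name)
    return flagged
```

To use it, load the policy with `json.load`, pass the result to `collect_allowed`, and feed the lines of an strace log to `disallowed_syscalls`. This is a coarse, name-level first pass: a real seccomp filter can also restrict a syscall's arguments, so a name appearing in the allowlist does not guarantee that every call to it is permitted.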