async-crashdump-debugging-initiative Out-of-band access to the logical stack traces at runtime

Out-of-band access to the logical stack traces at runtime

Open koute opened this issue 3 years ago • 1 comments

I'm the author of Bytehound, a memory profiler for Linux, and not-perf, a CPU profiler. One of the major issues I'd love to find a solution for is the problem of async stack traces. That is, it's very hard to profile highly async programs because the stack traces I gather are native stack traces, so in the profiling data you usually have a few useful stack traces at the top, and the rest is all of the executor machinery which is usually completely useless (unless you're profiling the executor itself, that is).

So I see that one of the goals in this WG's charter is:

creating logical stack traces that shows dependencies between tasks and resources

This is something I'd be highly interested in for my tools. So my questions are:

Are there plans to make this information also easily accessible at runtime? In theory if it's available to debuggers and can be extracted from a coredump it should also be possible to access it at runtime from a profiler hooked into the process, right?
How fast is the mechanism going to be, or could feasibly be made? Are you open to modifying the design to make it faster to tap into at runtime, even if it means increased complexity? For Bytehound the requirements for how performant the stack unwinding has to be are extreme, since we gather a backtrace on every allocation. This has required me to write a completely custom stack unwinding implementation which is orders of magnitude faster than what's usually used; if access to the logical stack trace can't be made at least as fast as that then it won't be usable for tooling like Bytehound.
In the recent midyear report I can see this:

@mw thinks we made a lot of progress on the compiler side. rustc now encodes most of the information we need for implementing logical stack traces. Only the information about file/line of await points is not readily available.

Is there anywhere I can read about this in more detail?

Thanks!

Aug 09 '22 06:08 koute

Hi! Sorry I didn't see this earlier.

The TL;DR of the current state is that there is no clear, unified approach for async crashdump debugging yet -- mostly (in my opinion) because async Rust is largely a set of protocols with hardly any implementation details prescribed.

Every piece of middleware has lots of freedom in how things are represented internally, while at the same time it's currently not possible to provide a common debugging interface (e.g. via traits) because debuggers don't understand Rust and can't run code in the debuggee (especially when dealing with crashdumps).

The work mentioned in the mid-year report was mostly about fixing general debuginfo bugs and issues in the Rust compiler. The compiler does generate debuginfo now that contains the information we need for some basic things (see e.g. this proof-of-concept implementation). However, we have not found a way of making a debuginfo-based approach maintainable. What we have so far would require separate implementations for each debugger, and there is no good strategy for regression testing.

However, it looks like you are more interested in something that works with a running process. @jswrenn has done a lot of work in this area and wrote a comparison of different approaches at https://hackmd.io/@jswrenn/SkldX98Ci. I suspect that any high-performance support for traces would have to be built into a given executor framework.
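As a rough illustration of what "built into the framework" could mean, here is a minimal sketch in which each instrumented future pushes a name onto a thread-local stack while it is being polled, so that code running inside a poll (such as an allocation hook) can read the logical stack without unwinding. This is not how any particular executor or the approaches in the linked comparison work; `Framed`, `framed`, `logical_stack`, and the toy `block_on` are all hypothetical names made up for this example, and only the standard library is used.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    // Names of the instrumented futures currently being polled on this
    // thread, outermost first. This plays the role of the "logical stack".
    static LOGICAL_STACK: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

/// Pops the top frame when dropped, so the stack stays balanced even if
/// the inner poll panics.
struct PopOnDrop;

impl Drop for PopOnDrop {
    fn drop(&mut self) {
        LOGICAL_STACK.with(|s| {
            s.borrow_mut().pop();
        });
    }
}

/// Hypothetical wrapper: records `name` for as long as `inner` is being polled.
struct Framed<F> {
    name: &'static str,
    inner: Pin<Box<F>>,
}

fn framed<F: Future>(name: &'static str, inner: F) -> Framed<F> {
    Framed { name, inner: Box::pin(inner) }
}

impl<F: Future> Future for Framed<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `Framed<F>` is `Unpin` because the inner future is boxed.
        let this = self.get_mut();
        LOGICAL_STACK.with(|s| s.borrow_mut().push(this.name));
        let _guard = PopOnDrop;
        this.inner.as_mut().poll(cx)
    }
}

/// Cheap snapshot of the logical stack; a profiler hook would call this.
fn logical_stack() -> Vec<&'static str> {
    LOGICAL_STACK.with(|s| s.borrow().clone())
}

// A toy single-threaded executor, just enough to drive the demo.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

async fn leaf() -> Vec<&'static str> {
    // Stand-in for the point where a profiler would take a sample.
    logical_stack()
}

async fn middle() -> Vec<&'static str> {
    framed("leaf", leaf()).await
}

fn sample_demo() -> Vec<&'static str> {
    block_on(framed("root", async { framed("middle", middle()).await }))
}

fn main() {
    println!("{:?}", sample_demo());
}
```

The cost per poll in this sketch is one push and one pop on a thread-local vector, which is the kind of overhead a per-allocation profiler would need to weigh. A real integration would also have to deal with tasks that are suspended rather than currently being polled, which this example does not attempt.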

May 23 '23 09:05 michaelwoerister
