
USDTs on important functions

Open brancz opened this issue 8 months ago • 5 comments

Other popular allocators, such as jemalloc, have built-in mechanisms for profiling; mimalloc does not. To allow some level of debugging, it would be great if mimalloc added USDTs (user statically-defined tracing probes) in important places, such as at the end of malloc and free. USDTs have very little impact when not being traced (each probe site is a single NOP instruction) and much lower overhead than uprobes/uretprobes.
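For illustration, here is a minimal sketch of what such probes could look like. It assumes the `DTRACE_PROBE*` macros from `<sys/sdt.h>` (shipped with the SystemTap SDT headers on Linux) and falls back to no-ops when that header is missing. The wrapper functions and the provider/probe names (`mimalloc_sketch`, `malloc_exit`, `free_exit`) are hypothetical, not anything mimalloc defines today.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Use the real USDT macros when <sys/sdt.h> is available; otherwise
   compile the probes away so the sketch still builds. */
#if defined(__has_include)
#  if __has_include(<sys/sdt.h>)
#    include <sys/sdt.h>
#    define SKETCH_HAVE_SDT 1
#  endif
#endif
#ifndef SKETCH_HAVE_SDT
#  define DTRACE_PROBE1(provider, name, a1)     ((void)(a1))
#  define DTRACE_PROBE2(provider, name, a1, a2) ((void)(a1), (void)(a2))
#endif

/* Hypothetical wrapper: probe at the end of malloc, exposing the
   requested size and the returned pointer to an attached tracer. */
void *traced_malloc(size_t size) {
    void *p = malloc(size);
    DTRACE_PROBE2(mimalloc_sketch, malloc_exit, size, p);
    return p;
}

/* Hypothetical wrapper: probe at the end of free. The address is saved
   as an integer first so the freed pointer itself is never used. */
void traced_free(void *p) {
    uintptr_t addr = (uintptr_t)p;
    free(p);
    DTRACE_PROBE1(mimalloc_sketch, free_exit, addr);
}
```

With the real macros, nothing happens at a probe site until a tracer attaches to the process; an external tool can then read the arguments without the application being modified.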

Is this something that would be accepted into the mimalloc project?

brancz avatar Apr 08 '25 11:04 brancz

We would love to have more insight into mimalloc behaviour for our workloads and this seems like a great low-cost way to achieve that.

shikhar avatar Apr 09 '25 00:04 shikhar

@daanx with jemalloc being archived, it might start to be more important to make mimalloc more observable. Any chance this is something you'd be interested in accepting?

brancz avatar Jun 13 '25 16:06 brancz

Apologies for the late response. Yikes, I didn't know jemalloc was getting archived -- that's unexpected :-(
Anyway, I am quite interested in USDTs but I would have to study them a bit more -- if the overhead is indeed this low that would be quite interesting. At the moment, there is already quite some support for tracing with tools like valgrind and asan (see mimalloc-trace.h) and there has been integration with Windows perf counters as well. If we add USDT, it would be good if we could reuse the same mechanism, or at least impact the code as little as possible. I'll study this a bit more soon.

daanx avatar Jun 14 '25 19:06 daanx

At first sight I would say that should work. A cherry on top would be if mimalloc sampled allocations every X bytes allocated and only fired the trace probes then (or did so in addition to trace probes that fire unconditionally). That way mimalloc doesn’t have to know how to unwind a stack and can offload that part to the profiler, while the profiler doesn’t have to trace every single allocation, which would be cost-prohibitive in a production environment (which is arguably exactly where you want to profile). That would give us everything we need to build external heap and allocation profilers with very low overhead.
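To make the sampling idea concrete, here is a rough sketch under stated assumptions: a per-thread byte counter that fires a probe only when roughly every X bytes have been allocated. The interval value, the function name `sample_allocation`, and the probe name are all hypothetical; the `DTRACE_PROBE2` macro is the one from `<sys/sdt.h>`, with a no-op fallback when the header is unavailable.

```c
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<sys/sdt.h>)
#    include <sys/sdt.h>
#    define SKETCH_HAVE_SDT 1
#  endif
#endif
#ifndef SKETCH_HAVE_SDT
#  define DTRACE_PROBE2(provider, name, a1, a2) ((void)(a1), (void)(a2))
#endif

/* Hypothetical value for "X": sample about once per 512 KiB allocated. */
#define SAMPLE_INTERVAL_BYTES ((size_t)512 * 1024)

/* Bytes this thread may still allocate before the next sample. */
static _Thread_local size_t bytes_until_sample = SAMPLE_INTERVAL_BYTES;

/* Call once per allocation. Returns 1 if this allocation crossed the
   sampling threshold (and the probe fired), 0 otherwise. */
int sample_allocation(void *p, size_t size) {
    if (size < bytes_until_sample) {
        bytes_until_sample -= size;      /* cheap common path: no probe */
        return 0;
    }
    /* Threshold crossed: carry the overshoot into the next interval. */
    size_t overshoot = size - bytes_until_sample;
    bytes_until_sample =
        SAMPLE_INTERVAL_BYTES - (overshoot % SAMPLE_INTERVAL_BYTES);
    /* The profiler attached to this probe does the stack unwinding. */
    DTRACE_PROBE2(mimalloc_sketch, malloc_sampled, size, p);
    return 1;
}
```

This uses a fixed interval for simplicity; sampling profilers often randomize the interval to avoid bias toward periodic allocation patterns, which this sketch does not attempt.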

brancz avatar Jun 15 '25 14:06 brancz

I'm also coming from the jemalloc-profiling world and would love to have some way to observe mimalloc allocation patterns under real-world workloads in production environments. This is a fairly common wishlist item when developing backend/infrastructure services (in my case, Rust binaries running on Linux servers). For completeness, tracing through valgrind/heaptrack is nice for local development but is not well suited to investigating the memory usage patterns of a remotely deployed binary.

I'd be quite excited if this ended up going in the direction suggested by @brancz, which wouldn't even require active instrumentation in the consuming application and could be driven directly by an external profiler!

lucab avatar Jun 16 '25 18:06 lucab