flame
flame copied to clipboard
Combine identical stack traces
I use flame for a ray tracer, which roughly correspond to
for i in 0..x {
for j in 0..y {
let _guard = flame::start_guard("compute color");
out[i][j] = compute_color(i, j);
}
}
The trouble is that I get x*y "compute color" frames, each taking 0.001% of running time. Is there a way to combine these in a single frame, aside from doing everything manually with flame::frames()
?
A quick way to do that would be to generate an output compatible with https://github.com/brendangregg/FlameGraph#2-fold-stacks, and flamegraph.pl
could be used to generate the graph. A native way to do that would be nice, though.
This works for me:
fn print_trace(s: flame::Span, prefix: &str) {
let my_prefix = format!("{};{}", prefix, s.name);
println!("{} {}", my_prefix, s.delta);
s.children.into_iter().for_each(|c| print_trace(c, &my_prefix));
}
for span in flame::threads().into_iter().flat_map(|t| t.spans.into_iter()) {
print_trace(span, "all");
}
I definitely plan on making a crate for these kind of utilities.
Also, I don't understand how your example code does what you want? How is it combining similar stacks?
It produces the same output as stackcollapse-*
, so I run cargo run | flamegraph > out.svg
. flamegraph
combines the frames if they have the same path. It takes a long time to run though, so combining the frames before sending them to flamegraph would be worth it. Maybe by recursively merging Span::children
if Span::name
are identical?
@mfr-itr This is what I have been using for the same purpose (except using flame to generate html).
use std::fs::File;
use std::mem;
fn write_flame() {
let mut spans = flame::threads().into_iter().next().unwrap().spans;
merge_spans(&mut spans);
flame::dump_html_custom(&mut File::create("flame-graph.html").unwrap(), &spans)
.unwrap();
}
fn merge_spans(spans: &mut Vec<flame::Span>) {
if spans.is_empty() {
return;
}
// Sort so spans to be merged are adjacent and spans with the most children are
// merged into to minimise allocations.
spans.sort_unstable_by(|s1, s2| {
let a = (&s1.name, s1.depth, usize::max_value() - s1.children.len());
let b = (&s2.name, s2.depth, usize::max_value() - s2.children.len());
a.cmp(&b)
});
// Copy children and sum delta from spans to be merged
let mut merge_targets = vec![0];
{
let mut spans_iter = spans.iter_mut().enumerate();
let (_, mut current) = spans_iter.next().unwrap();
for (i, span) in spans_iter {
if current.name == span.name && current.depth == span.depth {
current.delta += span.delta;
let mut children = mem::replace(&mut span.children, Vec::new());
current.children.extend(children.into_iter());
} else {
current = span;
merge_targets.push(i);
}
}
}
// Move merged spans to the front of the spans vector
for (target_i, ¤t_i) in merge_targets.iter().enumerate() {
spans.swap(target_i, current_i);
}
// Remove duplicate spans
spans.truncate(merge_targets.len());
// Merge children of the newly collapsed spans
for span in spans {
merge_spans(&mut span.children);
}
}
@ebarnard, thanks for working this out and for posting this here -- I was looking for exactly this!
One note on getting this to work for me: In the latest flame
releaser (0.2.2
), dump_html_custom()
is not yet made public in src/lib.rs
(i.e. the last line reads pub use html::dump_html;
). So I used the latest GitHub version on the master
branch, which does make it public:
https://github.com/TyOverby/flame/blob/3d07e2115d6ac5018fe7010afb288144b407e13b/src/lib.rs#L557
I.e. my Cargo.toml now lists the flame
dependency as:
flame = { git = "https://github.com/TyOverby/flame.git", branch = "master", optional = true }
This is essential for profiling probably any non-trivial program!
Thanks to @ebarnard for the script and @dlaehnemann for the fix; they work like a charm :) Can't wait for this to be implemented on the main branch.
Hi @tomtung,
the flame crate approach is very useful when flamegraphing specific parts of code. But having to explicitly mark functions for tracking can be prohibitive. Especially when working with crate dependencies that would require local checkouts of those crates, local modifications and than pointing you Cargo.toml to those local checkouts for the flamegraphing to include functions from those crates. Thus, I have since established a different approach for my rust code, using more general-purpose tools that allow to flamegraph everything: perf + flamegraph.pl
I wrote up a small gist about it, maybe it's useful: https://gist.github.com/dlaehnemann/df31787c41bd50c0fe223df07cf6eb89
cheers, david
@dlaehnemann Thanks for the tips! I happened to have already starred that gist just a few hours ago :) Btw, for my case rust-unmangle
didn't seem to be necessary when c++filt
was present, and I also needed stackcollapse-recursive.pl
to unclutter recursive calls.
I was using this crate (flame) to profile some algorithm implementation. With the stack merging script, I still found this crate very handy, since it gives me the freedom to logically annotate sections of my code that aren't necessarily individual frames on the stack, which helps me get the overall big picture more easily.
Jep, definitely. There are use cases for both. I did expect rust-unmangle
to become unnecesaary at some point, as Rust name mangling evolves, just still saw some function calls that needed it. So thanks for pointing out that you've seen a case with no effect.
And the stackcollapse-recursive.pl
sounds like a great addition.
Happy optimising! ;)