flame
Doesn't seem to work with multiple threads
Minimal test case:
use rayon::prelude::*;
use flame;
use std::fs::File;

fn something(i: usize) -> f64 {
    flame::start("Something");
    std::thread::sleep(std::time::Duration::from_millis(10));
    flame::end("Something");
    0.56
}

fn main() {
    let a = (0..200).collect::<Vec<_>>()
        .par_iter()
        .map(|i| something(*i))
        .collect::<Vec<_>>();
    flame::dump_html(&mut File::create("flames.html").unwrap()).unwrap();
    println!("{:?}", flame::threads());
}
Dependencies:
rayon = "1.0.3"
flame = "0.2.2"
The written file contains no graph. The program prints:
[Thread { id: 140470522639296, name: Some("main"), spans: [], _priv: () }]
Only the main thread is listed and its spans are empty, so no data is available.
I would also love to see this work in multi-threaded environments.
I think the real problem here is that Rayon keeps those threads around in a pool, so the normal destructor-based solution doesn’t work. Try calling flame::commit_thread() at the end of your something function.
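To illustrate the reasoning above, here is a minimal std-only sketch of a profiler that commits thread-local spans from a destructor. It is a model under stated assumptions, not flame's actual implementation: COMMITTED, LocalSpans, record and commit_current_thread are hypothetical names, and the parked worker stands in for an idle Rayon pool thread.

```rust
use std::cell::RefCell;
use std::sync::mpsc;
use std::sync::Mutex;
use std::thread;

// Hypothetical global span store (a stand-in, not flame's real internals).
static COMMITTED: Mutex<Vec<String>> = Mutex::new(Vec::new());

// Per-thread span buffer that publishes its contents when it is dropped.
struct LocalSpans(Vec<String>);

impl LocalSpans {
    // Move every buffered span into the global store.
    fn flush(&mut self) {
        COMMITTED.lock().unwrap().append(&mut self.0);
    }
}

impl Drop for LocalSpans {
    // Thread-local destructors run only when the owning thread exits.
    fn drop(&mut self) {
        self.flush();
    }
}

thread_local! {
    static LOCAL: RefCell<LocalSpans> = RefCell::new(LocalSpans(Vec::new()));
}

// Record a span in the current thread's buffer.
fn record(name: &str) {
    LOCAL.with(|l| l.borrow_mut().0.push(name.to_string()));
}

// Explicit commit, analogous in spirit to calling a commit function by hand.
fn commit_current_thread() {
    LOCAL.with(|l| l.borrow_mut().flush());
}

/// Runs one job on a worker that stays alive afterwards (like a pool thread)
/// and returns how many spans are globally visible while it is still parked.
fn visible_spans_while_worker_alive(explicit_commit: bool) -> usize {
    COMMITTED.lock().unwrap().clear();
    let (done_tx, done_rx) = mpsc::channel::<()>();
    let (quit_tx, quit_rx) = mpsc::channel::<()>();
    let worker = thread::spawn(move || {
        record("Something");
        if explicit_commit {
            commit_current_thread();
        }
        done_tx.send(()).unwrap();
        // Park like an idle pool thread until told to quit.
        let _ = quit_rx.recv();
    });
    done_rx.recv().unwrap();
    // The worker has finished its job but has not exited yet.
    let visible = COMMITTED.lock().unwrap().len();
    quit_tx.send(()).unwrap();
    worker.join().unwrap();
    visible
}

fn main() {
    println!("destructor only: {}", visible_spans_while_worker_alive(false));
    println!("explicit commit: {}", visible_spans_while_worker_alive(true));
}
```

In this model the destructor-only run sees 0 spans while the worker is parked, and the explicit-commit run sees 1. Whether the same holds for flame with Rayon depends on how the pool schedules and shuts down its threads.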
The commit_thread idea doesn't seem to work. Here's what I tried:
let pool = rayon::ThreadPoolBuilder::new()
    .exit_handler(|_| flame::commit_thread())
    .build()
    .unwrap();
pool.install(|| {
    /* code which does rayon things */
});
drop(pool);
flame::commit_thread();
flame::dump_html(&mut File::create("flame-graph.html").unwrap()).unwrap();
This code should cover both paths for every thread: dropping the pool also drops its threads, so the destructor-based solution should run, and the exit_handler calls commit_thread explicitly as each thread exits. Even so, dump_html just renders a single bar with the label "undefined".
What is necessary here is to also call dump_html inside the exit_handler, once per thread, using a unique filename based on the thread id. Unfortunately, the output is quite difficult to interpret, because the data in the separate HTML files is neither synchronized nor scaled against each other.
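As a rough illustration of the one-file-per-thread idea, here is a std-only sketch. It does not use flame or Rayon: per_thread_path, dump_on_exit and run_workers are hypothetical helpers, plain std threads stand in for pool workers, and the worker index is assumed to be the unique key used in the filename.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical naming scheme: one output file per worker index.
fn per_thread_path(dir: &Path, worker_index: usize) -> PathBuf {
    dir.join(format!("flame-thread-{}.html", worker_index))
}

/// Stand-in for a pool exit handler: writes this worker's spans to its own file.
fn dump_on_exit(dir: &Path, worker_index: usize, spans: &[String]) -> std::io::Result<PathBuf> {
    let path = per_thread_path(dir, worker_index);
    let mut file = fs::File::create(&path)?;
    for span in spans {
        writeln!(file, "{}", span)?;
    }
    Ok(path)
}

/// Spawns `workers` threads; each records one span and dumps it just before exiting.
fn run_workers(dir: &Path, workers: usize) -> Vec<PathBuf> {
    let handles: Vec<_> = (0..workers)
        .map(|index| {
            let dir = dir.to_path_buf();
            thread::spawn(move || {
                let spans = vec![format!("Something (worker {})", index)];
                dump_on_exit(&dir, index, &spans).unwrap()
            })
        })
        .collect();
    // Joining guarantees every file is written before the paths are returned.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let dir = std::env::temp_dir().join("flame-per-thread-demo");
    fs::create_dir_all(&dir).unwrap();
    for path in run_workers(&dir, 4) {
        println!("wrote {}", path.display());
    }
}
```

Each file here is written independently, which is exactly why the results are hard to compare: nothing ties the files to a shared clock or scale.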
Shameless plug, but @Ploppz @l-x-u you can check out https://github.com/wagnerf42/rayon-logs
It shows you how rayon is splitting tasks and how much time is spent in each task. However, it does not have flamegraph characteristics: it is not useful if you want call-graph information in multithreaded code. You can still add some hardware counters to parallel regions.