FlameGraph
stack trace visualizer
Flame Graphs visualize hot-CPU code-paths.
Using DTrace, see: http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
Using perf_events or SystemTap, see: http://dtrace.org/blogs/brendan/2012/03/17/linux-kernel-performance-flame-graphs/
Using XCode Instruments, see: http://schani.wordpress.com/2012/11/16/flame-graphs-for-instruments/
These can be created in three steps:
- Capture stacks
- Fold stacks
- flamegraph.pl

- Capture stacks
Stack samples can be captured using DTrace, perf_events or SystemTap.
Using DTrace to capture 60 seconds of kernel stacks at 997 Hertz:
dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o out.kern_stacks
Using DTrace to capture 60 seconds of user-level stacks for PID 12345 at 97 Hertz:
dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345 && arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
Using DTrace to capture 60 seconds of user-level stacks, including while time is spent in the kernel, for PID 12345 at 97 Hertz:
dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
If the application has a ustack helper, switch ustack() for jstack() to include translated frames (e.g., node.js frames; see: http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/). The rate for user-level stack collection is deliberately slower than for kernel stacks, which is especially important when using jstack(), as it performs additional work to translate frames.
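The user-level one-liners above differ only in PID, sample rate, duration and whether the arg1 test is present. As a rough sketch, they could be generated from a small Python helper. The function name dtrace_user_stacks_cmd and its parameters are hypothetical and not part of FlameGraph; only the dtrace options already shown above are used, and the predicate uses DTrace's lowercase built-in pid variable.

```python
import shlex


def dtrace_user_stacks_cmd(pid, hertz=97, seconds=60,
                           include_kernel_time=False,
                           out="out.user_stacks"):
    """Build the argv for a user-level stack capture (hypothetical helper).

    With include_kernel_time=False the predicate also tests arg1, so only
    samples taken while the process is running user-level code are counted.
    """
    predicate = f"pid == {pid}"
    if not include_kernel_time:
        predicate += " && arg1"
    program = (
        f"profile-{hertz} /{predicate}/ {{ @[ustack()] = count(); }} "
        f"tick-{seconds}s {{ exit(0); }}"
    )
    return ["dtrace", "-x", "ustackframes=100", "-n", program, "-o", out]


if __name__ == "__main__":
    # Print a copy-pasteable command line; this does not run dtrace.
    print(shlex.join(dtrace_user_stacks_cmd(12345)))
```

Returning an argv list rather than a single string keeps the D program as one argument, so it can be passed to subprocess.run() without extra shell quoting.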
- Fold stacks
Use the stackcollapse programs to fold stack samples into single lines. The programs provided are:
- stackcollapse.pl: for DTrace stacks
- stackcollapse-perf.pl: for perf_events "perf script" output
- stackcollapse-stap.pl: for SystemTap stacks
- stackcollapse-instruments.pl: for XCode Instruments
Usage example:
$ ./stackcollapse.pl out.kern_stacks > out.kern_folded
The output looks like this:
unix`_sys_sysenter_post_swapgs 1401
unix`_sys_sysenter_post_swapgs;genunix`close 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 85
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_closef 26
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_setf 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_getstate 6
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_unfalloc 2
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`closef 48
[...]
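To illustrate what folding means, here is a minimal Python sketch. It is not how stackcollapse.pl is implemented. It assumes a simplified DTrace-style input in which each sample lists one frame per line, leaf first, optionally with a +0x offset, followed by a line holding the count. The function names fold_stacks and format_folded are made up for this example.

```python
import re
from collections import Counter

# Matches a trailing instruction offset such as "+0x149".
OFFSET = re.compile(r"\+0x[0-9a-fA-F]+$")


def fold_stacks(text):
    """Fold multi-line stack samples into {'root;...;leaf': count}."""
    folded = Counter()
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate samples
        if line.isdigit():
            # A bare number ends the sample: emit root-first, joined by ';'.
            if frames:
                folded[";".join(reversed(frames))] += int(line)
            frames = []
        else:
            frames.append(OFFSET.sub("", line))
    return folded


def format_folded(folded):
    """Render one 'stack count' line per unique stack, sorted by stack."""
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


if __name__ == "__main__":
    sample = """
              genunix`closeandsetf+0x12
              genunix`close+0x8
              unix`_sys_sysenter_post_swapgs+0x149
               85

              genunix`close+0x8
              unix`_sys_sysenter_post_swapgs+0x149
                5
    """
    for line in format_folded(fold_stacks(sample)):
        print(line)
```

Identical stacks are summed into one line, which is why the folded file is usually much smaller than the raw capture.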
- flamegraph.pl
Use flamegraph.pl to render an SVG.
$ ./flamegraph.pl out.kern_folded > kernel.svg
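As a rough illustration of how folded lines relate to the picture, the Python sketch below merges folded stacks into a tree and reports each frame's share of all samples, which corresponds to its width in a flame graph. This is a simplified model and not flamegraph.pl's actual code; the names build_tree and frame_widths, the "all" root node and the dict layout are assumptions made for the example.

```python
def build_tree(folded_lines):
    """Merge 'a;b;c count' lines into a tree of per-frame sample totals."""
    root = {"name": "all", "value": 0, "children": {}}
    for line in folded_lines:
        stack, _, count = line.strip().rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip blank or malformed lines
        samples = int(count)
        root["value"] += samples
        node = root
        for frame in stack.split(";"):
            child = node["children"].setdefault(
                frame, {"name": frame, "value": 0, "children": {}})
            # Every frame on the path gets the samples of this stack.
            child["value"] += samples
            node = child
    return root


def frame_widths(node, total, depth=0):
    """Yield (depth, name, fraction of all samples), children by name."""
    yield depth, node["name"], node["value"] / total
    for name in sorted(node["children"]):
        yield from frame_widths(node["children"][name], total, depth + 1)


if __name__ == "__main__":
    tree = build_tree(["a 10", "a;b 5", "a;b;c 85"])
    for depth, name, width in frame_widths(tree, tree["value"]):
        print(f"{'  ' * depth}{name}: {width:.0%}")
```

A frame's total includes the samples of everything called beneath it, so a parent is always at least as wide as its children.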
An advantage of having the folded input file (and the reason folding is a separate step from flamegraph.pl) is that you can use grep to select functions of interest. For example:
$ grep cpuid out.kern_folded | ./flamegraph.pl > cpuid.svg
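Because each folded line carries a complete stack and its count, the same kind of filtering can be done in a few lines of Python, for instance to see what share of all samples touches a function before rendering. This is only a sketch; grep_folded and total_samples are hypothetical helper names, not tools shipped with FlameGraph.

```python
import re


def grep_folded(lines, pattern):
    """Keep folded lines matching the regex anywhere in the line, like grep."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


def total_samples(lines):
    """Sum the trailing sample count of each folded line."""
    return sum(int(line.rpartition(" ")[2]) for line in lines)


if __name__ == "__main__":
    # Assumes out.kern_folded was produced by the fold step above.
    with open("out.kern_folded") as f:
        folded = [line.rstrip("\n") for line in f if line.strip()]
    matched = grep_folded(folded, "cpuid")
    print(f"{total_samples(matched)} of {total_samples(folded)} samples")
```

Matching lines keep their full stack, so the filtered flame graph still shows the callers above the function of interest.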