scalene
potential enhancement: stack-awareness
@daniel-shields and I were both surprised that Scalene isn't stack-aware, as @emeryberger confirms in #33. Daniel has a use case where a multiprocessing application using remote endpoints reports that all the wall time is spent in synchronization primitives, but not which stack is waiting on those primitives. Adding stack awareness would expand Scalene's applicability to use cases like Daniel's.
Please consider:
- Expanding https://raw.githubusercontent.com/plasma-umass/scalene/master/docs/images/profiler-comparison.png to include stack-awareness. It's an important dimension when choosing a profiler.
- Adding stack-awareness to Scalene
I realize the second thing isn't a small thing. I mainly wanted to capture it so that others might find the detail.
So, for a bit of context here: this was a tradeoff that we made in the initial development of Scalene. Adding a flamegraph option could alleviate that (I think it's feasible; we do have access to most of the information while stack walking), but outside of a flamegraph, and in the UI we initially want to provide, it's a bit tough to represent stacks. I think it's definitely possible that it's in the future of Scalene (and you're both welcome and encouraged to implement it yourself!), but its use cases and ergonomics are both relatively limited.
If you make a PR I'll be happy to work with you to merge it in! If you want to do it, I'd advise starting here, since it is where we process CPU samples.
From a design perspective, I would recommend not storing this information unless the flamegraph is certainly going to be built. Extra overhead in a non-flamegraph run is unacceptable.
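To make the "only pay for stacks when a flamegraph is requested" idea concrete, here is a minimal sketch. It is not Scalene's code: `SampleRecorder`, `capture_stack`, and the `collect_stacks` switch are hypothetical names, and the output uses the semicolon-separated "folded stack" text format that common flamegraph tools consume.

```python
import sys
from collections import Counter


def capture_stack(frame):
    """Walk f_back links; return function names from outermost to innermost."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(names))


class SampleRecorder:
    """Hypothetical per-sample bookkeeping, NOT Scalene's implementation."""

    def __init__(self, collect_stacks=False):
        self.collect_stacks = collect_stacks
        self.line_hits = Counter()
        # No stack storage is allocated unless a flamegraph was requested.
        self.stack_hits = Counter() if collect_stacks else None

    def record(self, frame):
        # Line-level attribution always happens.
        self.line_hits[(frame.f_code.co_filename, frame.f_lineno)] += 1
        # Stack walking and storage happen only in a flamegraph run.
        if self.collect_stacks:
            self.stack_hits[capture_stack(frame)] += 1

    def folded(self):
        """Return 'outer;...;inner count' lines, or [] if stacks are off."""
        if not self.collect_stacks:
            return []
        return [f"{';'.join(stack)} {count}"
                for stack, count in sorted(self.stack_hits.items())]


# Hypothetical workload: the same wait() is reached from two callers.
def wait(recorder):
    recorder.record(sys._getframe())


def train(recorder):
    wait(recorder)


def evaluate(recorder):
    wait(recorder)
```

With `collect_stacks=False` the recorder does the same line-level counting and nothing else, so a normal run carries no stack-related cost beyond a single boolean check per sample.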
Implementing a flamegraph for memory samples would be difficult, unwieldy, and useless. The cases where stack context is intimately related to memory consumption are few and far between, and unlike the cases in which stack context matters for CPU sampling, we just haven't seen any of them in our work. As such, if you're putting in a flamegraph option, I'd advise forcing a `--cpu-only` run.
Thank you @sternj, that's a lot of nice context.
@daniel-shields, would a `--cpu-only --flamegraph`-like option have given you the sort of information that you wanted for your use case? Given @sternj's context, I'd like to make sure the feature in light of that context would have improved Scalene's utility to you.
Just a note: it is already possible to use Scalene for the specified use case of finding who is waiting on a synchronization operation via a command-line option. As long as the waits are in a separate file from the rest of the code, you can simply `--profile-exclude` the file in question and Scalene will, in this case, only report the parent callers. @sternj also proposed extending the `--profile-exclude` syntax to support exclusion of specific functions (e.g., `--profile-exclude somecode.py:wait_for_something`).
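The effect described above (time spent in an excluded file shows up against its caller) can be illustrated with a small sketch. This is an assumption about the general technique, not Scalene's actual implementation: `first_reported_frame` and `attributed_function` are hypothetical helpers, and `waits.py` is a made-up file name standing in for the module that holds the synchronization waits.

```python
import os
import sys


def first_reported_frame(frame, excluded_basenames):
    """Return the nearest frame, starting at `frame`, whose file is not excluded."""
    while frame is not None:
        if os.path.basename(frame.f_code.co_filename) not in excluded_basenames:
            return frame
        frame = frame.f_back
    return None


# Hypothetical helper module holding the wait; compiled under the file
# name "waits.py" so its frames report that name as co_filename.
WAITS_SOURCE = """
import sys

def wait_for_something(on_wait):
    on_wait(sys._getframe())  # stand-in for a sample landing inside the wait
"""
waits = {}
exec(compile(WAITS_SOURCE, "waits.py", "exec"), waits)


def attributed_function(excluded_basenames):
    """Report which function a sample taken inside the wait is charged to."""
    charged = []

    def on_wait(frame):
        reported = first_reported_frame(frame, excluded_basenames)
        charged.append(reported.f_code.co_name)

    def fetch_results():
        waits["wait_for_something"](on_wait)

    fetch_results()
    return charged[0]
```

With `{"waits.py"}` excluded, the sample is charged to `fetch_results`; with nothing excluded, it is charged to `wait_for_something` itself. On the command line, the corresponding run would look something like `scalene --profile-exclude waits.py your_program.py`, where both file names are placeholders.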
Thanks for this tool! It's very neat, and complements py-spy (a low-overhead sampling profiler providing nice flamegraphs) very well.
One thing I would like to attach to the feature request, acknowledging that any implementation of this is probably quite a lot of extra work, is that the proposal of `--cpu-only --flamegraph` might not be quite enough to improve upon the current situation of being able to run Scalene and then py-spy (apart from having fewer tools to install). There are a couple of small extras which would push it into providing extra utility. One is being able to capture both a flamegraph profile and a full Scalene profile, including (non-stack) memory information, at the same time. This would be more convenient and provide extra assurance that the views are really views of the same data. Another is being able to get a GPU flamegraph, which could be useful in some circumstances, e.g. in deep learning, to see how much time is spent in training versus evaluation, both of which can appear as time inside the model code without stack awareness.
In the short term, it could be an idea to note in the README that py-spy is currently a complementary tool.
Hi, I was trying to get some kind of stack trace output similar to py-spy and just saw this. Scalene does have --stacks:
--stacks collect stack traces
but I haven't seen the output change if this is set. What does `--stacks` do, and is it something different from what is being talked about here?
From searching the code, it seems that `--stacks` has no effect.
I can see `--stacks` output in the JSON, but it's not clear to me how to use it.