[Calltree Comparison (Talos)] Ability to compare multiple profiles
Sometimes you want to measure the performance impact of a change. You'll usually get a profile from before the change and a profile from after the change, and then you want some way of comparing the two.
At the moment, the only way of doing that is to load the profiles in two different tabs and to repeatedly switch between the two. That's not a great experience.
It would be nice to be able to select two profiles to compare, maybe pick one thread from each of them, and then have a few ways of comparing them:
- Difference tree: Let's say the profiling interval is 1ms. In the regular call tree, in order to compute the time for a call tree node, you just sum up all the samples under that call stack, and then you sort by that number, and the callstacks with the highest number show up at the top. In a difference tree, you could count all samples from the new profile as 1ms and all the samples from the old profile as -1ms. Then you still do the same summing of numbers in the call tree, and you end up with call tree nodes that can have negative or positive costs. Call tree nodes with positive numbers are call stacks which are present in the new profile more than in the old profile (i.e. the parts that got slower), and they'll be sorted to show up at the top.
- Matching up timelines: Sometimes it's nice to compare the information represented in the timeline views at the top with each other visually, both the stack graph and the interval markers. So you'd want some way of moving the timelines of the two threads underneath each other until they match up somehow.
- Comparing information contained in markers: If I have a change that affects composite times, I'd like to have some way of comparing the composite markers on the Compositor threads of the two profiles. E.g. compare the average time of a composite taken before and after.
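To make the negative-weight idea from the first bullet concrete, here is a minimal sketch in plain JavaScript. The `oldStacks`/`newStacks` arrays are hypothetical stand-ins for real profile data, and a real call tree would also accumulate cost at every ancestor node; this sketch only keys on the full stack:

```javascript
// Build a map from call-stack key to signed cost: each sample from the
// "new" profile contributes +interval, each sample from the "old" profile
// contributes -interval. Positive totals mark stacks that got slower;
// sorting by total puts them at the top.
function differenceTree(oldStacks, newStacks, intervalMs = 1) {
  const cost = new Map();
  const add = (stack, delta) => {
    const key = stack.join(' > ');
    cost.set(key, (cost.get(key) || 0) + delta);
  };
  for (const stack of newStacks) add(stack, +intervalMs);
  for (const stack of oldStacks) add(stack, -intervalMs);
  // Sort descending so regressions (positive cost) come first.
  return [...cost.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical sampled stacks, innermost frame last.
const oldStacks = [['main', 'paint'], ['main', 'paint'], ['main', 'layout']];
const newStacks = [['main', 'paint'], ['main', 'layout'],
                   ['main', 'layout'], ['main', 'layout']];

console.log(differenceTree(oldStacks, newStacks));
// → [['main > layout', 2], ['main > paint', -1]]
```

Here `main > layout` shows up first with a positive cost because the new profile spends 2 ms more under it, matching the "parts that got slower sort to the top" behavior described above.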
Comparing profiles is a big subject and we'll probably need to experiment with a bunch of different views until we find something that works.
For comparing markers, I'd like to see statistical information about the markers reported (probably on a new pane) such as for duration of GCMinor events I'd like to see min, 25%, mean, 75%, 90%, max, for example. When comparing profiles I'd like to see box-and-whisker diagrams for these stats, possibly run t-tests or other statistical tests and so on. I can help with this.
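A sketch of the suggested marker statistics, assuming we already have the marker durations as a plain array of milliseconds; the nearest-rank percentile convention used here is just one of several possible choices:

```javascript
// Compute the summary statistics suggested above (min, 25th percentile,
// mean, 75th, 90th, max) for a list of marker durations in ms,
// using the nearest-rank percentile method.
function markerStats(durations) {
  const sorted = [...durations].sort((a, b) => a - b);
  const pct = p =>
    sorted[Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1)];
  const mean = sorted.reduce((s, d) => s + d, 0) / sorted.length;
  return {
    min: sorted[0],
    p25: pct(25),
    mean,
    p75: pct(75),
    p90: pct(90),
    max: sorted[sorted.length - 1],
  };
}

// Hypothetical GCMinor marker durations from two profiles.
const before = [2, 3, 3, 4, 10];
const after = [1, 2, 2, 3, 4];
console.log(markerStats(before)); // e.g. p90 = 10, mean = 4.4
console.log(markerStats(after));
```

The two resulting objects are exactly what a box-and-whisker view (or a t-test on the raw arrays) would be fed.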
Sounds like we need:
- A way to "import" one profile into another profile tab via URL or local file
- Perhaps also a way to display a history of the last n recorded profiles in the Profiler toolbar panel, which could allow you to select two profiles and click "Compare"?
- A way to show the two profiles' timelines in the header area, lined up with each other
- Ability to select and compare markers in each profile?
- Fine-grained comparison in the call stack (and possibly also other panels?)
I feel this can be made possible by ideas we were having about improving the header timeline by only showing one thread/activity row by default. We'd need to be able to:
- Auto-select the most interesting thread of the profile and display only its events
- Automatically display pertinent markers/events from other threads
This feature is kind of intense so let me know if I seem to be misunderstanding anything 😅
Also, re: call stack comparison, would it be helpful/possible to display the two profiles in two columns like a diff tool, connecting/highlighting the parts that are different and using some green success color to indicate the side that's more performant?
This should also use #448
> I feel this can be made possible by ideas we were having about improving the header timeline by only showing one thread/activity row by default.
My comment was for showing one thread for the Timeline panel, not the header. So I think we need a mechanism to select which threads we are interested in comparing.
> Auto-select the most interesting thread of the profile and display only its events
I'm not sure how this would work or if it's feasible.
> Also, re: call stack comparison, would it be helpful/possible to display the two profiles in two columns like a diff tool, connecting/highlighting the parts that are different and using some green success color to indicate the side that's more performant?
The problem here is that the stacks won't always line up. A top functions view would be much easier to do this with, and would be very nice. The call tree is dependent on the order that code is called. Often performance comparisons are about changing that order to make certain things faster.
Possibly what I would like to do would be to use the call tree and focus in on a part of the call tree, then do that for a second profile. Afterwards I would like to compare timing on top functions. I could see a diff-like approach for that.
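A "top functions" diff like this could aggregate self time per leaf function (ignoring call order, unlike the call tree) and subtract the two totals. A rough sketch, with made-up stacks and a hypothetical 1 ms interval:

```javascript
// Aggregate self time per leaf function in each profile, then subtract.
// Positive deltas are functions that cost more in the second profile.
function topFunctionsDiff(stacksA, stacksB, intervalMs = 1) {
  const selfTime = stacks => {
    const t = new Map();
    for (const stack of stacks) {
      const leaf = stack[stack.length - 1]; // self time goes to the leaf frame
      t.set(leaf, (t.get(leaf) || 0) + intervalMs);
    }
    return t;
  };
  const a = selfTime(stacksA);
  const b = selfTime(stacksB);
  const names = new Set([...a.keys(), ...b.keys()]);
  return [...names]
    .map(name => ({ name, delta: (b.get(name) || 0) - (a.get(name) || 0) }))
    .sort((x, y) => y.delta - x.delta);
}

// Hypothetical stacks: same functions, different call order and counts.
const fast = [['main', 'parse'], ['main', 'parse'], ['main', 'layout']];
const slow = [['main', 'parse'], ['main', 'layout'],
              ['main', 'layout'], ['main', 'gc']];
console.log(topFunctionsDiff(fast, slow));
```

Because only leaf functions are counted, reordering callers between the two profiles does not break the comparison, which is the advantage over a call-tree diff noted above.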
@mstange had some thoughts on computing a diff call tree. I'm not sure if that's written down somewhere.
@violasong did I miss anything?
Here are some mockups - let me know if you have any feedback! Also, do any of the other panels need design work for this?
Importing a profile into another profile:
Profile comparison with difference tree:
Need to file breakout issues for this.
Suggest starting with the Call Tree comparison view and waiting on the "top thing"
Not planned in 2017 in the current scope.
Phase 1 should include:
- Storing local profiles
- Managing uploaded profiles (upload, revoke)
fitzgen had a link from an older bug that may be worth reading up on:
Note that Go's built-in profiling tools support this and it seems pretty neat: https://github.com/bradfitz/talk-yapc-asia-2015/blob/master/talk.md
Florian wrote an experimental script to help compare two profiles. I tweaked it to accommodate my use case, and I think it's a very good starting point that helps prototype how we could compare stacks and frames!
Here is my fork, it is run like this:
```
$OBJDIR/dist/bin/run-mozilla.sh $OBJDIR/dist/bin/xpcshell compare-profile-alex.js p1.profile p2.profile
```
You can omit the run-mozilla.sh part if you are not on Linux. The two profiles given as arguments have to be Talos profiles fetched directly from Talos, like this Talos zip file.
This script does three things:
- looks for a marker (its name is hardcoded in the JS file; you will want to modify it if you want to test it!) and only considers the data within that marker's time range,
- aggregates the two profiles into one, which helps compare them from the same interface and reuse the same filters. That is handy!
- computes two new fake threads: one containing all the stacks that are in the first profile but not in the second, and another with the reverse. You end up with a profile like this one.
All the credit goes to Florian. His original script supports profiles fetched from the perf-html server rather than from Talos, and accepts a folder as argument: it will pick the slowest and the fastest profiles and compare them. It also contains hardcoded marker strings that you will want to modify.
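Those two fake threads boil down to a set difference over stack keys. This is a sketch of the idea rather than the actual script's logic, and the sample stacks are made up:

```javascript
// Keep each sample whose exact stack never appears in the other profile.
// Stacks are keyed by their joined frame names.
function onlyIn(stacksA, stacksB) {
  const seen = new Set(stacksB.map(s => s.join(' > ')));
  return stacksA.filter(s => !seen.has(s.join(' > ')));
}

// Hypothetical sampled stacks from two profiles.
const first = [['main', 'observeActivity'], ['main', 'paint']];
const second = [['main', 'paint'], ['main', 'gc']];

console.log(onlyIn(first, second)); // stacks unique to the first profile
console.log(onlyIn(second, first)); // stacks unique to the second profile
```

Running both directions yields the "Only first" and "Only second" pseudo-threads described below.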
This is amazing! What is the difference between the pseudo-threads "First profile" and "Only first"? Are stacks for the two profiles being kept separate in the respective pseudo-threads?
TBH, I've not reviewed what is being done exactly, but here are Florian's words about that:
> It creates a 3rd and 4th profile showing samples that only appear in the first profile or the second.
> First profile is the GeckoMain thread of the first profile passed as command line argument.
> Second profile is the same, but for the second.
> Only first shows only the samples that are in the first but not in the second.
> Only second shows only the samples that are in the second but not in the first.
It would be great to have quick feedback from someone with strong knowledge of the profile data structure to validate what is being done here. Also, ideas on what we could do next would be welcome. I imagine Florian and I would be happy to experiment with more ideas from this script, but anyone is welcome to hack on it as well.
@ochameau great, more people experimenting with Florian's script is what I was hoping for, as a lot of the processing can be prototyped with scripts.
> I tweaked it to accommodate my use case, and I think it's a very good starting point that helps prototype how we could compare stacks and frames!
In his past tests, Florian did not have success answering his question, as the result was too noisy and had too little overlap. Were you able to find your answer in the generated profile?
> Were you able to find your answer in the generated profile?
I used it on profiles I had already analysed manually, but it was great to see that it immediately put in front of me what I was suspecting to be different. When comparing manually, it is always hard to know whether the thing you think is different isn't hidden somewhere else in the call tree or flame chart, especially when some frames or calls are cut into many pieces. With this simple diff, you are much more confident! And I'm especially confident as it seems to align with the manual analysis, confirming that both the script and I are most likely correct ;)
Unfortunately I didn't have the time to look at Florian's script yet. But I'd like to do a brain dump before my PTO.
So here is what I had in mind:
- a UI to select the profiles to compare (could be accessible from /diff). This could be as easy as 2 inputs. The user would paste into these inputs the full URLs of uploaded profiles. From these URLs we already have the code to:
  - download profiles
  - process profiles according to transforms and various options
  - select a specific thread
  - select a specific range
- we could even save these 2 URLs in the URL itself, as parameters to the /diff endpoint.
- We would keep only the samples resulting from this process. This makes it possible for a user to pinpoint the problematic area and make the diffing algorithm actually bring useful data.
- as for the diffing algorithm itself, what I had in mind was exactly what was in #1157: because the call tree is basically only additions, we could add the samples from profile 1, but negate the samples from profile 2. I don't know if that would bring something, but I'm hopeful. I definitely want to look at Florian's algorithm too, and see how this can relate.
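The "save the 2 URLs in the URL itself" idea from the list above could look something like the sketch below. The `profile1`/`profile2` parameter names are assumptions for illustration, not the real endpoint's API:

```javascript
// Encode two uploaded-profile URLs as query parameters of a compare
// page, and decode them back. Parameter names are hypothetical.
function buildCompareUrl(base, urlA, urlB) {
  const u = new URL(base);
  u.searchParams.set('profile1', urlA); // searchParams handles escaping
  u.searchParams.set('profile2', urlB);
  return u.toString();
}

function parseCompareUrl(href) {
  const u = new URL(href);
  return [u.searchParams.get('profile1'), u.searchParams.get('profile2')];
}

const link = buildCompareUrl(
  'https://perf-html.io/compare',
  'https://perfht.ml/2UFTZtN',
  'https://perfht.ml/2UCl8xI'
);
console.log(link);
console.log(parseCompareUrl(link)); // round-trips the two profile URLs
```

Keeping both source URLs in the address bar would make a comparison shareable as a single link, the same way the existing URL state encodes transforms and ranges.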
From @bgrins, this is a good use case: https://bugzilla.mozilla.org/show_bug.cgi?id=1505944#c0
From @ochameau, this is a diff he got from a modified Florian's script: https://perfht.ml/2UAR9X2
This is the base profile: https://perfht.ml/2UFTZtN
and this is the profile for the regression about a function called observeActivity: https://perfht.ml/2UCl8xI
(possibly not the exact same ones as the ones used for the diff)
#1575 will land soon and implements a first step: from 2 profile URLs, import 2 threads from 2 different profiles into one view. Once in this view everything happens as if the 2 threads were from the same profile, there's no diff happening (yet).
This works by going to https://perf-html.io/compare and inputting the 2 URLs in the 2 input fields. This UX isn't perfect obviously.
Here are possible next steps I can see:
- [x] diff functionality for the call tree (quite some work) #1883
- [ ] have a split view in the Details panel so that the trees and the charts can be more easily compared visually (a lot of work)
- [ ] import more than 2 profiles into the view (easy)
- [ ] import more than just the selected thread into the view (would need another approach)
- [ ] have the form linked from somewhere, or integrate it better with the rest of the UI (easy, we need an agreement)
- [x] support different intervals (somewhat easy)
- [ ] support screenshots (quite easy)
- [ ] some documentation
- [ ] support importing more than 1 thread per profile (e.g. Andrew asked me about optionally importing the parent process)
- [ ] being able to change the parameters after loading (a bit of work but should require mostly UI work and moving code around)
@julienw since this landed, should we close the issue?
I don't think so, cf. the 2 latest comments above :-) This is like a meta bug for more stuff around this topic. Until we plan to spend time on this, I don't think it's worth filing all the bugs... but maybe we could file the ones that contributors could pick. I'll look at it!