
Improve the experience for PR authors and reviewers looking at performance changes

Open rylev opened this issue 3 years ago • 0 comments

The following is a user story outlining (in my opinion) an ideal medium-term experience for what PR authors and reviewers should expect to go through when assessing the performance impact of a PR. This story is explicitly meant to target something that feels achievable in the next few months. There is certainly much more we could do in the future to make the process even better, but I'll leave that to a different issue.

Additionally, the exact wording of the potential messaging is not important. What is important is the flow of the process as well as the data and instructions that are given to help PR authors and reviewers make decisions.

The story

Alice opens a PR against rust-lang/rust and Bob is assigned to review the PR. Because the PR touches certain files, a comment is left @ mentioning Alice and Bob to consider doing a performance run. Alice kicks off the performance run, and the perf bot leaves a message that the performance run has begun. A little while later, the run is completed and an overview is presented:

Message from the performance bot

The performance run has completed for commit 092adf93020.

Summary:

  • Medium sized regression (largest delta 4.5%/8.2x significance threshold) in instruction count
  • Small sized regression (largest delta 1.2%/1.3x significance threshold) in cycles

Recommended actions:

  • Investigate instruction count regression (how-to guide)
  • Cycle regression is small enough that it can likely be ignored

The performance delta for this change is large enough that merging the PR is blocked until it is either fixed or justified. You may run new performance tests with @bors try @rust-timer queue or unblock the PR by providing a justification for the performance delta with @bors justify-performance $REASON where $REASON is a description of your justification.

For more details and to start the investigation visit here.
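The "Medium" and "Small" labels in the message above imply some classification scheme combining the largest delta with how far it exceeds the significance threshold. A minimal sketch of what such a classifier might look like — the cutoffs, names, and function signature here are invented for illustration and are not rustc-perf's actual logic:

```rust
// Hypothetical severity classification for a metric's regression, based on
// its largest delta (in percent) and its significance factor (the delta
// expressed as a multiple of the significance threshold). Cutoffs are
// illustrative only.
#[derive(Debug, PartialEq)]
enum Severity {
    None,   // below the significance threshold: likely noise
    Small,  // significant, but probably ignorable
    Medium, // worth investigating
    Large,  // blocks merging until fixed or justified
}

fn classify(largest_delta_pct: f64, significance_factor: f64) -> Severity {
    if significance_factor < 1.0 {
        Severity::None
    } else if largest_delta_pct < 2.0 {
        Severity::Small
    } else if largest_delta_pct < 10.0 {
        Severity::Medium
    } else {
        Severity::Large
    }
}

fn main() {
    // The two metrics from the example message above:
    assert_eq!(classify(4.5, 8.2), Severity::Medium); // instruction count
    assert_eq!(classify(1.2, 1.3), Severity::Small);  // cycles
    println!("ok");
}
```

Whatever the real scheme ends up being, the key property is that it maps raw deltas into a small set of labels that directly drive the recommended actions (ignore, investigate, or block).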

The investigation

Alice clicks on the link to start her investigation into the instruction count regressions. Immediately she notices that the page has again identified instruction counts as an issue by highlighting the instruction count summary in red. All other metrics are displayed but not highlighted, since they do not represent regressions.

The instruction count summary highlights that 7 of the 8 test cases that have regressed are incremental compilation test cases. The PR in question shouldn't impact incremental compilation, so this is interesting. The summary also indicates that 6 of the 8 regressions are in real-world crate benchmarks and not stress test benchmarks, which gives her extra motivation to get to the bottom of the issue.

Below the summary area are boxes highlighting the test cases which have regressed. Clicking on the largest regression, Alice is taken to a new page with more information on that test case. She sees results for that test case over time, and it's clear that this result is an outlier compared to the others. She looks at the self-profiling information and notices that the query showing the largest delta is indeed incr_comp_serialize_result_cache.

Confused about why this might be, she finds guidance on what to do next in a helpful section near the top of the page titled "Next Steps". The section contains command line instructions for running various performance analysis tools like cachegrind, as well as an information tooltip explaining what cachegrind is useful for.

She decides to run the command locally against the regressed benchmark. The wrapper tool provided downloads the compiler artifact for her and runs cachegrind on her behalf. Using the instructions linked from the tooltip, she interprets the results to find the function in question which is taking so long. She takes a look at that function in the rustc source and sees something that might be the cause.
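The local step above might look roughly like the following, assuming rustc-perf's `collector` binary and its `profile_local` subcommand; the toolchain spec, id, benchmark name, and output paths here are illustrative, not exact:

```shell
# From a checkout of rust-lang/rustc-perf (flags and names illustrative).
# Profile one regressed benchmark under cachegrind, using the compiler
# being tested (here a hypothetical local stage1 build):
cargo run --release --bin collector -- \
    profile_local cachegrind +stage1 \
    --id my-pr-check \
    --include regressed-benchmark

# Cachegrind output can then be broken down per function with cg_annotate
# (output path is assumed; check the tool's actual results directory):
cg_annotate results/cgout-my-pr-check-regressed-benchmark
```

The point of the wrapper is that Alice never has to assemble the cachegrind invocation or locate the compiler artifact herself; the "Next Steps" section would hand her a ready-to-paste command like this.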

She doesn't have time to finish now, but posts the results of her investigation to the PR.

Delta

The following are items that don't currently exist but would need to in order to make the above a reality:

  • [ ] Prompt for running perf tests when certain files are touched
  • [ ] Better summary output in PRs of how severe regressions are and concrete recommendations of what to investigate
  • [ ] Performance bot looking at more than just instruction count when posting about regressions
  • [ ] Performance bot that blocks bors merging the PR if the regression is severe enough
  • [ ] How-to guides for showing best practices when investigating performance issues
  • [ ] Comparison page that shows more than just one metric at a time
  • [x] Summary sections that highlight the nature of performance deltas
  • [x] Summary of whether a particular kind of scenario or profile is affected
    • [x] Summary of whether this impacts real-world crates vs stress tests
  • [ ] Detail page with clearly marked historical data
  • [ ] "Next steps" section highlighting tools for running certain things like cachegrind locally

rylev Oct 08 '21 14:10