roachprod-microbench: post GitHub issues for performance regressions
Previously, performance regressions detected during the weekly microbenchmark comparison were only reported via Slack notifications. This made it difficult to track and ensure timely follow-up on regressions, as they were often discussed informally without formal issue tracking.
This change extends the existing --post-issues flag to work with the compare command. When enabled, the system automatically creates GitHub issues for performance regressions that exceed 20% (the "red" regression threshold). Each issue includes:
- Package name and list of regressed benchmarks
- Regression percentages and formatted deltas
- Link to the Google Sheet with detailed comparison data
- Labels: O-microbench and C-performance for easy filtering
The implementation reuses the same GitHub posting infrastructure and environment variables (GITHUB_BRANCH, GITHUB_SHA, GITHUB_BINARY) as the existing benchmark failure reporting. Issues are created per package to avoid spam, with up to 10 regressions listed in each issue summary.
Example GitHub Issue screenshot:
Epic: None
Release note: None
@rishabh7m
How was this tested? Can you paste a screenshot or the link of any generated issue?
No, it was not, let me test and update this PR.
Does this change need to be backported?
No, I don't think so.
How was this tested? Can you paste a screenshot or the link of any generated issue?
I've added a unit test that verifies the format of the issue posted for a regression.
The unit test is great, but it would still be nice to test it end-to-end. You can create a dummy issue to serve as an example.
When enabled, the system automatically creates GitHub issues for performance regressions that exceed 20% (the "red" regression threshold). Each issue includes:
Package name and list of regressed benchmarks
In case of a misconfiguration (or other bug), what if every package results in a regression? We should limit the total (possible) number of created GH issues.
@srosenberg
How was this tested? Can you paste a screenshot or the link of any generated issue?
I've added a unit test that verifies the format of the issue posted for a regression.
The unit test is great, but it would still be nice to test it end-to-end. You can create a dummy issue to serve as an example.
Fair point, I will create a dummy issue and update the description.
In case of a misconfiguration (or other bug), what if every package results in a regression? We should limit the total (possible) number of created GH issues.
~I will limit it to 5 issues.~
This change creates one GitHub issue per package containing all of that package's severe regressions (packages with none are skipped). There are currently 23 packages, and based on historical Slack messages in #perf-ops, I think we average ~8 packages with at least one regression, so capping issue creation at 10 should be safe. LMK your thoughts.
Also, I'd like to understand exactly how you want to limit it: stop creating issues after the 10th, or don't create any at all if there are more than 10?
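The two limiting options in the question above can be sketched in Go as follows. Both functions, and the constant `maxIssuesPerRun`, are hypothetical; the limit of 10 comes from this comment, and the input is assumed to be the list of packages that have at least one severe regression.

```go
package main

import "fmt"

// maxIssuesPerRun is a hypothetical cap; 10 is the value proposed above.
const maxIssuesPerRun = 10

// capIssues implements "stop creating issues after the limit": it returns
// the first maxIssuesPerRun packages to post and how many were skipped.
func capIssues(pkgs []string) (post []string, skipped int) {
	if len(pkgs) <= maxIssuesPerRun {
		return pkgs, 0
	}
	return pkgs[:maxIssuesPerRun], len(pkgs) - maxIssuesPerRun
}

// allOrNothing implements "don't create any issue if there are more than the
// limit": a burst that large is treated as a likely misconfiguration.
func allOrNothing(pkgs []string) (post []string, suppressed bool) {
	if len(pkgs) > maxIssuesPerRun {
		return nil, true
	}
	return pkgs, false
}

func main() {
	// Worst case from the discussion: all 23 packages regress.
	pkgs := make([]string, 0, 23)
	for i := 0; i < 23; i++ {
		pkgs = append(pkgs, fmt.Sprintf("pkg%d", i))
	}
	post, skipped := capIssues(pkgs)
	fmt.Println(len(post), skipped) // 10 13
	post, suppressed := allOrNothing(pkgs)
	fmt.Println(len(post), suppressed) // 0 true
}
```

The trade-off: capping still files some issues but silently depends on package order, while all-or-nothing fails loudly and files none when the burst looks like a bug.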
@nameisbhaskar I've addressed the review comments.
Potential Bug(s) Detected
The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.
Next Steps: Please review the detailed findings in the workflow run.
Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.
After you review the findings, please tag the issue as follows:
- If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
- If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful
This change creates one GitHub issue per package containing all of that package's severe regressions (packages with none are skipped). There are currently 23 packages, and based on historical Slack messages in #perf-ops, I think we average ~8 packages with at least one regression, so capping issue creation at 10 should be safe. LMK your thoughts.
Also, I'd like to understand exactly how you want to limit it: stop creating issues after the 10th, or don't create any at all if there are more than 10?
Indeed, the reporter uses only top-level packages (see readMetrics). This bounds the "blast radius" in terms of the number of newly created GH issues. In light of that, I don't think we need any other limiter; i.e., in the worst case, each top-level package results in one newly created issue.
However, we should ensure each issue has an owner. In createRegressionPostRequest, you're reusing regressions[0].benchmarkName with githubpost.DefaultFormatter, which may fail to find an owner, since not every GetTestOwner(packageName, testName) pair has an owner. (The catch-all test-eng is a historical artifact which ideally wouldn't exist.) In other words, you should do an exhaustive search over each regression under the top-level package to ensure that we do find an owner for those benches.
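The exhaustive search suggested above, instead of looking up only `regressions[0]`, can be sketched in Go. The `ownerLookup` function type is a hypothetical stand-in for an owner lookup such as `GetTestOwner(packageName, testName)`; its real signature is not shown in this thread, and the team names below are made up.

```go
package main

import "fmt"

// ownerLookup is a hypothetical stand-in for an owner-resolution function
// such as GetTestOwner(packageName, testName).
type ownerLookup func(pkg, test string) (owner string, ok bool)

// findOwner tries every regressed benchmark in the package, not just the
// first one, and returns the first owner found. It returns fallback (e.g. a
// catch-all team) only when no benchmark resolves to an owner.
func findOwner(pkg string, benchmarks []string, lookup ownerLookup, fallback string) string {
	for _, b := range benchmarks {
		if owner, ok := lookup(pkg, b); ok && owner != "" {
			return owner
		}
	}
	return fallback
}

func main() {
	// Hypothetical ownership data: only the second benchmark has an owner.
	owners := map[string]string{"BenchmarkSysbench": "team-a"}
	lookup := func(pkg, test string) (string, bool) {
		o, ok := owners[test]
		return o, ok
	}
	fmt.Println(findOwner("pkg/sql/tests", []string{"BenchmarkFoo", "BenchmarkSysbench"}, lookup, "test-eng"))
}
```

Checking only the first benchmark would have fallen back to the catch-all here, while the loop finds `team-a` on the second entry.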
On second thought, this isn't strictly needed because the CODEOWNERS linter ensures every package path has an owner.
⚪ Sysbench [SQL, 3node, oltp_read_write]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 11.22m ±1% | 11.24m ±1% | ~ | p=0.967 n=15 |
| ⚪ allocs/op | 8.158k ±2% | 8.090k ±1% | ~ | p=0.519 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/e6e9dda/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/e6e9ddaaeb2fe0a0c4689eea10677e36400de199/bin/pkg_sql_tests benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/557311a/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/557311a3c84a358a81e7fb009372e7d286898415/bin/pkg_sql_tests benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=557311a --new=e6e9dda ./pkg/sql/tests
```
🔴 Sysbench [KV, 3node, oltp_read_only]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| 🔴 sec/op | 3.379m ±1% | 3.410m ±1% | +0.93% | p=0.001 n=15 |
| ⚪ allocs/op | 2.081k ±0% | 2.081k ±0% | ~ | p=0.677 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/e6e9dda/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/e6e9ddaaeb2fe0a0c4689eea10677e36400de199/bin/pkg_sql_tests benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/557311a/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/557311a3c84a358a81e7fb009372e7d286898415/bin/pkg_sql_tests benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=557311a --new=e6e9dda ./pkg/sql/tests
```
⚪ Sysbench [KV, 3node, oltp_write_only]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 3.668m ±1% | 3.696m ±1% | +0.76% | p=0.000 n=15 |
| ⚪ allocs/op | 4.183k ±0% | 4.180k ±0% | ~ | p=0.066 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/e6e9dda/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/e6e9ddaaeb2fe0a0c4689eea10677e36400de199/bin/pkg_sql_tests benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/557311a/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/557311a3c84a358a81e7fb009372e7d286898415/bin/pkg_sql_tests benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=557311a --new=e6e9dda ./pkg/sql/tests
```
Artifacts
download:
```shell
mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/e6e9ddaaeb2fe0a0c4689eea10677e36400de199/20294479770-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/557311a3c84a358a81e7fb009372e7d286898415/20294479770-1/\* old/
```
built with commit: e6e9ddaaeb2fe0a0c4689eea10677e36400de199
⚪ Sysbench [SQL, 3node, oltp_read_write]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 11.97m ±0% | 11.99m ±0% | ~ | p=0.074 n=15 |
| ⚪ allocs/op | 8.159k ±0% | 8.161k ±0% | ~ | p=0.943 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/0b4f4b8/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/0b4f4b844075cd77cecf4166abe4312482a0fe17/bin/pkg_sql_tests benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=9918045 --new=0b4f4b8 ./pkg/sql/tests
```
⚪ Sysbench [KV, 3node, oltp_read_only]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 3.453m ±1% | 3.449m ±1% | ~ | p=0.595 n=15 |
| ⚪ allocs/op | 2.082k ±0% | 2.081k ±0% | ~ | p=0.276 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/0b4f4b8/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/0b4f4b844075cd77cecf4166abe4312482a0fe17/bin/pkg_sql_tests benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=9918045 --new=0b4f4b8 ./pkg/sql/tests
```
⚪ Sysbench [KV, 3node, oltp_write_only]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 3.750m ±1% | 3.756m ±1% | ~ | p=0.713 n=15 |
| ⚪ allocs/op | 4.178k ±0% | 4.178k ±0% | ~ | p=0.943 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/0b4f4b8/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/0b4f4b844075cd77cecf4166abe4312482a0fe17/bin/pkg_sql_tests benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=9918045 --new=0b4f4b8 ./pkg/sql/tests
```
Artifacts
download:
```shell
mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/0b4f4b844075cd77cecf4166abe4312482a0fe17/20359493162-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/99180451dea5c745ea9e5c86e60aeb125491be4d/20359493162-1/\* old/
```
built with commit: 0b4f4b844075cd77cecf4166abe4312482a0fe17
⚪ Sysbench [SQL, 3node, oltp_read_write]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 11.37m ±1% | 11.43m ±1% | +0.51% | p=0.000 n=15 |
| ⚪ allocs/op | 8.158k ±1% | 8.178k ±0% | ~ | p=0.046 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/1e427a6/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/1e427a68a483c306cd8afa315a529ac31e024c9f/bin/pkg_sql_tests benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=9918045 --new=1e427a6 ./pkg/sql/tests
```
⚪ Sysbench [KV, 3node, oltp_read_only]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 3.359m ±1% | 3.353m ±1% | ~ | p=0.367 n=15 |
| ⚪ allocs/op | 2.081k ±0% | 2.081k ±0% | ~ | p=0.875 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/1e427a6/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/1e427a68a483c306cd8afa315a529ac31e024c9f/bin/pkg_sql_tests benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=9918045 --new=1e427a6 ./pkg/sql/tests
```
⚪ Sysbench [KV, 3node, oltp_write_only]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 3.621m ±0% | 3.637m ±1% | ~ | p=0.126 n=15 |
| ⚪ allocs/op | 4.185k ±0% | 4.185k ±0% | ~ | p=0.942 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/1e427a6/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/1e427a68a483c306cd8afa315a529ac31e024c9f/bin/pkg_sql_tests benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=9918045 --new=1e427a6 ./pkg/sql/tests
```
Artifacts
download:
```shell
mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/1e427a68a483c306cd8afa315a529ac31e024c9f/20363420866-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/99180451dea5c745ea9e5c86e60aeb125491be4d/20363420866-1/\* old/
```
built with commit: 1e427a68a483c306cd8afa315a529ac31e024c9f
⚪ Sysbench [SQL, 3node, oltp_read_write]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 11.29m ±1% | 11.32m ±1% | ~ | p=0.267 n=15 |
| ⚪ allocs/op | 8.162k ±1% | 8.159k ±1% | ~ | p=0.911 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/b500279/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/b500279d1c4ef18ab82963a57fe1847a789969f4/bin/pkg_sql_tests benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/43add13/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/43add13db8696097b67ab8de556bf0a69431dea6/bin/pkg_sql_tests benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=43add13 --new=b500279 ./pkg/sql/tests
```
⚪ Sysbench [KV, 3node, oltp_read_only]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 3.251m ±0% | 3.277m ±0% | +0.79% | p=0.000 n=15 |
| ⚪ allocs/op | 2.082k ±0% | 2.082k ±0% | ~ | p=0.385 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/b500279/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/b500279d1c4ef18ab82963a57fe1847a789969f4/bin/pkg_sql_tests benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/43add13/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/43add13db8696097b67ab8de556bf0a69431dea6/bin/pkg_sql_tests benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=43add13 --new=b500279 ./pkg/sql/tests
```
⚪ Sysbench [KV, 3node, oltp_write_only]
| Metric | Old Commit | New Commit | Delta | Note |
|---|---|---|---|---|
| ⚪ sec/op | 3.628m ±2% | 3.617m ±2% | ~ | p=0.967 n=15 |
| ⚪ allocs/op | 4.187k ±0% | 4.190k ±0% | ~ | p=0.630 n=15 |
Reproduce
benchdiff binaries:
```shell
mkdir -p benchdiff/b500279/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/b500279d1c4ef18ab82963a57fe1847a789969f4/bin/pkg_sql_tests benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/43add13/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/43add13db8696097b67ab8de556bf0a69431dea6/bin/pkg_sql_tests benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
```
benchdiff command:
```shell
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=43add13 --new=b500279 ./pkg/sql/tests
```
Artifacts
download:
```shell
mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/b500279d1c4ef18ab82963a57fe1847a789969f4/20366155031-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/43add13db8696097b67ab8de556bf0a69431dea6/20366155031-1/\* old/
```
built with commit: b500279d1c4ef18ab82963a57fe1847a789969f4