roachprod-microbench: post GitHub issues for performance regressions

Open ibreakthecloud opened this issue 1 month ago • 7 comments

Previously, performance regressions detected during the weekly microbenchmark comparison were only reported via Slack notifications. This made it difficult to track and ensure timely follow-up on regressions, as they were often discussed informally without formal issue tracking.

This change extends the existing --post-issues flag to work with the compare command. When enabled, the system automatically creates GitHub issues for performance regressions that exceed 20% (the "red" regression threshold). Each issue includes:

  • Package name and list of regressed benchmarks
  • Regression percentages and formatted deltas
  • Link to the Google Sheet with detailed comparison data
  • Labels: O-microbench and C-performance for easy filtering

The implementation reuses the same GitHub posting infrastructure and environment variables (GITHUB_BRANCH, GITHUB_SHA, GITHUB_BINARY) as the existing benchmark failure reporting. Issues are created per package to avoid spam, with up to 10 regressions listed in each issue summary.

Example GitHub Issue screenshot: image

Epic: None

Release note: None

ibreakthecloud avatar Nov 07 '25 08:11 ibreakthecloud

This change is Reviewable

cockroach-teamcity avatar Nov 07 '25 08:11 cockroach-teamcity

@rishabh7m

How was this tested? Can you paste a screenshot or the link of any generated issue?

No, it was not, let me test and update this PR.

Does this change need to be backported?

No I don't think so.

ibreakthecloud avatar Nov 11 '25 07:11 ibreakthecloud

How was this tested? Can you paste a screenshot or the link of any generated issue?

I've added a unit test that verifies the format of the issue posted for a regression.

ibreakthecloud avatar Nov 11 '25 10:11 ibreakthecloud

How was this tested? Can you paste a screenshot or the link of any generated issue?

I've added a unit test that verifies the format of the issue posted for a regression.

The unit test is great, but it would still be nice to test it end-to-end. You can create a dummy issue to serve as an example.

When enabled, the system automatically creates GitHub issues for performance regressions that exceed 20% (the "red" regression threshold). Each issue includes:

Package name and list of regressed benchmarks

In case of a misconfiguration (or other bug), what if every package results in a regression? We should limit the total (possible) number of created GH issues.

srosenberg avatar Nov 13 '25 03:11 srosenberg

@srosenberg

How was this tested? Can you paste a screenshot or the link of any generated issue?

I've added a unit test that verifies the format of the issue posted for a regression.

The unit test is great, but it would still be nice to test it end-to-end. You can create a dummy issue to serve as an example.

Fair point, I will create a dummy issue and update the description.

In case of a misconfiguration (or other bug), what if every package results in a regression? We should limit the total (possible) number of created GH issues.

~I will limit it to 5 issues.~

This change creates one GitHub issue per package, listing all of that package's severe regressions (no issue is created if the package has none). There are currently 23 packages, and judging from the historical Slack messages in #perf-ops, I think on average about 8 packages have at least one regression, so it would be safe to stop creating issues after 10. LMK your thoughts.

Also, I'd like to understand exactly how you want to limit it: stop creating issues after 10, or not create any at all if there are more than 10?
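
For illustration only, the two limiting options in the question above could be sketched as follows. The names `limitMode`, `stopAfterLimit`, `skipAllIfOver`, and `selectPackages` are hypothetical and do not come from the PR; the sketch assumes a non-negative limit.

```go
package main

import "fmt"

// limitMode selects how a cap on created issues would be applied.
// Both modes are hypothetical names for the two options being discussed.
type limitMode int

const (
	stopAfterLimit limitMode = iota // create the first `limit` issues, skip the rest
	skipAllIfOver                   // create nothing when candidates exceed `limit`
)

// selectPackages returns the packages for which an issue would be created.
func selectPackages(pkgs []string, limit int, mode limitMode) []string {
	if len(pkgs) <= limit {
		return pkgs
	}
	if mode == skipAllIfOver {
		return nil
	}
	return pkgs[:limit]
}

func main() {
	// Made-up package list; three candidates against a limit of two.
	pkgs := []string{"pkg/kv", "pkg/sql", "pkg/storage"}
	fmt.Println(selectPackages(pkgs, 2, stopAfterLimit))
	fmt.Println(selectPackages(pkgs, 2, skipAllIfOver))
}
```

The first mode still surfaces some regressions during a widespread failure, while the second treats an unusually large count as a sign of misconfiguration and posts nothing.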

ibreakthecloud avatar Nov 13 '25 07:11 ibreakthecloud

@nameisbhaskar I've addressed the review comments.

ibreakthecloud avatar Dec 10 '25 16:12 ibreakthecloud

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps: Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

github-actions[bot] avatar Dec 10 '25 16:12 github-actions[bot]

This change creates one GitHub issue per package, listing all of that package's severe regressions (no issue is created if the package has none). There are currently 23 packages, and judging from the historical Slack messages in #perf-ops, I think on average about 8 packages have at least one regression, so it would be safe to stop creating issues after 10. LMK your thoughts.

Also, I'd like to understand exactly how you want to limit it: stop creating issues after 10, or not create any at all if there are more than 10?

Indeed, the reporter uses only top-level packages (see readMetrics). This does limit the "blast radius" wrt bounding the number of newly created GH issues. In light of that, I don't think we need any other limiter; i.e., worst-case scenario each top-level package results in a newly created issue.

However, we should ensure each issue has an owner. In createRegressionPostRequest, you're reusing regressions[0].benchmarkName with githubpost.DefaultFormatter which may end up not finding an owner since not every GetTestOwner(packageName, testName) has an owner. (The catch-all test-eng is a historical artifact which ideally wouldn't exist.) In other words, you should do an exhaustive search for each regression under the top-level package to ensure that we do find an owner for those benches.
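
A rough sketch of the exhaustive search suggested here, under stated assumptions: `ownerLookup` is a stand-in for whatever ownership query is actually used (it is not the real `GetTestOwner` signature), and the team name in the example is made up.

```go
package main

import "fmt"

// ownerLookup is a hypothetical stand-in for an ownership query. It reports
// whether an owner is known for the given package and benchmark.
type ownerLookup func(pkg, bench string) (owner string, ok bool)

// firstOwner tries every regressed benchmark in the package, not only the
// first one, and returns the first owner it finds.
func firstOwner(pkg string, benchmarks []string, lookup ownerLookup) (string, bool) {
	for _, b := range benchmarks {
		if owner, ok := lookup(pkg, b); ok {
			return owner, true
		}
	}
	return "", false
}

func main() {
	// Made-up ownership data: only the second benchmark has an owner.
	owners := map[string]string{"pkg/sql/BenchmarkB": "team-x"}
	lookup := func(pkg, bench string) (string, bool) {
		o, ok := owners[pkg+"/"+bench]
		return o, ok
	}
	owner, ok := firstOwner("pkg/sql", []string{"BenchmarkA", "BenchmarkB"}, lookup)
	fmt.Println(owner, ok)
}
```

Looking only at the first regression would miss the owner in this example, which is the failure mode the comment describes.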

srosenberg avatar Dec 14 '25 00:12 srosenberg

However, we should ensure each issue has an owner. In createRegressionPostRequest, you're reusing regressions[0].benchmarkName with githubpost.DefaultFormatter which may end up not finding an owner since not every GetTestOwner(packageName, testName) has an owner. (The catch-all test-eng is a historical artifact which ideally wouldn't exist.) In other words, you should do an exhaustive search for each regression under the top-level package to ensure that we do find an owner for those benches.

On second thought, this isn't strictly needed because the CODEOWNERS linter ensures every package path has an owner.

srosenberg avatar Dec 14 '25 05:12 srosenberg

⚪ Sysbench [SQL, 3node, oltp_read_write]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 11.22m ±1% | 11.24m ±1% | ~ | p=0.967 n=15
allocs/op | 8.158k ±2% | 8.090k ±1% | ~ | p=0.519 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/e6e9dda/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/e6e9ddaaeb2fe0a0c4689eea10677e36400de199/bin/pkg_sql_tests benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/557311a/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/557311a3c84a358a81e7fb009372e7d286898415/bin/pkg_sql_tests benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=557311a --new=e6e9dda ./pkg/sql/tests
🔴 Sysbench [KV, 3node, oltp_read_only]
Metric | Old Commit | New Commit | Delta | Note
🔴 sec/op | 3.379m ±1% | 3.410m ±1% | +0.93% | p=0.001 n=15
allocs/op | 2.081k ±0% | 2.081k ±0% | ~ | p=0.677 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/e6e9dda/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/e6e9ddaaeb2fe0a0c4689eea10677e36400de199/bin/pkg_sql_tests benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/557311a/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/557311a3c84a358a81e7fb009372e7d286898415/bin/pkg_sql_tests benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=557311a --new=e6e9dda ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_write_only]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 3.668m ±1% | 3.696m ±1% | +0.76% | p=0.000 n=15
allocs/op | 4.183k ±0% | 4.180k ±0% | ~ | p=0.066 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/e6e9dda/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/e6e9ddaaeb2fe0a0c4689eea10677e36400de199/bin/pkg_sql_tests benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/e6e9dda/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/557311a/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/557311a3c84a358a81e7fb009372e7d286898415/bin/pkg_sql_tests benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/557311a/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=557311a --new=e6e9dda ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/e6e9ddaaeb2fe0a0c4689eea10677e36400de199/20294479770-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/557311a3c84a358a81e7fb009372e7d286898415/20294479770-1/\* old/

built with commit: e6e9ddaaeb2fe0a0c4689eea10677e36400de199

cockroach-teamcity avatar Dec 17 '25 07:12 cockroach-teamcity


⚪ Sysbench [SQL, 3node, oltp_read_write]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 11.97m ±0% | 11.99m ±0% | ~ | p=0.074 n=15
allocs/op | 8.159k ±0% | 8.161k ±0% | ~ | p=0.943 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/0b4f4b8/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/0b4f4b844075cd77cecf4166abe4312482a0fe17/bin/pkg_sql_tests benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=9918045 --new=0b4f4b8 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_read_only]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 3.453m ±1% | 3.449m ±1% | ~ | p=0.595 n=15
allocs/op | 2.082k ±0% | 2.081k ±0% | ~ | p=0.276 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/0b4f4b8/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/0b4f4b844075cd77cecf4166abe4312482a0fe17/bin/pkg_sql_tests benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=9918045 --new=0b4f4b8 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_write_only]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 3.750m ±1% | 3.756m ±1% | ~ | p=0.713 n=15
allocs/op | 4.178k ±0% | 4.178k ±0% | ~ | p=0.943 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/0b4f4b8/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/0b4f4b844075cd77cecf4166abe4312482a0fe17/bin/pkg_sql_tests benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/0b4f4b8/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=9918045 --new=0b4f4b8 ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/0b4f4b844075cd77cecf4166abe4312482a0fe17/20359493162-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/99180451dea5c745ea9e5c86e60aeb125491be4d/20359493162-1/\* old/

built with commit: 0b4f4b844075cd77cecf4166abe4312482a0fe17

cockroach-teamcity avatar Dec 19 '25 04:12 cockroach-teamcity

⚪ Sysbench [SQL, 3node, oltp_read_write]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 11.37m ±1% | 11.43m ±1% | +0.51% | p=0.000 n=15
allocs/op | 8.158k ±1% | 8.178k ±0% | ~ | p=0.046 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/1e427a6/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/1e427a68a483c306cd8afa315a529ac31e024c9f/bin/pkg_sql_tests benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=9918045 --new=1e427a6 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_read_only]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 3.359m ±1% | 3.353m ±1% | ~ | p=0.367 n=15
allocs/op | 2.081k ±0% | 2.081k ±0% | ~ | p=0.875 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/1e427a6/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/1e427a68a483c306cd8afa315a529ac31e024c9f/bin/pkg_sql_tests benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=9918045 --new=1e427a6 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_write_only]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 3.621m ±0% | 3.637m ±1% | ~ | p=0.126 n=15
allocs/op | 4.185k ±0% | 4.185k ±0% | ~ | p=0.942 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/1e427a6/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/1e427a68a483c306cd8afa315a529ac31e024c9f/bin/pkg_sql_tests benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/1e427a6/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/9918045/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/99180451dea5c745ea9e5c86e60aeb125491be4d/bin/pkg_sql_tests benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/9918045/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=9918045 --new=1e427a6 ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/1e427a68a483c306cd8afa315a529ac31e024c9f/20363420866-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/99180451dea5c745ea9e5c86e60aeb125491be4d/20363420866-1/\* old/

built with commit: 1e427a68a483c306cd8afa315a529ac31e024c9f

cockroach-teamcity avatar Dec 19 '25 08:12 cockroach-teamcity

⚪ Sysbench [SQL, 3node, oltp_read_write]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 11.29m ±1% | 11.32m ±1% | ~ | p=0.267 n=15
allocs/op | 8.162k ±1% | 8.159k ±1% | ~ | p=0.911 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/b500279/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/b500279d1c4ef18ab82963a57fe1847a789969f4/bin/pkg_sql_tests benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/43add13/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/43add13db8696097b67ab8de556bf0a69431dea6/bin/pkg_sql_tests benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=43add13 --new=b500279 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_read_only]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 3.251m ±0% | 3.277m ±0% | +0.79% | p=0.000 n=15
allocs/op | 2.082k ±0% | 2.082k ±0% | ~ | p=0.385 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/b500279/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/b500279d1c4ef18ab82963a57fe1847a789969f4/bin/pkg_sql_tests benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/43add13/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/43add13db8696097b67ab8de556bf0a69431dea6/bin/pkg_sql_tests benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=43add13 --new=b500279 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_write_only]
Metric | Old Commit | New Commit | Delta | Note
sec/op | 3.628m ±2% | 3.617m ±2% | ~ | p=0.967 n=15
allocs/op | 4.187k ±0% | 4.190k ±0% | ~ | p=0.630 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/b500279/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/b500279d1c4ef18ab82963a57fe1847a789969f4/bin/pkg_sql_tests benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/b500279/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/43add13/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/43add13db8696097b67ab8de556bf0a69431dea6/bin/pkg_sql_tests benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/43add13/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=43add13 --new=b500279 ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/b500279d1c4ef18ab82963a57fe1847a789969f4/20366155031-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/43add13db8696097b67ab8de556bf0a69431dea6/20366155031-1/\* old/

built with commit: b500279d1c4ef18ab82963a57fe1847a789969f4

cockroach-teamcity avatar Dec 19 '25 10:12 cockroach-teamcity