
Update the existing benchmarks workflow to copy, upload and inject PGO profile.

Open 1pkg opened this issue 1 year ago • 1 comments

Motivation/summary

This PR implements the changes outlined in https://github.com/elastic/apm-server/issues/13859 by updating the existing benchmarks workflow to copy, upload, and inject CPU profiles for PGO-enabled builds.
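For context, a minimal sketch of what "injecting" a profile can look like, assuming Go 1.21 or later, where `go build` defaults to `-pgo=auto` and picks up a file named `default.pgo` in the main package directory. The helper name `inject_pgo_profile` and the example paths are hypothetical and are not taken from the actual workflow.

```shell
# Hypothetical helper, not part of the apm-server workflow.
# inject_pgo_profile SRC PKG_DIR
# Copies a CPU profile into PKG_DIR as default.pgo, the file name that
# `go build -pgo=auto` looks for in the main package directory.
inject_pgo_profile() {
  # Refuse missing or empty profiles so a failed benchmark run cannot
  # silently replace a good profile.
  if [ ! -s "$1" ]; then
    echo "profile $1 is missing or empty" >&2
    return 1
  fi
  cp "$1" "$2/default.pgo"
}

# Example usage (paths are assumptions):
#   inject_pgo_profile ./cpu.pprof ./cmd/myserver
#   go build -pgo=auto ./cmd/myserver
```

Committing the resulting `default.pgo` alongside the main package means later builds use the profile without any extra flags.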

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

Related issues

1pkg avatar Aug 15 '24 00:08 1pkg

This pull request does not have a backport label. Could you fix it @1pkg? 🙏 To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.

NOTE: backport-skip has been added to this pull request.

mergify[bot] avatar Aug 15 '24 00:08 mergify[bot]

the only doubt I have is on the CI part, not sure how to test it

inge4pres avatar Sep 20 '24 15:09 inge4pres

This pull request now has conflicts. Could you fix it @1pkg? 🙏 To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b inject-build-pgo-profile upstream/inject-build-pgo-profile
git merge upstream/main
git push upstream inject-build-pgo-profile

mergify[bot] avatar Sep 23 '24 18:09 mergify[bot]

> the only doubt I have is on the CI part, not sure how to test it

GitHub Actions are hard to test. AFAIK, the simplest way to test them is to run them directly on GitHub, which is very manual and less than ideal. I know of a few projects that can help reduce the "pain":

  • https://github.com/nektos/act allows running GitHub Actions locally, but it falls short for complex workflows like https://github.com/elastic/apm-server/actions/workflows/benchmarks.yml.
  • https://github.com/sethvargo/go-githubactions allows writing GitHub Actions in Go. I have never used it in the past, but it looks promising. The main problem with go-githubactions is that it is quite different from all our existing GitHub Actions.
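As a rough illustration of the first option, the sketch below asks act for a dry run of a single workflow. It assumes act is installed, that the workflow file sits at the path shown, and that it accepts the `workflow_dispatch` event; all three are assumptions to adjust for the real repository. When act or the file is missing, the script only prints the command it would have run.

```shell
# Assumptions: workflow path and event name; override via environment.
WORKFLOW="${WORKFLOW:-.github/workflows/benchmarks.yml}"
EVENT="${EVENT:-workflow_dispatch}"

# Print the act invocation for a dry run of one workflow.
# -W selects the workflow file; -n requests a dry run.
act_dry_run_cmd() {
  printf 'act %s -W %s -n\n' "$1" "$2"
}

if command -v act >/dev/null 2>&1 && [ -f "$WORKFLOW" ]; then
  act "$EVENT" -W "$WORKFLOW" -n
else
  echo "would run: $(act_dry_run_cmd "$EVENT" "$WORKFLOW")"
fi
```

A dry run only checks the workflow definition, so steps that depend on cloud credentials or repository secrets still need a real run on GitHub.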

1pkg avatar Sep 23 '24 22:09 1pkg

pulling in the robots team since this is touching ci and some token permissions

kruskall avatar Sep 27 '24 22:09 kruskall

I had to make some final changes to address a regression in the shared modules for the smoke tests. Now everything works as expected and is aligned with the smoke test results from the main branch.

See the workflow runs:

https://github.com/elastic/apm-server/actions/runs/11076905822
https://github.com/elastic/apm-server/actions/runs/11076854721
https://github.com/elastic/apm-server/actions/runs/11075827155

With this, this PR should be good to merge finally! Waiting for the final approvals + review from the robots team.

1pkg avatar Sep 28 '24 00:09 1pkg

Is this automation creating cloud resources but not deleting them? I see some leftovers in

[image]

That's the AWS account used for CI. We separated the dev and CI cloud accounts, which is why you cannot access that one.

v1v avatar Sep 29 '24 11:09 v1v

This pull request now has conflicts. Could you fix it @1pkg? 🙏 To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b inject-build-pgo-profile upstream/inject-build-pgo-profile
git merge upstream/main
git push upstream inject-build-pgo-profile

mergify[bot] avatar Sep 30 '24 23:09 mergify[bot]

> Is this automation creating cloud resources but not deleting them? I see some leftovers in
>
> [image]
>
> That's the AWS account used for CI. We separated the dev and CI cloud accounts, which is why you cannot access that one.

@v1v initially there was an unrelated authentication problem in the benchmarks pipeline that caused the issue; see for example https://github.com/elastic/apm-server/actions/runs/10567550442/job/29276727544. Because I was experimenting a lot at the same time, I accidentally created multiple VPC resources that were not properly cleaned up. This has long since been fixed, and the pipeline now works as expected.

Please take a closer look at the actual changes so we can merge them upstream. Thank you.

1pkg avatar Oct 01 '24 17:10 1pkg

Final results after feedback from @v1v on properly setting the GitHub access token:

The standalone PGO benchmark workflow run (link) resulted in the following PGO update PR (link).

Meanwhile, the old benchmark against ES Cloud works as expected, without regression (link).

1pkg avatar Oct 01 '24 23:10 1pkg