Introduce PR Benchmark Workflow

Open lapp0 opened this issue 1 year ago • 0 comments

Fixes #883

Changes

This change-set configures asv within the repo and adds the asv_benchmark_pr.yml workflow, which posts benchmark comparisons as a comment on each open PR.

tests/benchmarks/ has been moved to benchmarks/ and converted from pytest-benchmark to asv format.
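
For readers unfamiliar with the asv format, here is a minimal sketch of what a converted benchmark can look like. The class name, the regex, and the parameter values are hypothetical and are not taken from this PR; only the asv conventions (a `setup` method, `time_`-prefixed methods, and `params` / `param_names`) are standard.

```python
import re


class TimeRegexMatch:
    """Hypothetical asv benchmark; not one of the benchmarks in this PR."""

    # asv runs each time_* method once per entry in `params`.
    params = [10, 1_000]
    param_names = ["text_length"]

    def setup(self, text_length):
        # setup() runs before timing and is excluded from the measurement.
        self.pattern = re.compile(r"[a-z]+[0-9]*")
        self.text = "a" * text_length

    def time_fullmatch(self, text_length):
        # Only the body of a time_* method is timed.
        self.pattern.fullmatch(self.text)
```

Under pytest-benchmark the timed call would be passed to the `benchmark` fixture inside a test function; in asv the timed code is simply the body of the `time_` method, and asv handles repetition and sampling itself.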

Behavior

  • Comparison is between PR HEAD and outlines-dev/outlines@main HEAD.
  • The use of --interleave-rounds -a repeat=3 in asv continuous mitigates variance due to environmental factors described in #883, but triples the runtime compared to a single pass.
  • "The median time from all samples collected in all rounds is used as the final measurement result."
  • Total benchmark workflow runtime (repeat=3): 23 minutes. Ideally this should be close to the test run time of 10 minutes.
  • Runs once per push within open PRs.
  • Creates a single comment per PR, and edits the comment when workflow is re-run within the same PR.
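
The comparison command and the pooled-median rule from the list above can be sketched as follows. This is an illustration only: the helper names and the timings are made up, and the real aggregation happens inside asv, not in code like this.

```python
from statistics import median


def continuous_command(base: str, head: str, repeat: int = 3) -> list[str]:
    """Build the `asv continuous` invocation described above."""
    return [
        "asv", "continuous",
        "--interleave-rounds",
        "-a", f"repeat={repeat}",
        base, head,
    ]


def pooled_median(rounds: list[list[float]]) -> float:
    """Median over every sample from every round, pooled together."""
    samples = [sample for round_ in rounds for sample in round_]
    return median(samples)


# Made-up timings in seconds; one sample in the second round is an outlier.
rounds = [[1.00, 1.02], [1.50, 1.01], [0.99, 1.03]]
print(pooled_median(rounds))
print(" ".join(continuous_command("main", "HEAD")))
```

Because the median is taken over the pooled samples, a single slow sample caused by a noisy runner has little effect on the reported time, which is the motivation for paying the extra runtime of repeat=3.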

Examples

  • Times differ by 1% to 4% due to random variation: https://github.com/lapp0/outlines/pull/16#issuecomment-2118777376

  • Demo of Benchmark Output for PR with Performance Regression: https://github.com/lapp0/outlines/pull/18#issuecomment-2118860838

Out of Scope

With this infrastructure we can create useful historical performance dashboards such as https://asv-runner.github.io/asv-collection/pandas/. However, this requires a stable, dedicated machine that is guaranteed to be idle during benchmark runs.

Repo Configuration Work

For this workflow we need to set up an access token for the repo with appropriate permissions:

  • contents: read
    • for retrieving compared revisions
  • pull-requests: read and write
    • for commenting

Then create a new asv-benchmarks environment, and a secret with key = GH_TOKEN, value = access token.
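
The pull-requests write permission is what allows the workflow to keep a single benchmark comment per PR. The decision it has to make on each run can be sketched as below. This is a rough illustration, not the workflow's implementation: the workflow appears to delegate this to the peter-evans actions listed in the allow-list further down, and the marker string and function name here are hypothetical.

```python
from typing import Optional

# Hypothetical hidden marker that identifies the benchmark comment.
MARKER = "<!-- asv-benchmark-report -->"


def plan_comment(existing: list[dict], report: str) -> tuple[str, Optional[int], str]:
    """Decide whether to create a new comment or edit the existing one.

    `existing` mimics the list of comment objects on a PR, each with
    an "id" and a "body".
    """
    body = f"{MARKER}\n{report}"
    for comment in existing:
        if MARKER in comment["body"]:
            # A previous run already commented: edit that comment in place.
            return ("update", comment["id"], body)
    # First run on this PR: a new comment is needed.
    return ("create", None, body)
```

Both outcomes need write access to the PR's comments, while reading the existing comments and the compared revisions only needs read access.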

Security

I recommend the following setting so that malicious PRs cannot run arbitrary workflows:

https://github.com/outlines-dev/outlines/settings/actions

Text field:

peter-evans/create-or-update-comment@*,
peter-evans/find-comment@*,
pre-commit/action@*,
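
To illustrate what these patterns are meant to permit, here is a small sketch that checks a workflow step's `uses:` reference against the list. It approximates the wildcard with Python's `fnmatch`, which is an assumption: GitHub's own matching rules may differ in detail, and the helper names and example references are hypothetical.

```python
from fnmatch import fnmatchcase

ALLOW_LIST = """
peter-evans/create-or-update-comment@*,
peter-evans/find-comment@*,
pre-commit/action@*,
"""


def parse_allow_list(text: str) -> list[str]:
    """Split the comma-separated text field into individual patterns."""
    return [part.strip() for part in text.split(",") if part.strip()]


def is_allowed(uses: str, patterns: list[str]) -> bool:
    """Return True if the reference matches any allow-list pattern."""
    return any(fnmatchcase(uses, pattern) for pattern in patterns)


patterns = parse_allow_list(ALLOW_LIST)
print(is_allowed("peter-evans/find-comment@v3", patterns))
print(is_allowed("someone-else/unknown-action@v1", patterns))
```

The intent is that a PR introducing a step from an action outside the list would be rejected, limiting what a malicious PR can do with the workflow's token.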

TODO:

  • [x] asv configuration
  • [x] PR comment workflow
  • [x] migrate benchmarks from pytest-benchmark to asv
  • [x] harden workflow security (e.g. a PR with a new workflow using GH_TOKEN could spam the repo using the pull-requests write permissions)
  • [ ] use https://github.com/airspeed-velocity/asv/pull/1263/files
  • [ ] update docs
  • [ ] optimize workflow run time (setup is majority of time, not benchmark execution)
  • [ ] receive commentary

@rlouf / @brandonwillard could you please share your thoughts on features / changes you'd like to see before this is ready for review?

lapp0, May 18 '24 15:05