Introduce PR Benchmark Workflow
Fixes #883
Changes
This change-set configures asv within the repo and adds the asv_benchmark_pr.yml workflow, which posts benchmark comparisons as a comment on each open PR.
tests/benchmarks/ has been moved to benchmarks/ and converted from pytest-benchmark to asv format.
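For readers unfamiliar with the asv format, here is a minimal sketch of what a converted benchmark can look like. The class, its parameters, and the regex workload are hypothetical and are not taken from this PR's `benchmarks/` directory; the only asv conventions relied on are the `time_` method prefix, the optional `setup` hook, and the `params` / `param_names` class attributes.

```python
import re


class RegexMatchBenchmark:
    """Hypothetical asv benchmark; not one of the benchmarks in this PR."""

    # asv runs each time_* method once per parameter value.
    params = [10, 1_000]
    param_names = ["n_repeats"]

    def setup(self, n_repeats):
        # setup() runs before timing, so its cost is excluded from the result.
        self.pattern = re.compile(r"[a-z]+@[a-z]+\.com")
        self.text = "contact: user@example.com; " * n_repeats

    def time_findall(self, n_repeats):
        # Only the body of a time_* method is measured.
        self.pattern.findall(self.text)
```

Where pytest-benchmark passes a `benchmark` fixture into a test function, asv discovers benchmarks by name, so the timed work is simply the body of the `time_` method.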
Behavior
- Comparison is between PR `HEAD` and `outlines-dev/outlines@main` `HEAD`.
- The use of `--interleave-rounds -a repeat=3` in `asv continuous` mitigates variance due to environmental factors described in #883, but triples the runtime compared to a single pass.
  - "The median time from all samples collected in all rounds is used as the final measurement result."
- Total benchmark workflow runtime (`repeat=3`): 23 minutes (should be close to the test run time: 10 minutes)
- Runs once per push within open PRs.
- Creates a single comment per PR, and edits the comment when workflow is re-run within the same PR.
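The comparison described above can be sketched roughly as follows. This is an illustration under assumptions, not the workflow's actual code: the ref names passed to `asv_continuous_cmd` and the sample timings are made up, and `pooled_median` only mirrors the quoted sentence about taking the median over all samples from all rounds.

```python
from statistics import median


def asv_continuous_cmd(base_ref, head_ref, repeat=3):
    """Build the argv for an interleaved `asv continuous` comparison."""
    return [
        "asv", "continuous",
        "--interleave-rounds",      # alternate between the two revisions
        "-a", f"repeat={repeat}",   # override the benchmark `repeat` attribute
        base_ref, head_ref,
    ]


def pooled_median(rounds):
    """Median over every sample collected in every round."""
    samples = [sample for one_round in rounds for sample in one_round]
    return median(samples)


def ratio(base_rounds, head_rounds):
    """Ratio of head time to base time; values above 1.0 mean head is slower."""
    return pooled_median(head_rounds) / pooled_median(base_rounds)


# Hypothetical ref names and timings (seconds), for illustration only.
cmd = asv_continuous_cmd("upstream/main", "HEAD")
base = [[1.0, 1.2], [1.1, 5.0], [0.9, 1.0]]   # one noisy outlier (5.0)
head = [[1.0, 1.1], [1.2, 1.0], [1.1, 0.9]]
print(" ".join(cmd))
print(round(ratio(base, head), 3))
```

Pooling before taking the median is what makes a single noisy round (the 5.0 sample here) matter less than it would with a mean.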
Examples
- Times differ between 1% and 4% due to random variation: https://github.com/lapp0/outlines/pull/16#issuecomment-2118777376
- Demo of benchmark output for a PR with a performance regression: https://github.com/lapp0/outlines/pull/18#issuecomment-2118860838
Out of Scope
With this infrastructure we could create useful historical performance dashboards such as https://asv-runner.github.io/asv-collection/pandas/. That is out of scope here because it requires a stable, dedicated machine that is guaranteed to be idle during benchmark runs.
Repo Configuration Work
For this workflow we need to set up an access token for the repo with appropriate permissions:
- contents: read
  - for retrieving compared revisions
- pull-requests: read and write
  - for commenting
Then create a new asv-benchmarks environment, and a secret with key = GH_TOKEN, value = access token.
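To make the need for pull-requests write access concrete, here is a rough sketch of the "one comment per PR, edited on re-run" logic expressed against the GitHub REST API. It is not the workflow's implementation; the hidden `marker` string and the function name are hypothetical. It only plans the request and sends nothing.

```python
API = "https://api.github.com"


def plan_comment_request(repo, pr_number, existing_comments, body,
                         marker="<!-- asv-benchmark -->"):
    """Return (method, url, payload) for creating or updating the PR comment.

    `existing_comments` is a list of dicts with "id" and "body" keys, as
    returned when listing a PR's issue comments. `marker` is a hypothetical
    hidden tag used to recognise the comment written by an earlier run.
    """
    payload = {"body": f"{marker}\n{body}"}
    for comment in existing_comments:
        if marker in comment["body"]:
            # An earlier run already commented: edit that comment in place.
            url = f"{API}/repos/{repo}/issues/comments/{comment['id']}"
            return "PATCH", url, payload
    # First run on this PR: create a new comment.
    url = f"{API}/repos/{repo}/issues/{pr_number}/comments"
    return "POST", url, payload
```

Both requests would be sent with an `Authorization: Bearer` header carrying the `GH_TOKEN` secret, which is why the token needs write access for commenting.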
Security
I recommend the following setting so that malicious PRs cannot run arbitrary workflows:
https://github.com/outlines-dev/outlines/settings/actions
Text field
peter-evans/create-or-update-comment@*,
peter-evans/find-comment@*,
pre-commit/action@*,
TODO:
- [x] `asv` configuration
- [x] PR comment workflow
- [x] migrate benchmarks from `pytest-benchmark` to `asv`
- [x] harden workflow security (e.g. a PR with a new workflow using `GH_TOKEN` could spam the repo using the pull-requests write permissions)
- [ ] use https://github.com/airspeed-velocity/asv/pull/1263/files
- [ ] update docs
- [ ] optimize workflow run time (setup, not benchmark execution, takes the majority of the time)
- [ ] receive commentary
@rlouf / @brandonwillard could you please share your thoughts on features / changes you'd like to see before this is ready for review?