Add performance benchmarks [RFC]
Related Issue(s):
- Closes #729
Description:
Uses the airspeed velocity (asv) library to run performance benchmarks. This configuration generally follows the strategy used by xarray (as documented here), with some notable exceptions:
- Increase the number of rounds to 4 (from the default of 2) and the number of repetitions to between 10 and 50 (from the default of 5). This seemed to eliminate a lot of the false positives with regard to performance changes. These values are set in the `Bench` base class (see the sketch below) and can be changed for a particular benchmark.
- Use the `--interleave-rounds` option, which also seems to help with eliminating false positives.
- Run benchmarks on all supported Python versions (we may want to change this later as the benchmark suite grows and takes longer to run).
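For illustration, a shared base class like the one described above might look roughly like this (the class and method names here are hypothetical; only the round/repetition values come from the description):

```python
import pystac


class Bench:
    # Shared asv settings inherited by the individual benchmark classes.
    # Per the description above: 4 rounds (asv's default is 2) and between
    # 10 and 50 repetitions; the third element of `repeat` is asv's maximum
    # time budget per benchmark, in seconds.
    rounds = 4
    repeat = (10, 50, 60.0)


class CatalogBench(Bench):
    # Hypothetical example benchmark; asv discovers methods prefixed with `time_`.
    def time_create_catalog(self) -> None:
        pystac.Catalog(id="bench-catalog", description="A catalog for benchmarking")
```

The `--interleave-rounds` flag is passed on the command line (e.g. `asv run --interleave-rounds`) rather than set on the class, and the set of Python versions comes from the `pythons` list in `asv.conf.json`.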
This is a work-in-progress, but I want to get feedback on the approach before fleshing out the rest.
Still to do:
- [ ] Add timing benchmarks for extension implementations (maybe not all, but at least a few)
- [ ] Add timing and peak memory benchmarks for reading and walking a catalog (see the sketch after this list)
- [ ] Add timing benchmark for writing large catalogs to disk (e.g. on the order of thousands of items)
- [ ] Come up with a strategy for measuring possible performance improvements from implementing async requests. We will probably see the most impact from this when making network requests rather than local file reads, but we don't want our benchmarks dependent upon changing network latency.
- [ ] Add documentation
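As a rough illustration of the read/walk/write items above: asv picks up `time_*` and `peakmem_*` methods, so those benchmarks might look something like the sketch below (the catalog size, hrefs, and names are placeholders, not the final implementation):

```python
import datetime
import tempfile

import pystac


class LargeCatalog:
    def setup(self) -> None:
        # Build a synthetic catalog; "thousands of items" per the to-do list,
        # with 1000 used here as a placeholder.
        self.catalog = pystac.Catalog(id="bench", description="A catalog for benchmarking")
        for i in range(1000):
            self.catalog.add_item(
                pystac.Item(
                    id=f"item-{i}",
                    geometry={"type": "Point", "coordinates": [0.0, 0.0]},
                    bbox=[0.0, 0.0, 0.0, 0.0],
                    datetime=datetime.datetime(2022, 1, 1),
                    properties={},
                )
            )

    def time_walk(self) -> None:
        # Timing benchmark: visit every sub-catalog and item.
        for _root, _subcatalogs, items in self.catalog.walk():
            for _item in items:
                pass

    def peakmem_walk(self) -> None:
        # Peak-memory benchmark: asv records the process's peak memory usage.
        for _root, _subcatalogs, items in self.catalog.walk():
            for _item in items:
                pass

    def time_save(self) -> None:
        # Write the large catalog to disk as a self-contained catalog.
        with tempfile.TemporaryDirectory() as tmp_dir:
            self.catalog.normalize_and_save(
                tmp_dir, catalog_type=pystac.CatalogType.SELF_CONTAINED
            )
```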
PR Checklist:
- [ ] Code is formatted (run `pre-commit run --all-files`)
- [ ] Tests pass (run `scripts/test`)
- [ ] Documentation has been updated to reflect changes, if applicable
- [ ] This PR maintains or improves overall codebase code coverage.
- [ ] Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.
@TomAugspurger Would be great to get your feedback as well.
Codecov Report
Base: 94.43% // Head: 94.43% // No change to project coverage :thumbsup:
Coverage data is based on head (5b39551) compared to base (c45dd20). Patch has no changes to coverable lines.
Additional details and impacted files
@@ Coverage Diff @@
## main #748 +/- ##
=======================================
Coverage 94.43% 94.43%
=======================================
Files 83 83
Lines 12056 12056
Branches 1143 1143
=======================================
Hits 11385 11385
Misses 492 492
Partials 179 179
LGTM. Did you have a running list of other scenarios you'd like to benchmark? One or two that spring to mind are:
- Mutating the entire tree (e.g. setting the root href and updating all of the links/self hrefs)
- Summary/extent computation
Thanks @gadomski, I have a bit of a list started in #729. I like both of these suggestions, I'll add them to the PR.
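Roughly, those two scenarios could be benchmarked along these lines (a hypothetical sketch, not the code that ends up in the PR; names and sizes are placeholders):

```python
import datetime

import pystac


class TreeMutation:
    def setup(self) -> None:
        self.catalog = pystac.Catalog(id="bench", description="A catalog for benchmarking")
        self.items = [
            pystac.Item(
                id=f"item-{i}",
                geometry={"type": "Point", "coordinates": [0.0, 0.0]},
                bbox=[0.0, 0.0, 0.0, 0.0],
                datetime=datetime.datetime(2022, 1, 1),
                properties={},
            )
            for i in range(1000)
        ]
        for item in self.items:
            self.catalog.add_item(item)

    def time_normalize_hrefs(self) -> None:
        # Mutate the entire tree: set the root href and update all self hrefs.
        self.catalog.normalize_hrefs("/tmp/bench-catalog")

    def time_extent_from_items(self) -> None:
        # Compute a spatial/temporal extent across all items.
        pystac.Extent.from_items(self.items)
```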
I picked this PR up and, in the interest of not letting perfect be the enemy of good, implemented most of @duckontheweb's suggestions:
- I added benchmarks for reading, walking, and writing large catalogs, including peak memory usage for walking a large catalog
- I added a single extension benchmark, for the projection extension. I'm not quite sure what we want to be benchmarking there, so I kept it minimal (see the sketch after this list).
- I refactored the location of things to make usage a bit simpler (e.g. I want `asv dev` to work out-of-the-box).
- I removed CI benchmarking. To paraphrase @TomAugspurger from https://tomaugspurger.github.io/posts/performance-regressions, "This rules out running the benchmarks on ... CI ... even if we could finish it in time, we couldn’t really trust the results." It's my personal opinion that benchmarking on CI is so finicky and untrustworthy that it's not worth the complexity. In lieu of a dedicated benchmark machine, IMO it's better to run benchmarks locally when the situation calls for it. At least we have a framework for benchmarking now, which we could always expand in the future to a dedicated machine if it's needed.
- Docs.
- Simple check in CI to make sure benchmarks run w/o failure -- doesn't do any reporting.
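For reference, a minimal projection-extension benchmark could look something like the following; this is only an illustrative sketch, not necessarily what the PR's benchmark does:

```python
import datetime

import pystac
from pystac.extensions.projection import ProjectionExtension


class Projection:
    def setup(self) -> None:
        self.item = pystac.Item(
            id="proj-item",
            geometry={"type": "Point", "coordinates": [0.0, 0.0]},
            bbox=[0.0, 0.0, 0.0, 0.0],
            datetime=datetime.datetime(2022, 1, 1),
            properties={},
        )

    def time_add_projection(self) -> None:
        # Add the extension to the item and set its EPSG code.
        proj = ProjectionExtension.ext(self.item, add_if_missing=True)
        proj.apply(epsg=4326)
```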
I didn't implement:
- Any sort of async strategy. I think that's best tackled as part of the async improvements themselves (https://github.com/stac-utils/pystac/pull/749) -- at least, I don't think they should hold up getting some sort of benchmarking into the repo.
- Extensive extension benchmarking. I'm not sure where the pain points are right now, and I don't want broken benchmarks to get in the way of solutions to #448.
cc @duckontheweb would love your review if you have the bandwidth, I can't request b/c it's your PR.
IMO this PR should be squash-merged since it's one coherent set of changes w/ some fixup commits -- I can't rebase b/c it's on @duckontheweb's fork.