Add performance benchmarks [RFC]
Related Issue(s):
- Closes #729
Description:
Uses the airspeed velocity (asv) library to run performance benchmarks. This configuration generally follows the strategy used by xarray (as documented here), with some notable exceptions:
- Increase the number of rounds to 4 (from the default of 2) and the number of repetitions to between 10 and 50 (from the default of 5). This seemed to eliminate a lot of the false positives with regard to performance changes. These values are set in the `Bench` base class (see the sketch below) and can be changed for a particular benchmark.
- Use the `--interleave-rounds` option, which also seems to help with eliminating false positives.
- Run benchmarks on all supported Python versions (we may want to change this later as the benchmark suite grows and takes longer to run).
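For illustration, a shared base class like the one described above might look roughly like this (the class and method names here are hypothetical; only the round/repetition values come from the description):

```python
import pystac


class Bench:
    # Shared asv settings inherited by the individual benchmark classes.
    # Per the description above: 4 rounds (asv's default is 2) and between
    # 10 and 50 repetitions; the third element of `repeat` is asv's maximum
    # time budget per benchmark, in seconds.
    rounds = 4
    repeat = (10, 50, 60.0)


class CatalogBench(Bench):
    # Hypothetical example benchmark; asv discovers methods prefixed with `time_`.
    def time_create_catalog(self) -> None:
        pystac.Catalog(id="bench-catalog", description="A catalog for benchmarking")
```

The `--interleave-rounds` flag is passed on the command line (e.g. `asv run --interleave-rounds`) rather than set on the class, and the set of Python versions comes from the `pythons` list in `asv.conf.json`.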
This is a work-in-progress, but I want to get feedback on the approach before fleshing out the rest.
Still to do:
- [ ] Add timing benchmarks for extension implementations (maybe not all, but at least a few)
- [ ] Add timing and peak memory benchmarks for reading and walking a catalog (see the sketch after this list)
- [ ] Add timing benchmark for writing large catalogs to disk (e.g. on the order of thousands of items)
- [ ] Come up with a strategy for measuring possible performance improvements from implementing async requests. We will probably see the most impact from this when making network requests rather than local file reads, but we don't want our benchmarks dependent upon changing network latency.
- [ ] Add documentation
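As a rough illustration of the read/walk/write items above: asv picks up `time_*` and `peakmem_*` methods, so those benchmarks might look something like the sketch below (the catalog size, hrefs, and names are placeholders, not the final implementation):

```python
import datetime
import tempfile

import pystac


class LargeCatalog:
    def setup(self) -> None:
        # Build a synthetic catalog; "thousands of items" per the to-do list,
        # with 1000 used here as a placeholder.
        self.catalog = pystac.Catalog(id="bench", description="A catalog for benchmarking")
        for i in range(1000):
            self.catalog.add_item(
                pystac.Item(
                    id=f"item-{i}",
                    geometry={"type": "Point", "coordinates": [0.0, 0.0]},
                    bbox=[0.0, 0.0, 0.0, 0.0],
                    datetime=datetime.datetime(2022, 1, 1),
                    properties={},
                )
            )

    def time_walk(self) -> None:
        # Timing benchmark: visit every sub-catalog and item.
        for _root, _subcatalogs, items in self.catalog.walk():
            for _item in items:
                pass

    def peakmem_walk(self) -> None:
        # Peak-memory benchmark: asv records the process's peak memory usage.
        for _root, _subcatalogs, items in self.catalog.walk():
            for _item in items:
                pass

    def time_save(self) -> None:
        # Write the large catalog to disk as a self-contained catalog.
        with tempfile.TemporaryDirectory() as tmp_dir:
            self.catalog.normalize_and_save(
                tmp_dir, catalog_type=pystac.CatalogType.SELF_CONTAINED
            )
```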
PR Checklist:
- [ ] Code is formatted (run `pre-commit run --all-files`)
- [ ] Tests pass (run `scripts/test`)
- [ ] Documentation has been updated to reflect changes, if applicable
- [ ] This PR maintains or improves overall codebase code coverage.
- [ ] Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.
@TomAugspurger Would be great to get your feedback as well.
Codecov Report
Base: 94.43% // Head: 94.43% // No change to project coverage :thumbsup:
Coverage data is based on head (5b39551) compared to base (c45dd20). Patch has no changes to coverable lines.
Additional details and impacted files
@@ Coverage Diff @@
## main #748 +/- ##
=======================================
Coverage 94.43% 94.43%
=======================================
Files 83 83
Lines 12056 12056
Branches 1143 1143
=======================================
Hits 11385 11385
Misses 492 492
Partials 179 179
LGTM. Did you have a running list of other scenarios you'd like to benchmark? One or two that spring to mind are:
- Mutating the entire tree (e.g. setting the root href and updating all of the links/self hrefs)
- Summary/extent computation
Thanks @gadomski, I have a bit of a list started in #729. I like both of these suggestions, I'll add them to the PR.
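Roughly, those two scenarios could be benchmarked along these lines (a hypothetical sketch, not the code that ends up in the PR; names and sizes are placeholders):

```python
import datetime

import pystac


class TreeMutation:
    def setup(self) -> None:
        self.catalog = pystac.Catalog(id="bench", description="A catalog for benchmarking")
        self.items = [
            pystac.Item(
                id=f"item-{i}",
                geometry={"type": "Point", "coordinates": [0.0, 0.0]},
                bbox=[0.0, 0.0, 0.0, 0.0],
                datetime=datetime.datetime(2022, 1, 1),
                properties={},
            )
            for i in range(1000)
        ]
        for item in self.items:
            self.catalog.add_item(item)

    def time_normalize_hrefs(self) -> None:
        # Mutate the entire tree: set the root href and update all self hrefs.
        self.catalog.normalize_hrefs("/tmp/bench-catalog")

    def time_extent_from_items(self) -> None:
        # Compute a spatial/temporal extent across all items.
        pystac.Extent.from_items(self.items)
```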
I picked this PR up and, in the interest of not letting perfect be the enemy of good, implemented most of @duckontheweb's suggestions:
- I added benchmarks for reading, walking, and writing large catalogs, including peak memory usage for walking a large catalog
- I added a single extension benchmark, for the projection extension. I'm not quite sure what we want to be benchmarking there, so I kept it minimal (see the sketch after this list).
- I refactored the location of things to make usage a bit simpler (e.g. I want `asv dev` to work out-of-the-box).
- I removed CI benchmarking. To paraphrase @TomAugspurger from https://tomaugspurger.github.io/posts/performance-regressions, "This rules out running the benchmarks on ... CI ... even if we could finish it in time, we couldn’t really trust the results." It's my personal opinion that benchmarking on CI is so finicky and untrustworthy that it's not worth the complexity. In lieu of a dedicated benchmark machine, IMO it's better to run benchmarks locally when the situation calls for it. At least we have a framework for benchmarking now, which we could always expand in the future to a dedicated machine if it's needed.
- Docs.
- Simple check in CI to make sure benchmarks run w/o failure -- doesn't do any reporting.
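For reference, a minimal projection-extension benchmark could look something like the following; this is only an illustrative sketch, not necessarily what the PR's benchmark does:

```python
import datetime

import pystac
from pystac.extensions.projection import ProjectionExtension


class Projection:
    def setup(self) -> None:
        self.item = pystac.Item(
            id="proj-item",
            geometry={"type": "Point", "coordinates": [0.0, 0.0]},
            bbox=[0.0, 0.0, 0.0, 0.0],
            datetime=datetime.datetime(2022, 1, 1),
            properties={},
        )

    def time_add_projection(self) -> None:
        # Add the extension to the item and set its EPSG code.
        proj = ProjectionExtension.ext(self.item, add_if_missing=True)
        proj.apply(epsg=4326)
```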
I didn't implement:
- Any sort of async strategy. I think that's best tackled as part of the async improvements themselves (https://github.com/stac-utils/pystac/pull/749) -- at least, I don't think they should hold up getting some sort of benchmarking into the repo.
- Extensive extension benchmarking. I'm not sure where the pain points are right now, and I don't want broken benchmarks to get in the way of solutions to #448.
cc @duckontheweb would love your review if you have the bandwidth, I can't request b/c it's your PR.
IMO this PR should be squash-merged since it's one coherent set of changes w/ some fixup commits -- I can't rebase b/c it's on @duckontheweb's fork.