[variance] capability to `assert`/`upload` multiple `collect` runs
**Is your feature request related to a problem? Please describe.**
Yes: perf variance when running Lighthouse in the cloud, due to ephemeral hardware variance (e.g. Docker containers, VMs).
**Describe the solution you'd like**
The ability to upload (`lhci upload`) and assert (`lhci assert`) multiple collect (`lhci collect`) reports against the same build context. lhci already supports uploading with a build context, so this option would just extend that to accept multiple reports.

Note this is different from the `numberOfRuns` config option. This new option would allow running `lhci collect` multiple times in CI, on a different container each time -- which, with enough runs and the ability to take a mean/median across them, would help eliminate hardware-variance outliers.
Simplified example using CircleCI:

```yaml
jobs:
  lighthouse_run_1:
    docker:
      - image: patrickhulce/lhci-client
    steps:
      - run: lhci collect
      - run: lhci upload
  lighthouse_run_2:
    docker:
      - image: patrickhulce/lhci-client
    steps:
      - run: lhci collect
      - run: lhci upload
  lighthouse_assert:
    docker:
      - image: patrickhulce/lhci-client
    steps:
      # theoretical new flag to allow asserting against an average of the multiple reports in the same build
      - run: lhci assert --reportsLocation=<location of uploaded reports>
```
**Describe alternatives you've considered**
- Not running Lighthouse in CI at all, to avoid the variance.
- A/B testing in lighthouse-ci, i.e. running `lhci` against both the previous and current build in the same run (same hardware).
Thanks for filing @khangsfdc!
You can accomplish this today by copying the contents of the `.lighthouseci` folder from each container into the final assert container.
i.e.

```
lighthouse_run_1:
  - lhci collect
  - mark-this-folder-to-ci-system-as-an-artifact ./.lighthouseci
lighthouse_run_2:
  - lhci collect
  - mark-this-folder-to-ci-system-as-an-artifact ./.lighthouseci
lighthouse_assert:
  - open-artifacts-from-run-1 ./.lighthouseci
  - open-artifacts-from-run-2 ./.lighthouseci
  - lhci assert
```
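On CircleCI specifically, the artifact hand-off in those pseudo-steps roughly maps to workspaces. The following is a sketch only, not a complete or endorsed config: it shows one collect job (`lighthouse_run_2` would be identical), omits the `workflows` section that wires the jobs together, and assumes report filenames don't collide across runs (CircleCI workspaces reject conflicting file paths).

```yaml
# Hypothetical CircleCI wiring for the pseudo-steps above, using workspaces
# to pass each container's .lighthouseci folder to the assert job.
jobs:
  lighthouse_run_1:
    docker:
      - image: patrickhulce/lhci-client
    steps:
      - run: lhci collect
      # Save this container's reports into the shared workspace.
      # Assumes these filenames don't collide with the other run's reports.
      - persist_to_workspace:
          root: .
          paths:
            - .lighthouseci
  lighthouse_assert:
    docker:
      - image: patrickhulce/lhci-client
    steps:
      # Restore every persisted .lighthouseci into the working directory.
      - attach_workspace:
          at: .
      - run: lhci assert
```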
Given these are natural "artifacts" of a CI step and most CI systems support artifact dependencies natively, I'm not sure we would prioritize uploading and refetching cross-container outside of CI.
@patrickhulce thanks for the tip! That sounds promising.
To clarify, the need to upload outside of CI is so that the aggregate of runs shows up as a single build in the Lighthouse CI server dashboard.
If there is a way to do that using existing functionality please let me know.
I am also curious about your thoughts on this strategy for mitigating variance, as your doc here recommends avoiding containers altogether. Note I have also tried machine executors in CircleCI, and while there appears to be less variance compared to Docker, it is still higher than I would like.
> If there is a way to do that using existing functionality please let me know.

Yep, just add an `lhci upload` command after `lhci assert` in my example, once you've merged the two `.lighthouseci` artifact folders.
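The merge step can be sketched in shell. The `run1/`/`run2/` folders and the `lhr-*.json` filenames below are illustrative stand-ins for whatever your CI system restores, not lhci's actual output names:

```shell
# Simulate the artifact folders two separate collect containers would produce.
workdir=$(mktemp -d) && cd "$workdir"
mkdir -p run1/.lighthouseci run2/.lighthouseci
echo '{}' > run1/.lighthouseci/lhr-1.json
echo '{}' > run2/.lighthouseci/lhr-2.json

# Merge both runs into the single folder that lhci reads from.
mkdir -p .lighthouseci
cp run1/.lighthouseci/* run2/.lighthouseci/* .lighthouseci/

ls .lighthouseci   # prints lhr-1.json and lhr-2.json
# With the merged folder in place: lhci assert, then lhci upload
```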
> I am also curious on your thoughts on this strategy to mitigate variance

Unfortunately, the best advice I have, unless you're willing to go to extreme lengths for dedicated infra, is to avoid asserting on any perf metrics/scores and to use the graphs as a monitoring guide instead. That doc recommends avoiding containers for good reason, and sadly containers are pretty much the only convenient option in CI environments :(
I see -- that is indeed unfortunate, as not having assertions means relying on multiple data points (i.e. multiple pull requests) to obtain a usable trend line.
If there is no solution to variance in containers, has there been any feature discussion on A/B testing? Not sure if that is the correct term, but I'm referring to running the benchmark on the current pull request vs. the target branch (e.g. master). That ensures an apples-to-apples comparison, and assertions can use a percentage difference instead of absolute thresholds.
> running the benchmark on the current pull request vs target branch (e.g. master). That ensures you have an apples to apples comparison

Unfortunately this still doesn't ensure that. It would enforce that the same container image is being used, but container volatility is typically so high because containers share physical hardware with completely unrelated tasks, which vary over time. Given the marginal benefit and the high effort required to accomplish this, the short answer is no, we're not exploring it.
> assertions can use a percentage difference instead of absolutes.

We have considered this and are open to it 👍
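To make the percentage-difference idea concrete, here is a minimal sketch of what such an assertion could look like. This is not an lhci feature or flag; the scores and the 5% budget are made-up example values:

```shell
# Hypothetical percentage-difference check between a base-branch score
# and a PR score (both are illustrative, not real lhci output).
base=0.90
pr=0.84

# Compute the relative drop and compare it to a 5% budget.
awk -v b="$base" -v p="$pr" 'BEGIN {
  drop = (b - p) / b * 100
  printf "relative drop: %.1f%%\n", drop
  print (drop > 5 ? "assertion would fail" : "assertion would pass")
}'
```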