iree
iree copied to clipboard
Refactor benchmark jobs in ci.yml
Following this discussion on Discord.
These jobs in the ci.yml
workflow are increasing infrastructure complexity and are a bottleneck on total presubmit time:
-
build_benchmark_tools
-
build_e2e_test_artifacts
-
test_benchmark_suites
-
compilation_benchmarks
(opt-in) -
execution_benchmarks
(opt-in) -
process_benchmark_results
(opt-in)
When no benchmarks are enabled, the build_e2e_test_artifacts
-> test_benchmark_suites
jobs still run. The intent here was to verify that benchmark suite models can be successfully compiled and executed correctly even if they are not benchmarked. The reality is that the test_benchmark_suites
workflows are only running one test: iree/tools/test/iree_run_module_correctness_test
, and the build_e2e_test_artifacts
workflow takes 9 minutes to build multiple platform variants of the same program. The coverage provided has redundancies and the overall setup has multiple excess minutes of overhead (not even counting time spent waiting for expensive self-hosted runners.
I propose we
- remove the
test_benchmark_suites
matrix entirely - move
build_benchmark_tools
andbuild_e2e_test_artifacts
to opt-in based on benchmarks being requested (using a preset?) - (as needed) add correctness integration test cases for benchmark programs to
experimental/regression_suite
rather than try to bolt something on to the existing benchmarking infrastructure like those "run module tests" - (as cleanup) eject the benchmark jobs from
ci.yml
and place them inbenchmark.yml
- they can still use presubmit artifacts via GCS storage or GitHub Actions Artifacts. Ideally they could also run on release artifacts
- they can continue to use
configure_ci.py
and PR labels as they currently do
cc @pzread
FWIW, I looked at the logs for a sample run of build_e2e_test_artifacts
to see if there were some easy optimizations to make and it doesn't look like it.
- 4m spent compiling various programs (no huge outlier program like other jobs have had before)
- 1m40s spent uploading outputs
- 1m fetching Python deps from pip
- 40 seconds downloading a Dockerfile
- 10s configuring CMake
- < 5s downloading sources and importing them
just... lots of overhead and then a decent number of programs to compile
SG. I'll say from time to time we catch problems with build_e2e_test_artifacts
, but we can switch and observe failure in postsubmit, then pick the important models into regression_suite. And thanks for looking into these
For the test_benchmark_suites
, @hcindyl might know more about it
I'm seeing if I can pull some statistics for how long each job spends queued right now. (If that ends up taking longer than I'd like I may just proceed without data, but I'd like to know...). For queueing, we could also add more runners, or switch those jobs to use GitHub-hosted runners instead of our own.
I'd rather just eliminate the overhead and complexity instead of throwing more/different resources at it though. Queueing time / job orchestration is only part of the costs.
See https://github.com/openxla/iree/pull/16438#issuecomment-1949078215 for the use of test_benchmark_suites
. It is placed in the CI to check LLVM codegen regression for a few realistic models. I have lost track on when they got turned off in https://github.com/openxla/iree/tree/main/tests/e2e, but historically the test gave us some warnings of the RISCV codegen breakage before it trickles down to the project that uses IREE, e.g., https://opensecura.googlesource.com/sw/vec_iree and https://opensecura.googlesource.com/ml-models-public
Slight adjustments to the plans from the initial issue description:
-
build_benchmark_tools
,build_e2e_test_artifacts
, andtest_benchmark_suites
are now opt-in and will only run when benchmarks are requested (including all postsubmit runs) -
test_benchmark_suites
has some useful coverage again, but those tests should be moved to a regular (non-benchmark) presubmit CI job - likely underexperimental/regression_suite
, butcross_compile_and_test
in ci.yml could also be used in the short term - the benchmark jobs should still be moved out of ci.yml