
Use pre-compiled sources in tests

Open viktormalik opened this issue 2 years ago • 3 comments

A large portion of test runtime is spent on building kernel sources into LLVM IR. This doesn't add much value to the tests because the build has very few distinct scenarios and we test it (at least the build-kernel command) separately anyway. To speed up the tests, it'd be better to use pre-compiled LLVM IR files. It would also save the CI from repeatedly downloading the necessary kernels (which is another slowdown).

The problem is that we'd need to keep quite a large number of LLVM IR files, covering a wide range of LLVM versions, inside the repository. An alternative would be to keep the pre-built files outside of the repo (e.g. in a separate one), possibly in a compressed format.

viktormalik avatar Oct 20 '23 18:10 viktormalik

I'll check how many LLVM IR files we'd need to keep in the repo.

viktormalik avatar Oct 20 '23 18:10 viktormalik

We might try to utilize git-lfs, an extension of Git for storing large files. For example, some open-source video games use it to store their assets (which can amount to gigabytes, just like in our case).

That way we could keep most (or even all) LLVM IR files without risking missing some change in behaviour.
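
A minimal sketch of how the tracking could look, assuming git-lfs is installed (the `*.bc` pattern and the temporary paths are illustrative, not the actual repo layout):

```shell
#!/bin/sh
# Sketch: track pre-built LLVM IR with Git LFS so only small pointer files
# live in Git history while the actual blobs go to LFS storage.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q .
if command -v git-lfs >/dev/null 2>&1; then
    git lfs install --local        # enable LFS hooks for this repo only
    git lfs track "*.bc"           # store .bc files as LFS objects
else
    # This is exactly the rule that `git lfs track` would write:
    echo '*.bc filter=lfs diff=lfs merge=lfs -text' > .gitattributes
fi
cat .gitattributes                 # the tracking rule is committed with the repo
```

The `.gitattributes` file is committed alongside the data, so every clone applies the same LFS filter automatically.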

DanielKriz avatar Oct 25 '23 21:10 DanielKriz

We might try to utilize git-lfs, an extension of Git for storing large files. For example, some open-source video games use it to store their assets (which can amount to gigabytes, just like in our case).

That way we could keep most (or even all) LLVM IR files without risking missing some change in behaviour.

That was my first idea, too, but unfortunately it seems that it won't be suitable. We have ~500 MB of data per LLVM version, which at this moment is ~4 GB overall. The problem is that GitHub limits the LFS storage to 5 GB, silently dropping uploaded files that exceed the limit, with no option to remove old stored assets. Even if we managed to reduce the size, I'm afraid that we'd run out of space quickly after just several updates of the testing data.

Instead, my current idea is to use GitHub Actions caching for the tests/regression/kernel_modules directory. If we were able to detect that each LLVM version has its data cached, we wouldn't have to download the kernels in the CI. Caches are kept for 7 days (when unused), so it'd be seldom necessary to rebuild the data. On the other hand, it'd still be easy to do so thanks to the Nix environment, and the testing data wouldn't have to be stored in the repo.
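
A rough sketch of what such a caching step could look like with actions/cache (the step names, the build script, and the key scheme are assumptions for illustration, not the actual workflow):

```yaml
# Hypothetical workflow fragment: one cache per LLVM version.
- name: Cache pre-built kernel modules
  id: cache-modules
  uses: actions/cache@v4
  with:
    path: tests/regression/kernel_modules
    # One key per LLVM version; bump the suffix to invalidate
    # the cache after a build-pipeline change.
    key: kernel-modules-llvm-${{ matrix.llvm }}-v1

- name: Download kernels and build LLVM IR (cache miss only)
  if: steps.cache-modules.outputs.cache-hit != 'true'
  run: ./build-test-data.sh   # illustrative build script
```

On a cache hit the download/build step is skipped entirely; on a miss the freshly built data is saved back to the cache at the end of the job.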

viktormalik avatar Oct 26 '23 04:10 viktormalik

I finally made some progress with this. I've prepared a refactoring of the regression tests which will allow running them solely on the data from the tests/regression/kernel_modules directory (I'll also rename it to tests/regression/test_data). Now, however, I'm considering two options for how to proceed w.r.t. CI:

  1. Check the test_data directory into the Git tree. That way, CI wouldn't need to download or cache anything, and tests would run solely on the data stored in the repository, which is good testing practice. The disadvantage is that we'd need a copy of the test_data directory for each supported LLVM version. Currently, the size for one version is ~635 MB, so we'd need to cut it down (which should be doable by storing .bc files instead of .ll and deduplicating some of the data, but I'm not sure what the resulting size would be). Another problem is that we'd have to remember to update the data every time we change the build pipeline, to avoid problems as described in #321.
  2. The other option would be to cache the test_data directory in the CI. As described in the previous comment, we'd cache it for every LLVM version and only download the kernels if any of the caches expired. The disadvantage is that we could still hit issues with the cache size (see #337), which we could also resolve by cutting down the size of the data.

Any opinions on these? Did I miss some other option? Do you see any clear advantage or blocker?

@FrNecas @lenticularis39 @PLukas2018 @DanielKriz @zacikpa

viktormalik avatar Jul 03 '24 15:07 viktormalik

FWIW, using .bc files instead of .ll would reduce the size to ~200 MB per LLVM version. We could then compress it with xz to some ~40 MB so that's some ~400 MB of data for all LLVM versions that we support. Looks like a reasonable size to me for either caching or storing the data directly in the repo in LFS.
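
The conversion and compression could look roughly like this (the paths and the toy module are illustrative; llvm-as is the standard LLVM assembler, and the ~3x/~5x ratios are the ones estimated above):

```shell
#!/bin/sh
# Sketch: shrink textual IR to bitcode, then compress the whole tree with xz.
mkdir -p test_data
printf 'define i32 @f() {\n  ret i32 0\n}\n' > test_data/mod.ll   # toy module
if command -v llvm-as >/dev/null 2>&1; then
    llvm-as test_data/mod.ll -o test_data/mod.bc   # .ll -> .bc (~3x smaller on real data)
    rm test_data/mod.ll
fi
tar cJf test_data.tar.xz test_data                 # xz gives a further ~5x reduction
ls -l test_data.tar.xz
```

The .bc files can be decompressed and used directly by the tests; llvm-dis can turn them back into textual IR if needed for debugging.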

viktormalik avatar Jul 03 '24 19:07 viktormalik

Personally, I prefer the second option (which you implemented) over checking the test_data directory into the repo:

  • If we change the build process, we don't have to rebuild the LLVM IR manually and recommit it to the repo.
  • This way, at least in some CI runs (when the cache cannot be restored), the CI will build LLVM IR files for multiple kernel versions.

PLukas2018 avatar Jul 08 '24 16:07 PLukas2018

As mentioned in the pull request, the size of the cached LLVM IR is lower, so the CI cache size shouldn't be a big problem and the tests should be deterministic enough. I am also in favor of the second option, which is already implemented.

DanielKriz avatar Jul 09 '24 07:07 DanielKriz

  • This way, at least in some CI runs (when the cache cannot be restored), the CI will build LLVM IR files for multiple kernel versions.

Agreed. I'm also thinking about scheduling a cron job (e.g. once a week?) which will rebuild the caches and check that the CI passes on freshly downloaded kernels.

viktormalik avatar Jul 09 '24 07:07 viktormalik

I'm also thinking about scheduling a cron job (e.g. once a week?) which will rebuild the caches and check that the CI passes on freshly downloaded kernels.

Looks like we could add a schedule event to the CI and, e.g., if the workflow runs because of the schedule event, skip restoring the caches and download the kernels instead.
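
A hedged sketch of how that could look in the workflow file (the cron expression, step names, and cache key are made up for illustration):

```yaml
# Hypothetical: weekly scheduled run that bypasses the cache.
on:
  schedule:
    - cron: '0 3 * * 1'   # every Monday, 03:00 UTC

# ...and in the test job, the restore step is skipped for scheduled runs:
- name: Restore cached test data
  if: github.event_name != 'schedule'   # scheduled runs rebuild from scratch
  uses: actions/cache@v4
  with:
    path: tests/regression/test_data
    key: test-data-llvm-${{ matrix.llvm }}
```

Because the scheduled run still saves its freshly built data at the end of the job, it would also refresh the caches for subsequent push/PR runs.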

PLukas2018 avatar Jul 10 '24 08:07 PLukas2018

I'm also thinking about scheduling a cron job (e.g. once a week?) which will rebuild the caches and check that the CI passes on freshly downloaded kernels.

Looks like we could add a schedule event to the CI and, e.g., if the workflow runs because of the schedule event, skip restoring the caches and download the kernels instead.

Yeah, that was my rough idea. Thanks for the link, let's add it in a follow-up PR (and maybe discuss in person beforehand).

viktormalik avatar Jul 10 '24 08:07 viktormalik