memilio
Provide minimal boost version with limited functionality
Feature description
Provide a minimal boost version. Functionality that requires more boost parts than provided should not be built. This currently affects some functions in the file epidemiology/state_age_function.h.
Discussion: Where should the minimal boost version be stored? Ideally not as .tar.gz in the repository as before.
Additional context
No response
Checklist
- [x] Attached labels, especially loc:: or model:: labels.
- [x] Linked to project.
Maybe we can use FetchContent instead of a zip file: https://stackoverflow.com/questions/72913306/how-to-use-boost-libraries-directly-from-github-using-cmake-fetchcontent-or-any
This could also work for the full download as well, by just changing the included libraries. If FetchContent is fast enough, I would prefer it over providing a binary archive. The zip archive obscures the dependency, and as far as I can tell from a quick search the opinion on how to handle dependencies in git(hub) is not to.
Anyways, github does have a solution for large file storage, which we could use for storing an archive.
Further, there is a cache action to reuse dependencies across runs, but I am not sure whether it applies to us, since we do not use a dependency manager like npm.
> Maybe we can use FetchContent instead of a zip file: https://stackoverflow.com/questions/72913306/how-to-use-boost-libraries-directly-from-github-using-cmake-fetchcontent-or-any
> This could also work for the full download as well, by just changing the included libraries. If FetchContent is fast enough, I would prefer it over providing a binary archive. The zip archive obscures the dependency, and as far as I can tell from a quick search the opinion on how to handle dependencies in git(hub) is not to.
FetchContent is used in #983 and takes quite a long time
I think that version downloads all of boost, in the stackoverflow link they seem to only download some of the boost libraries. It might be worth a try.
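For reference, a rough sketch of what I understand the Stack Overflow approach to be. `BOOST_INCLUDE_LIBRARIES` is read by Boost's own CMake build and limits which libraries get configured. The library name `filesystem` and the pinned tag are placeholders, not what memilio actually needs. As far as I can tell, this only restricts the configure and build steps; the clone may still fetch all submodules unless `GIT_SUBMODULES` is narrowed as well.

```cmake
include(FetchContent)

# Placeholder list: only these Boost libraries (and their dependencies) are configured.
set(BOOST_INCLUDE_LIBRARIES filesystem)

FetchContent_Declare(boost
    GIT_REPOSITORY https://github.com/boostorg/boost.git
    GIT_TAG boost-1.85.0   # assumed version, for illustration only
    GIT_SHALLOW TRUE)      # skip the git history of the superproject
FetchContent_MakeAvailable(boost)

# Boost's CMake build provides namespaced targets, e.g. for a hypothetical target:
# target_link_libraries(my_target PRIVATE Boost::filesystem)
```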
@reneSchm
- Daniel had been playing around with caching here: https://github.com/SciCompMod/memilio/issues/305
- Git LFS is probably not the right solution, for several reasons. For one, both storage space and traffic have to be paid for.
I think it would be worth exploring caching boost. There are some open questions though, see https://github.com/SciCompMod/memilio/pull/994#issuecomment-2139102260
In short: boost is huge, and cache space is limited, so unless the cached version can be used for all builds, there is not enough space.
Some projects offer their dependencies on a mirror. This would have to be a public server, probably hosted at DLR, so we would have to talk to IT about that. We could put a minimal boost in different versions there. If we had such a server, we might also be able to use it as a remote ccache repository.
> In short: boost is huge, and cache space is limited, so unless the cached version can be used for all builds, there is not enough space.
As I see it, the biggest problem of the CI regarding build time comes from downloading all of boost for every build. Could we reduce that time by caching a .tar.gz (or .zip) of that download? We then could extract it before the generation step and point cmake to it. The archive is platform independent, and b2 (which will figure out the platform stuff) should run fast enough to keep it in.
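A minimal sketch of how "point cmake to it" could look, assuming a CI step has already extracted the cached archive. `FETCHCONTENT_SOURCE_DIR_BOOST` is the standard FetchContent override for a dependency declared as `boost`; the option name `MEMILIO_BOOST_CACHE_DIR` is made up for this example.

```cmake
include(FetchContent)

# Hypothetical option: directory where CI extracted the cached boost archive.
set(MEMILIO_BOOST_CACHE_DIR "" CACHE PATH "Pre-extracted boost sources (assumed name)")

if(MEMILIO_BOOST_CACHE_DIR AND EXISTS "${MEMILIO_BOOST_CACHE_DIR}")
    # With this set, FetchContent skips the download for "boost"
    # and uses the given directory as the source directory.
    set(FETCHCONTENT_SOURCE_DIR_BOOST "${MEMILIO_BOOST_CACHE_DIR}" CACHE PATH "" FORCE)
endif()

# Unchanged declaration; only used when no cached directory is given.
FetchContent_Declare(boost
    GIT_REPOSITORY https://github.com/boostorg/boost.git
    GIT_TAG boost-${MEMILIO_BOOST_VERSION})
```

The same effect should be available without touching the CMake files by passing `-DFETCHCONTENT_SOURCE_DIR_BOOST=<dir>` on the command line.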
> As I see it, the biggest problem of the CI regarding build time comes from downloading all of boost for every build. Could we reduce that time by caching a .tar.gz (or .zip) of that download?
If that's the biggest issue, that should be easy to solve with a cache. Are we currently cloning a git repo or downloading an archive? Downloading a tar would probably be easier and more efficient, both because it's faster to download one file than thousands with git, and because extracting the archive to get the source code is faster than compressing it for caching.
We are downloading a tagged release:
```cmake
FetchContent_Declare(boost
    GIT_REPOSITORY https://github.com/boostorg/boost.git
    GIT_TAG boost-${MEMILIO_BOOST_VERSION})
```
I am pretty sure this just downloads the repo, but FetchContent can use URLs, so we could use the download from the boost homepage: "https://archives.boost.io/release/1.85.0/source/boost_1_85_0.tar.gz"
I just tried FetchContent with URL. It is quite fast, much faster than with a repository. The source code in the archive also has correct include paths, so bootstrapping doesn't seem to be necessary. I would guess it's also less traffic for github. Is there some drawback to downloading the tar or did we just miss that before? I think we don't need to worry about the cache at all then.
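Roughly what such a URL-based setup could look like, as a sketch rather than the exact code tried here. It only downloads and extracts the archive, then exposes the headers through an interface target. The target name `memilio_boost_headers` is made up. Note that the single-argument `FetchContent_Populate` is deprecated as of CMake 3.30, so newer versions may warn about it.

```cmake
include(FetchContent)

FetchContent_Declare(boost
    URL https://archives.boost.io/release/1.85.0/source/boost_1_85_0.tar.gz)

# Download and extract only; do not add boost's own build to the project.
FetchContent_GetProperties(boost)
if(NOT boost_POPULATED)
    FetchContent_Populate(boost)
endif()

# Hypothetical header-only target; the archive root already contains the boost/ include directory.
add_library(memilio_boost_headers INTERFACE)
target_include_directories(memilio_boost_headers SYSTEM INTERFACE ${boost_SOURCE_DIR})
```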
We might have missed it, but can we still use the version number with URL?
I found this commit (https://github.com/SciCompMod/memilio/commit/52d0303db27b3fc406435be1d305c938b8c6e435) where we switched from an archive to a repository for Eigen3. But we are still using an archive for jsoncpp. So maybe the problem with Eigen3 was specific to GitLab, or it was only temporary.
> can we still use the version number with URL?
The archive URL follows a naming scheme (see https://github.com/boostorg/boost/tags) that includes the version number, and downloads are available for older tags as well, so that should work.
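A small sketch of deriving the archive URL from the version number, assuming `MEMILIO_BOOST_VERSION` has the form `major.minor.patch` and using the archives.boost.io scheme quoted earlier in this thread. The helper variable names are made up.

```cmake
# Assumed format: major.minor.patch
set(MEMILIO_BOOST_VERSION "1.85.0")

# The file name in the archive URL uses underscores instead of dots.
string(REPLACE "." "_" boost_version_underscored "${MEMILIO_BOOST_VERSION}")

set(boost_archive_url
    "https://archives.boost.io/release/${MEMILIO_BOOST_VERSION}/source/boost_${boost_version_underscored}.tar.gz")

message(STATUS "Boost archive URL: ${boost_archive_url}")
```

The resulting variable can then be passed as the `URL` argument of `FetchContent_Declare`.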
//Edit: here is the issue regarding eigen3 archive download: https://gitlab.dlr.de/hpc-against-corona/epidemiology/-/issues/467 Not much more info there.