Provide minimal boost version with limited functionality

Open • lenaploetzke opened this issue 10 months ago • 4 comments

Feature description

Provide a minimal Boost version. Functionality that requires Boost libraries not included in this minimal version should not be built. This currently affects some functions in the file epidemiology/state_age_function.h.
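One way to skip such functionality automatically would be to probe the provided Boost at configure time for a header the functionality needs. The following is a hypothetical CMake sketch, not existing project code: the cache variable `MEMILIO_BOOST_INCLUDE_DIR`, the compile definition `MEMILIO_HAS_BOOST_MATH` and the probed header are assumptions chosen for illustration.

```cmake
cmake_minimum_required(VERSION 3.14)
project(minimal_boost_check LANGUAGES CXX)
include(CheckIncludeFileCXX)

# Hypothetical cache variable pointing at the (possibly minimal) Boost headers.
set(MEMILIO_BOOST_INCLUDE_DIR "" CACHE PATH "Directory containing the boost/ headers")

# Probe a header that only the full Boost provides. The header is an
# illustrative assumption; use whatever state_age_function.h actually includes.
set(CMAKE_REQUIRED_INCLUDES "${MEMILIO_BOOST_INCLUDE_DIR}")
check_include_file_cxx("boost/math/distributions/lognormal.hpp" MEMILIO_HAS_BOOST_MATH)
unset(CMAKE_REQUIRED_INCLUDES)

if(MEMILIO_HAS_BOOST_MATH)
    # Hypothetical definition; sources can guard code with #ifdef MEMILIO_HAS_BOOST_MATH.
    add_compile_definitions(MEMILIO_HAS_BOOST_MATH)
else()
    message(STATUS "Boost.Math headers not found: functions that need them are not built")
endif()
```

The affected functions would then be wrapped in `#ifdef MEMILIO_HAS_BOOST_MATH` ... `#endif`, so a minimal Boost simply compiles without them instead of failing.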

Discussion: Where should the minimal Boost version be stored? Ideally not as a .tar.gz archive in the repository, as was done before.

Checklist

  • [x] Attached labels, especially loc:: or model:: labels.
  • [x] Linked to project.

Posted by lenaploetzke, Apr 05 '24 09:04

Maybe we can use FetchContent instead of a zip file: https://stackoverflow.com/questions/72913306/how-to-use-boost-libraries-directly-from-github-using-cmake-fetchcontent-or-any

This could also work for the full download by just changing the included libraries. If FetchContent is fast enough, I would prefer it over providing a binary archive. The zip archive obscures the dependency, and as far as I can tell from a quick search, the prevailing opinion on storing dependencies in git(hub) repositories is: don't.

Anyway, GitHub does have a solution for large file storage, which we could use to store an archive.

Further, GitHub Actions has a cache action for reusing dependencies across runs, but I am not sure whether it applies to us, since we do not use a dependency manager like npm.

Posted by reneSchm, Apr 05 '24 10:04

> Maybe we can use FetchContent instead of a zip file: https://stackoverflow.com/questions/72913306/how-to-use-boost-libraries-directly-from-github-using-cmake-fetchcontent-or-any
>
> This could also work for the full download as well, by just changing the included libraries. If FetchContent is fast enough, I would prefer it over providing a binary archive. The zip archive obscures the dependency, and as far as I can tell from a quick search the opinion on how to handle dependencies in git(hub) is not to.

FetchContent is already used in #983, and it takes quite a long time.

Posted by lenaploetzke, Apr 05 '24 10:04

I think that version downloads all of Boost; in the Stack Overflow link, they seem to download only some of the Boost libraries. It might be worth a try.
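For reference, the pattern commonly used with the Boost superproject restricts which libraries are configured through the `BOOST_INCLUDE_LIBRARIES` variable of Boost's own CMake build. A hedged sketch, assuming Boost 1.85.0 and using Boost.Math purely as an illustrative library choice:

```cmake
cmake_minimum_required(VERSION 3.14)   # FetchContent_MakeAvailable needs 3.14
project(boost_subset_example LANGUAGES CXX)
include(FetchContent)

# Restrict the Boost CMake build to Boost.Math (plus its dependencies).
# "math" is only an illustrative choice. Setting it as a cache entry before
# Boost is added means Boost's own default for this variable does not replace it.
set(BOOST_INCLUDE_LIBRARIES math CACHE STRING "Boost libraries to configure")

FetchContent_Declare(boost
    GIT_REPOSITORY https://github.com/boostorg/boost.git
    GIT_TAG boost-1.85.0
    GIT_SHALLOW TRUE)   # clone only the tagged commit, not the full history
FetchContent_MakeAvailable(boost)

# Consumers link against the per-library target, e.g.:
# target_link_libraries(my_target PRIVATE Boost::math)
```

As far as I can tell, this limits what is configured and built rather than what is downloaded: without a `GIT_SUBMODULES` list, FetchContent still checks out all Boost submodules, so whether it helps the CI download time would need to be measured.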

Posted by reneSchm, Apr 05 '24 10:04

@reneSchm

  • Daniel had been playing around with caching here: https://github.com/SciCompMod/memilio/issues/305
  • Git LFS is probably not the right solution, for several reasons; for one, both storage space and traffic have to be paid for.

Posted by mknaranja, Apr 05 '24 12:04

I think it would be worth exploring caching Boost. There are some open questions, though; see https://github.com/SciCompMod/memilio/pull/994#issuecomment-2139102260

In short: boost is huge, and cache space is limited, so unless the cached version can be used for all builds, there is not enough space.

Some projects provide their dependencies on a mirror. For us, this would have to be a public server, probably hosted at DLR; we would have to talk to IT about that. We could host minimal Boost distributions for several versions there. If we had a server for that, we might also be able to use it as a remote ccache repository.

Posted by dabele, May 30 '24 09:05

> In short: boost is huge, and cache space is limited, so unless the cached version can be used for all builds, there is not enough space.

As I see it, the biggest contributor to CI build time is downloading all of Boost for every build. Could we reduce that time by caching a .tar.gz (or .zip) of that download? We could then extract it before the generation step and point CMake to it. The archive is platform independent, and b2 (which handles the platform-specific parts) should run fast enough to keep it as part of the build.
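A hedged sketch of the "extract, then point CMake to it" step. It assumes the CI cache restores an archive named `boost_1_85_0.tar.gz` that unpacks into a `boost_1_85_0/` directory; the script name `extract_boost.cmake` and the `deps/` directory are hypothetical. FetchContent already supports skipping the download through the cache variable `FETCHCONTENT_SOURCE_DIR_<uppercaseName>`, which for the dependency declared as `boost` is `FETCHCONTENT_SOURCE_DIR_BOOST`.

```cmake
# extract_boost.cmake: hypothetical helper run by CI before configuring:
#   cmake -P extract_boost.cmake
cmake_minimum_required(VERSION 3.18)   # file(ARCHIVE_EXTRACT) needs 3.18

# Assumed location of the archive restored from the CI cache.
set(archive "${CMAKE_CURRENT_LIST_DIR}/boost_1_85_0.tar.gz")
set(destination "${CMAKE_CURRENT_LIST_DIR}/deps")

if(NOT EXISTS "${archive}")
    message(FATAL_ERROR "Cached Boost archive not found: ${archive}")
endif()
file(ARCHIVE_EXTRACT INPUT "${archive}" DESTINATION "${destination}")
message(STATUS "Extracted Boost to ${destination}")

# Afterwards, configure the project so FetchContent skips the download and uses
# the extracted sources for FetchContent_Declare(boost ...):
#   cmake -S . -B build -DFETCHCONTENT_SOURCE_DIR_BOOST=<repo>/deps/boost_1_85_0
```

Because the override is a plain cache variable, the project's `CMakeLists.txt` would not need to change; builds without the cached archive would keep downloading as before.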

Posted by reneSchm, May 30 '24 10:05

> As I see it, the biggest problem of the CI regarding build time comes from downloading all of boost for every build. Could we reduce that time by caching a .tar.gz (or .zip) of that download?

If that's the biggest issue, that should be easy to solve with a cache. Are we currently cloning a git repo or downloading an archive? Downloading a tar would probably be easier and more efficient, both because it's faster to download one file than thousands with git, and because extracting the archive to get the source code is faster than compressing it for caching.

Posted by dabele, May 30 '24 10:05

We are downloading a tagged release:

```cmake
FetchContent_Declare(boost
    GIT_REPOSITORY https://github.com/boostorg/boost.git
    GIT_TAG boost-${MEMILIO_BOOST_VERSION})
```

I am pretty sure this just clones the repository, but FetchContent can also download from a URL, so we could use the archive from the Boost homepage: "https://archives.boost.io/release/1.85.0/source/boost_1_85_0.tar.gz"

Posted by reneSchm, May 30 '24 10:05

I just tried FetchContent with URL. It is quite fast, much faster than with a repository. The source code in the archive also has correct include paths, so bootstrapping doesn't seem to be necessary. I would guess it's also less traffic for github. Is there some drawback to downloading the tar or did we just miss that before? I think we don't need to worry about the cache at all then.
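Since the unpacked archive already contains the `boost/` include tree, the header-only parts of Boost could be consumed without bootstrapping or building anything. A minimal sketch, assuming only header-only libraries are needed; the target name `boost_headers` is hypothetical:

```cmake
cmake_minimum_required(VERSION 3.14)
project(boost_header_only_example LANGUAGES CXX)
include(FetchContent)

FetchContent_Declare(boost
    URL https://archives.boost.io/release/1.85.0/source/boost_1_85_0.tar.gz)

# Download and unpack only; Boost is neither built nor added as a subproject.
# On CMake >= 3.30 this single-argument FetchContent_Populate() form is
# deprecated (policy CMP0169) and produces a warning.
FetchContent_GetProperties(boost)
if(NOT boost_POPULATED)
    FetchContent_Populate(boost)
endif()

# Expose the unpacked headers through an interface target.
add_library(boost_headers INTERFACE)
target_include_directories(boost_headers INTERFACE "${boost_SOURCE_DIR}")

# Usage: target_link_libraries(my_target PRIVATE boost_headers)
```

Compiled Boost libraries would still need b2 or another build step; this sketch only covers the header-only case.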

Posted by dabele, May 30 '24 12:05

We might have missed it, but can we still use the version number with URL?

Posted by reneSchm, May 30 '24 12:05

I found this commit (https://github.com/SciCompMod/memilio/commit/52d0303db27b3fc406435be1d305c938b8c6e435) where we switched from an archive to the repository for Eigen3, but we are still using an archive for jsoncpp. So maybe the problem with Eigen3 was specific to GitLab, or it was temporary.

> can we still use the version number with URL?

The archive URL follows a naming scheme that includes the version number (see https://github.com/boostorg/boost/tags), and downloads are available for older tags as well, so that should work.
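Assuming the archives.boost.io scheme of the homepage URL mentioned earlier in this thread holds for other releases (dots in the directory name, underscores in the file name), the URL could be derived from `MEMILIO_BOOST_VERSION`. A sketch; `MEMILIO_BOOST_VERSION` is defined here only to keep the example self-contained, and the helper variables `MEMILIO_BOOST_VERSION_UNDERSCORE` and `MEMILIO_BOOST_URL` are hypothetical:

```cmake
cmake_minimum_required(VERSION 3.14)
project(boost_version_url_example LANGUAGES NONE)
include(FetchContent)

# Same format as the Git tag suffix, e.g. boost-1.85.0 -> 1.85.0.
set(MEMILIO_BOOST_VERSION "1.85.0" CACHE STRING "Boost version to download")

# The archive file name uses underscores instead of dots: 1.85.0 -> 1_85_0.
string(REPLACE "." "_" MEMILIO_BOOST_VERSION_UNDERSCORE "${MEMILIO_BOOST_VERSION}")
set(MEMILIO_BOOST_URL
    "https://archives.boost.io/release/${MEMILIO_BOOST_VERSION}/source/boost_${MEMILIO_BOOST_VERSION_UNDERSCORE}.tar.gz")
message(STATUS "Boost archive: ${MEMILIO_BOOST_URL}")

# Declaration only; nothing is downloaded until the content is populated.
FetchContent_Declare(boost URL "${MEMILIO_BOOST_URL}")
```

For 1.85.0 this yields exactly the homepage URL from the earlier comment; other versions would have to be checked against the archive server.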

Edit: here is the issue regarding the Eigen3 archive download: https://gitlab.dlr.de/hpc-against-corona/epidemiology/-/issues/467 (there is not much more information there).

Posted by dabele, May 30 '24 12:05