easybuild-framework
add support for --update-lmod-caches (disabled by default) and --compile-lmod-caches (WIP)
@rtmclay: it would be great if you could give this a review, especially w.r.t. the FIXME I have in there
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1334/ Test FAILed.
Can we look into dropping the global use of $LMOD_IGNORE_CACHE=1?
@pforai: OK, I'll try and tackle that bit in here, as discussed... It should boil down to always updating the cache before every build is started. Although this will take a little while, it will avoid having to scan the module tree multiple times when running, since module avail, which is done for each dependency, etc., will become very fast...
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1348/ Test FAILed.
@pforai: I took a look, and there's more needed before we can stop ignoring the cache...
We need to make sure that we can/should update the cache actively being used by Lmod in the environment we're running in. This could be a system cache (where we may not have permission to update it), or just our local user cache.
So, it seems like we need the following cmdline options:
- --ignore-lmod-caches (enabled by default, to make sure?)
- --update-lmod-caches[=<dir>] (which is there now)
Iff --ignore-lmod-caches is set to False, we need to somehow query Lmod which cache it will be using, and try to update it. If that works, we can continue. If not, we need to scream bloody murder and end it before it goes bad somehow (e.g. accidentally reinstalling something because we don't see it in ml av).
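To make the control flow above concrete, here is a rough sketch in Python. It is only an illustration: the function name, the exception class and the idea of passing in an already-known cache directory are all hypothetical, since querying Lmod for the cache it actually uses is exactly the open question here.

```python
import os


class LmodCacheError(Exception):
    """Raised when the Lmod cache should be used but cannot be updated (hypothetical)."""


def cache_update_required(ignore_lmod_caches, cache_dir):
    """Return True if the cache at cache_dir should be updated before building.

    ignore_lmod_caches mirrors the proposed --ignore-lmod-caches option;
    cache_dir is assumed to be the cache location Lmod would use (how to
    obtain it from Lmod is not covered by this sketch).
    """
    if ignore_lmod_caches:
        # Caches are ignored, so there is nothing to update.
        return False

    # The cache is going to be trusted, so we must be able to refresh it.
    if cache_dir is None or not os.path.isdir(cache_dir):
        raise LmodCacheError("Lmod cache directory not found: %s" % cache_dir)
    if not os.access(cache_dir, os.W_OK):
        raise LmodCacheError("No permission to update Lmod cache in %s" % cache_dir)

    return True
```

Failing hard here is the point: continuing with a stale cache is what could lead to reinstalling something that is already there.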
Auto-updating the 'live' Lmod cache is probably also not a good idea in a general sense, since we don't know for which $MODULEPATH it was built... Although it may be fine to assume that the $MODULEPATH is correct when eb is being asked to update the live cache.
In short, I need to give this more thought, so we should probably stay away from also tackling that in this PR.
@stdweird: please re-review?
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1349/ Test PASSed.
@pforai: One thing we can do is generate our own Lmod cache in /tmp, and then configure Lmod to use that cache via $LMOD_RC. With that in place, we can disable LMOD_IGNORE_CACHE...
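A minimal sketch of what that could look like, assuming the usual scDescriptT layout of an lmodrc.lua file (a "dir" and a "timestamp" entry per cache). The function names and file locations are made up for illustration, and building the cache itself is left out.

```python
import os

# Assumed lmodrc.lua layout: one scDescriptT entry describing a single cache.
LMODRC_TEMPLATE = """scDescriptT = {
  {
    ["dir"] = "%(cache_dir)s",
    ["timestamp"] = "%(timestamp_file)s",
  },
}
"""


def write_private_lmodrc(cache_dir, timestamp_file, rc_path):
    """Write an lmodrc.lua that points Lmod at a private cache; return its path."""
    with open(rc_path, 'w') as handle:
        handle.write(LMODRC_TEMPLATE % {
            'cache_dir': cache_dir,
            'timestamp_file': timestamp_file,
        })
    return rc_path


def env_for_private_cache(rc_path, base_env=None):
    """Return a copy of the environment with $LMOD_RC set and $LMOD_IGNORE_CACHE dropped."""
    env = dict(os.environ if base_env is None else base_env)
    env['LMOD_RC'] = rc_path
    env.pop('LMOD_IGNORE_CACHE', None)
    return env
```

The returned environment would then be used for every Lmod call made during the session, so the private cache is picked up instead of being bypassed.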
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1361/ Test FAILed.
This looks O.K. to me. You might want to add support for -D with the spider command as a way to track down Lmod problems when building the spider cache.
the script createSystemCacheFile.sh is part of the standard Lmod distribution. But it has to be modified to be used at each site.
@rtmclay: is there any reason why there is no dedicated option in Lmod to update the cache(s) at a specific location?
@rtmclay: what needs customisation? It would be better if EB could just use it instead of trying to reproduce it in Python (it would be even better if Lmod had an lmod --make-cache option).
@stdweird: the location of the cache and timestamp files, potentially also the $MODULEPATH for which the cache is built, the luac being used (which has to match the one Lmod is using), etc.
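To show why these have to be parameters rather than hardcoded values, here is a hedged sketch that only assembles the commands, without running anything. It assumes the cache is produced by running spider with -o moduleT on the full $MODULEPATH and then compiling the result with luac -o; the file names, the function name and the default command names are placeholders, not what the script actually uses.

```python
import os


def cache_build_commands(modulepath, cache_dir, spider_cmd='spider', luac_cmd='luac',
                         compiled_suffix='.luac'):
    """Return a list of (argv, stdout_path) pairs describing how to build a cache.

    stdout_path is the file the command's output should be redirected to,
    or None if the command writes its own output file.
    All file names below are illustrative assumptions.
    """
    cache_lua = os.path.join(cache_dir, 'moduleT.lua')
    cache_compiled = os.path.join(cache_dir, 'moduleT' + compiled_suffix)

    return [
        # Scan the module tree once and dump the result as Lua source.
        ([spider_cmd, '-o', 'moduleT', modulepath], cache_lua),
        # Compile it with the same luac that Lmod itself uses.
        ([luac_cmd, '-o', cache_compiled, cache_lua], None),
    ]
```

Each site-specific setting (cache location, $MODULEPATH, luac binary) shows up as an argument here, and the timestamp file would still need to be touched after both commands succeed.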
But, I agree that there should be a dedicated cmdline option in Lmod for doing this. The only reason I'm reproducing it now is because there isn't (yet). If there was, I would use it for sure.
The script createSystemCacheFile.sh can be modified to take arguments but having the lmod command do it is just wrong. I'm not going to take bug reports that updating the system cache fails when the user doesn't have permission to write to that directory. I can modify it when I get back. Or I'll take a PR on that.
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1369/ Test FAILed.
Jenkins: test this please
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1370/ Test FAILed.
Jenkins: test this please
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1375/ Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1376/ Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1379/ Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://jenkins1.ugent.be/job/easybuild-framework-pr-builder/1412/ Test FAILed.
@boegel Relevance?