
Minimise initialisation script to only initialise Lmod

Open ocaisa opened this issue 2 years ago • 6 comments

With the compat layer, we can have multiple OSes side-by-side without issue. The only reason we can't right now is that our init script adds compat layer paths to the environment. I think we should move almost all of our init script into a module file, and indeed an initial attempt has already been made in #68.

In the past this was tricky because we needed archspec to determine the architecture, but if our bash-only approach (#187) is reliable this would no longer be a restriction (we can use bash from the host). Deciding the architecture could even be done as part of the LMOD_RC in Lua (which may also make us resilient against exported environment variables in a job context).

Then for each compat layer, you can have a gateway module (with an Lmod family so you can't have two loaded at once) to give you access to that particular compat layer (and its associated EB stack).

As a major plus, this would also mean we would be able to leverage Lmod to work in different shell environments.

The only thing remaining in the initialisation script would be initialising Lmod. It wouldn't even matter which version as we should be able to switch version to match the pilot version as part of the compat module file (if this was considered necessary).

ocaisa avatar Jun 15 '23 13:06 ocaisa

This would also give us a way of documenting the compat layer (and important differences it may have)

ocaisa avatar Jun 15 '23 13:06 ocaisa

In terms of using old software on new hardware, the archdetect bash script will probably have to return multiple values to try, something like x86_64/intel/skylake_avx512:x86_64/intel/haswell:x86_64/generic. If a candidate path exists under /cvmfs/pilot.eessi-hpc.org/versions/XXX/software/linux it is used, otherwise the next option is tried. That way we can always use the latest version of archdetect.
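A minimal sketch of that fallback, assuming archdetect prints a colon-separated list ordered from most to least specific. The function name first_existing_arch and its two arguments are hypothetical and not part of the current scripts:

```shell
# Hypothetical helper (not an existing EESSI function): print the first
# candidate architecture that exists as a directory under a base path.
#   $1: colon-separated candidates, most specific first (assumed output format)
#   $2: base directory, e.g. /cvmfs/pilot.eessi-hpc.org/versions/XXX/software/linux
# The body runs in a subshell so the IFS and globbing changes do not leak out.
first_existing_arch() (
    candidates=$1
    base=$2
    IFS=:      # split the candidate list on colons
    set -f     # disable globbing while the list is expanded unquoted
    for candidate in $candidates; do
        if [ -d "$base/$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1   # none of the candidates exist under the base directory
)

# Possible usage (paths as in the comment above, XXX left as-is):
#   arch=$(first_existing_arch "$candidates" /cvmfs/pilot.eessi-hpc.org/versions/XXX/software/linux)
```

Because the existence check happens on the consumer side, the same archdetect output could be used against any version of the software tree.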

ocaisa avatar Jun 15 '23 13:06 ocaisa

I wonder what use cases are not possible with the current approach.

trz42 avatar Jun 16 '23 06:06 trz42

Currently you cannot (automatically) reverse the initialisation: the PATH entries for the compat layer remain, as do the additions to the MODULEPATH. This makes it a little dangerous to source multiple compat layers, as tools from one may leak into another.

This means in general that we can't reliably mix and match software from different compat layers. Module files (and the use of an Lmod family) would provide a safe and documented way to do this. With that approach we would no longer need to build old software with new compat layers (and deal with the fallout), and we could provide a global view which includes all compat layers.
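To illustrate what a manual reversal involves today, here is a rough sketch that filters one compat layer's entries out of a colon-separated variable. The helper name strip_path_prefix and the compat_prefix variable are hypothetical. Unloading a module would make this kind of cleanup unnecessary, since Lmod undoes its own changes:

```shell
# Hypothetical helper: print a colon-separated list with every entry that is
# equal to, or located under, a given prefix removed.
#   $1: the list to filter (e.g. "$PATH" or "$MODULEPATH")
#   $2: the prefix whose entries should be dropped
# The body runs in a subshell so the IFS and globbing changes stay local.
strip_path_prefix() (
    list=$1
    prefix=$2
    result=
    IFS=:
    set -f
    for entry in $list; do
        case $entry in
            "$prefix"|"$prefix"/*) ;;                  # belongs to the prefix: drop it
            *) result=${result:+$result:}$entry ;;     # anything else: keep it
        esac
    done
    printf '%s\n' "$result"
)

# Possible usage, with compat_prefix set to the compat layer root (an assumption):
#   PATH=$(strip_path_prefix "$PATH" "$compat_prefix")
#   MODULEPATH=$(strip_path_prefix "$MODULEPATH" "$compat_prefix")
```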

ocaisa avatar Jun 16 '23 07:06 ocaisa

@boegel I just realised that this is even more important. Right now, it is not possible to initialise a different version of EESSI if you have already initialised a version. Our current init scripts assume certain actions if EESSI-related envvars are set in the environment. This means unless you know which variables to unset, you cannot escape the existing EESSI version (so if EESSI were your default environment like it is for me in Magic Castle, you cannot easily try another version).
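As a rough illustration of the workaround this currently requires, the sketch below unsets every exported variable that shares a name prefix. It assumes the relevant variables all start with EESSI_, which may not cover everything the init scripts check, and the function name unset_prefixed_vars is hypothetical:

```shell
# Hypothetical helper: unset every exported variable whose name starts with
# the given prefix. Must run in the current shell (not a subshell) to have
# any effect on the environment.
#   $1: variable name prefix, e.g. EESSI_ (an assumption, see above)
unset_prefixed_vars() {
    # env prints NAME=value lines; keep only the names that match the prefix
    for name in $(env | sed -n "s/^\($1[A-Za-z0-9_]*\)=.*/\1/p"); do
        unset "$name"
    done
}

# Possible usage before sourcing the init script of another version:
#   unset_prefixed_vars EESSI_
```

This does not undo the PATH and MODULEPATH changes, so on its own it is not a full reset.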

ocaisa avatar Jun 30 '23 09:06 ocaisa

The Lmod feature source_sh() may be enough for us to figure out the architecture. A simple script containing

export ARCHITECTURE_PATH=$(/home/ocaisa/software-layer/init/eessi_archdetect.sh cpupath 2> /dev/null)

can be created, and a module file for this can be created containing:

source_sh("bash", "/home/ocaisa/test_lmod/script.sh")

which will set ARCHITECTURE_PATH via Lmod as a result.

ocaisa avatar Aug 01 '23 14:08 ocaisa