easybuild-framework

Module wrongly found in HMNS with user-Modulepath

Open Flamefire opened this issue 1 year ago • 8 comments

We have a site-wide software installation and allow users to install custom modules via --envvars-user-modules.

To be able to build on top of that, the user needs to modify their module path. For a base path like /software/modules/all/Core the user needs to run ml use /software/modules/all, as otherwise EB won't be able to find the modules in "Core". I mentioned this usability issue in https://github.com/easybuilders/easybuild-framework/issues/3703#issuecomment-849395365

However that now leads to a different problem I haven't seen before:

  • I reinstall intel-compilers/2022.1.0 into the user env
  • EasyBuild then claims that e.g. impi/2021.6.0 (impi-2021.6.0-intel-compilers-2022.1.0.eb, module: Compiler/intel/2022.1.0 | impi/2021.6.0) is already installed

However, it is only installed in the global tree, not in my user env. Hence I cannot load the module: $ ml intel-compilers/2022.1.0 impi/2021.6.0 fails.

More curiously: it does find the impi installed in the global env, but only via the full path:

$ ml intel-compilers/2022.1.0
Module intel-compilers/2022.1.0 and 3 dependencies loaded.
$ ml av impi

---------------- /software/modules/all -------------------
   Compiler/intel/2022.1.0/impi/2021.6.0

So using the trace output I see that EasyBuild checks that full path: Compiler/intel/2022.1.0/impi/2021.6.0 is already installed (module found), skipping.

However, with the currently loaded modules it is not available under the module name "impi/2021.6.0", which is the name EB uses (e.g. in the sanity check) to load the module and the name a user expects.
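To make the mismatch concrete, here is a minimal sketch in Python. It is not EasyBuild's or Lmod's actual lookup code: the temporary directory layout, the user tree location and the helper names make_module, found_in and demo are all hypothetical, and the only assumption is that a module counts as "found" when a matching .lua file sits below some MODULEPATH entry.

```python
import tempfile
from pathlib import Path


def make_module(root: Path, name: str) -> None:
    """Create an empty Lua module file for `name` below `root` (hypothetical helper)."""
    path = root / f"{name}.lua"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()


def found_in(modulepath: list[Path], name: str) -> bool:
    """Return True if `name` resolves to a module file below any MODULEPATH entry."""
    return any((entry / f"{name}.lua").is_file() for entry in modulepath)


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        system = Path(tmp) / "software/modules/all"
        user = Path(tmp) / "home/user/modules/all"  # assumed location of the user tree

        # System tree: compiler in Core, impi in the compiler's subtree.
        make_module(system / "Core", "intel-compilers/2022.1.0")
        make_module(system / "Compiler/intel/2022.1.0", "impi/2021.6.0")

        # User tree: only the reinstalled compiler; its subtree is still empty.
        make_module(user / "Core", "intel-compilers/2022.1.0")
        (user / "Compiler/intel/2022.1.0").mkdir(parents=True)

        # MODULEPATH after `ml use /software/modules/all` and after loading the
        # user's compiler, which extends the path into the user tree only.
        modulepath = [
            user / "Compiler/intel/2022.1.0",
            user / "Core",
            system / "Core",
            system,
        ]

        full_name_found = found_in(modulepath, "Compiler/intel/2022.1.0/impi/2021.6.0")
        short_name_found = found_in(modulepath, "impi/2021.6.0")
        return full_name_found, short_name_found


if __name__ == "__main__":
    print(demo())
```

In this toy layout the full name resolves through the /software/modules/all entry, while the short name does not resolve through any entry. That is the same split as in the report: "module found" on the full name, but ml impi/2021.6.0 fails.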

Flamefire avatar Aug 02 '24 12:08 Flamefire

I wonder if @ocaisa can help out here a bit, I'm not sure I fully grasp the problem here...

Are you basically saying that there's bad side effects of EasyBuild checking for existence of modules with the "full" module name?

boegel avatar Aug 13 '24 21:08 boegel

Are you basically saying that there's bad side effects of EasyBuild checking for existence of modules with the "full" module name?

Yes, that's basically the issue here.

Flamefire avatar Aug 14 '24 12:08 Flamefire

At JSC, they arrange the setup for users to install modules via a module, and set an alias for eb itself: https://github.com/easybuilders/JSC/blob/2024/dev_modules/UserInstallations/easybuild.lua#L89

The difference here seems to be that you are allowing users to define entire toolchains in their custom tree. This is something we created hooks to disallow at JSC. You'd need to create a custom setup to allow the user to install modules that extend the module path which also extend back into the system path. What you've done is shadow the system version which extends in 2 directions with something that only extends the module path in one direction.

ocaisa avatar Aug 14 '24 13:08 ocaisa

I actually just wanted to extend in one direction: if the user installs their own module that is relevant for the HMNS, the system modules are gone. This is acceptable, especially because current EB doesn't allow an --envvars-user-modules that does NOT take precedence. Otherwise a user could use that.

The bug here is that EB finds a module that isn't available given the modules that are loaded. Not sure how to fix that without actually loading modules, which might not even work when modules "in the middle" have gone missing. In this case all existing but now hidden modules would be (re)installed by eb --robot.

Flamefire avatar Aug 15 '24 12:08 Flamefire

This seems complicated to me: you need the original module path (/software/modules/all) so eb finds the original compilers etc., then you want to switch to another module tree for the user that doesn't just complement the original tree.

I think eb is not set up to support this use case. I believe it loops over possibilities in the module path using the full path, so it finds the modules you don't want in the system path. HMNS is the complication, as EasyBuild is designed with generic support; this works fine with a single HMNS and an overlay for user installations on top. What you are trying to do is mix two different hierarchies, and that isn't really possible with the MNS-agnostic implementation. If I were to suggest something, I'd say just rebuild a new module tree leveraging the existing installations... but that is not something you can really say to users.

ocaisa avatar Aug 15 '24 15:08 ocaisa

this works fine with a single HMNS and an overlay for user installations on top.

What is that exactly? There is a single HMNS and the user should be able to install additional modules. So the limitation is that users shouldn't be able to install modules expanding the module path (e.g. compiler)?
I see a related complication: the user installs a compiler (via --robot) that doesn't exist yet, and all works. Now the global modules get that compiler and things stop working.

This is something we created hooks to disallow at JSC.

That is to prevent exactly that, isn't it? Can you share that?

Flamefire avatar Aug 15 '24 16:08 Flamefire

Yes, that's it: users installing things that extend the module path is not allowed. That basically means that they can only install things using known and supported toolchains... but they can install whatever they want within that restriction.

The hook to do this has evolved a lot since I was there, but you can find it at https://github.com/easybuilders/JSC/blob/2024/Custom_Hooks/eb_hooks.py

ocaisa avatar Aug 15 '24 17:08 ocaisa

I see. Currently it prevents users from installing GCCcore or anything not in a list of allowed toolchain names. I guess things like "GCC" or "icc" that also extend the module path were contained there before too.
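The restriction described above could be sketched roughly as follows. This is not the JSC hook itself: the ALLOWED_TOOLCHAINS and BLOCKED_SOFTWARE contents, the check_user_install helper and the UserInstallNotAllowed exception are hypothetical. The sketch also assumes that EasyBuild calls a parse_hook(ec, ...) function from the hooks file and that the parsed easyconfig exposes ec.name and ec['toolchain']['name']. A real hook would likely raise EasyBuild's own error type instead.

```python
# Hypothetical site policy: toolchains users may build on top of.
ALLOWED_TOOLCHAINS = {"system", "GCCcore", "GCC", "intel-compilers", "foss", "intel"}
# Hypothetical list of software users may not install themselves,
# because installing it would extend the module path.
BLOCKED_SOFTWARE = {"GCCcore"}


class UserInstallNotAllowed(Exception):
    """Raised when a user installation would violate the site policy (hypothetical)."""


def check_user_install(name, toolchain_name):
    """Return an error message if the installation is not allowed, else None."""
    if name in BLOCKED_SOFTWARE:
        return f"Installing {name} in a user tree is not allowed"
    if toolchain_name not in ALLOWED_TOOLCHAINS:
        return f"Toolchain {toolchain_name} is not in the list of allowed toolchains"
    return None


def parse_hook(ec, *args, **kwargs):
    """Reject easyconfigs that violate the policy (assumed hook signature)."""
    message = check_user_install(ec.name, ec["toolchain"]["name"])
    if message is not None:
        raise UserInstallNotAllowed(message)
```

Keeping the policy in a plain function that returns a message makes it easy to test without EasyBuild installed; parse_hook only translates the result into an error.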

Thanks!

Flamefire avatar Aug 28 '24 11:08 Flamefire