Lmod icon indicating copy to clipboard operation
Lmod copied to clipboard

Make 'ml restore' behave the same for broken collections/inconsistent collections

Open blappm opened this issue 7 years ago • 11 comments

Make 'ml restore' behave the same for broken collections/inconsistent collections. At the moment it is possible to load a collection if there are modules missing, but it's not possible to load if a checksum has changed.

If a checksum has changed, the collection is lost, since '--force' is not available. So I've added also the --force parameter to be consistent with the other parts of lmod:

ml restore 'collectionname' --force

blappm avatar Nov 03 '17 16:11 blappm

Thanks for reporting to me that it is possible for a collection to load even if it is missing a module. That is a bug and I'll try to fix it.

As for this PR. it makes me nervous. The point of restore is to have it behave EXACTLY as if the user did the loads by hand. Lmod goes to a lot of trouble to make sure that the restore is correct. I don't think it is a good idea to allow users to get sorta what they asked for.

Why not just force users to rebuild their collection? Comments?

rtmclay avatar Nov 04 '17 15:11 rtmclay

Hi Robert,

Rebuilding the collection is painful. It's much more easier to load the missing parts and the save the collection again. My patch allows this with --force.

We have some modules which change frequently (License Ports) because new licenses are added, and other, similar stuff. With the current behaviour module collections are almost useless since users need to rebuild them by hand - over and over again.

blappm avatar Nov 04 '17 16:11 blappm

@blappm Can't you put the stuff that changes in a separate file and load that in the module? So the actual module file remains unchanged.

wpoely86 avatar Nov 04 '17 18:11 wpoely86

There are other examples: A missing variable in one module. We rebuild the modules on all applications for this software and currently all module collections containing this version can't be loaded anymore. You should really give an option (--force) to the people knowing what they do, and not act as a guardian. IMHO the warning is enough.

blappm avatar Nov 04 '17 20:11 blappm

The design of the collection hash is to only checks for any kinds of load() and prepend_path() and append_path() to MODULEPATH. Any other changes in a modulefile should be ignored. If that is not the case then there is a bug in the way that the collection hash is computed and that needs to be fixed.

Could you give me a modulefile that cause the hash value to changes that doesn't change the loads or changes MODULEPATH? Thanks!

rtmclay avatar Nov 04 '17 20:11 rtmclay

I'll have to try and test this again in this case. The error we get if the modulepath has changed is different and it's missleading too: A notice that the collection hasn't been loaded is not given then, just a 'warning' that one needs to rebuild:

ml restore test Restoring modules from user's test Lmod Warning: The system MODULEPATH has changed: please rebuild your saved collection.

The whole concept about this modulepath in the collections is broken IMHO and makes it very difficult to exchange collections between users. Even if all modules are available but the module paths are not in the same order or one modulepath is twice in the list, loading fails.

blappm avatar Nov 06 '17 10:11 blappm

Why not save the modulepath as well in the collection and try to load the modules with this one ?

blappm avatar Nov 06 '17 10:11 blappm

The principal design criteria for collections is they produce the same environment as if the user did the loads by hand or sourced a shell script. That is their only function. Collections are not designed to be shared between users. How could they be? They depend on what the current $MODULEPATH. Also collections do not add to the current modules. Restoring a collection first purges ALL modules then loads the list.

If a user does:

 $ module use $HOME/my_modules; module load foo; module save bar

Then the bar collection adds ~/my_modules to MODULEPATH and remembers that. That way when a user does "module restore bar" they don't have to first remember to do the module use before restoring the bar collection.

Also what modules get loaded is highly dependent on $MODULEPATH and duplicate directories are a very bad idea.

If users want to share the same set of modules, they can easily create a shell script that does:

module use /a/b/c
module purge  # This is optional
module load a b c

and everybody can source that script. This script is even shell independent!

Lmod does use the remembered MODULEPATH to load the module files. BUT it first checks to see that the old system modulepath and the new one are the same. If they are not, Lmod errors out. This is there to make sure that the restored collection matches what the user would load by hand.

I will improve the error message when the system modulepath are different.

rtmclay avatar Nov 06 '17 16:11 rtmclay

Our users liked to use collections exactly for that: Saveing their working module collection in a way other users (for example in the same group) can use it (of course without manual interactions). The problem is that there is no nice shortcut to do it.

If you like to keep module collections for privat use only, please think then about adding something like ml export, which could create a shellscript with 'ml use, ml purge, ml load ...'. You'd make a lot of people happy, I promise you.

blappm avatar Nov 06 '17 16:11 blappm

@blappm why don't you make it a module instead of a collection? You can use try_load? It is much easier to share then a collection?

wpoely86 avatar Nov 06 '17 19:11 wpoely86

@wpoely86 Not really an option. Our users like to share their ENV for an application, exactly like a collection. If they just use 'try load' they maybe have other stuff loaded which doesn't fit. And they like to save and load those modules as easy as possible. Of course it's possible to do it by hand, but they like that fully automated. At the moment some of them are using 'ml list' piped into sed/awk blabla to create some sort of shell script - but this is ugly.

blappm avatar Nov 07 '17 00:11 blappm