module library reload :: correct way to handle different concurrent library versions

Open EricDeveaud opened this issue 7 years ago • 10 comments

context: we are currently testing/POCing EasyBuild as our install framework in order to provide scientific software to our users.

installing BLAST+ and ABySS using the same toolchain (goolf-1.4.10); they were installed using the following eb command lines.

eb BLAST+-2.2.30-goolf-1.4.10.eb --robot
eb ABySS-1.5.2-goolf-1.4.10.eb --robot

loading both modules at the same time leads to a swap of the Boost version used, see:

bigmess:~ > module load BLAST+
bigmess:~ > module list 

Currently Loaded Modules:
  1) GCC/4.7.2                                  7) ScaLAPACK/2.0.2-gompi-1.4.10-OpenBLAS-0.2.6-LAPACK-3.4.2
  2) hwloc/1.6.2-GCC-4.7.2                      8) goolf/1.4.10
  3) OpenMPI/1.6.4-GCC-4.7.2                    9) bzip2/1.0.6-goolf-1.4.10
  4) gompi/1.4.10                              10) zlib/1.2.8-goolf-1.4.10
  5) OpenBLAS/0.2.6-gompi-1.4.10-LAPACK-3.4.2  11) Boost/1.57.0-goolf-1.4.10
  6) FFTW/3.3.3-gompi-1.4.10                   12) BLAST+/2.2.30-goolf-1.4.10

bigmess:~ > module load ABySS 

The following have been reloaded with a version change:
  1) Boost/1.57.0-goolf-1.4.10 => Boost/1.53.0-goolf-1.4.10

bigmess:~ > module list 

Currently Loaded Modules:
  1) GCC/4.7.2                                                  8) goolf/1.4.10
  2) hwloc/1.6.2-GCC-4.7.2                                      9) bzip2/1.0.6-goolf-1.4.10
  3) OpenMPI/1.6.4-GCC-4.7.2                                   10) zlib/1.2.8-goolf-1.4.10
  4) gompi/1.4.10                                              11) BLAST+/2.2.30-goolf-1.4.10
  5) OpenBLAS/0.2.6-gompi-1.4.10-LAPACK-3.4.2                  12) Boost/1.53.0-goolf-1.4.10
  6) FFTW/3.3.3-gompi-1.4.10                                   13) ABySS/1.5.2-goolf-1.4.10
  7) ScaLAPACK/2.0.2-gompi-1.4.10-OpenBLAS-0.2.6-LAPACK-3.4.2

as stated by the module load message boost was swapped from Boost/1.57.0-goolf-1.4.10 to Boost/1.53.0-goolf-1.4.10

using the same toolchain I was expecting to use the same library versions. I understand that toolchain definition can handle this kind of granularity.

in this case it won't cause any problem, as only Boost headers are used at build time and no binaries are linked to Boost libs, so this will be transparent.

but this can lead (in my opinion) to some hard-to-debug problems.

assume that soft A is linked to some lib in version X, soft B is linked to the same lib in version Y, and Y is incompatible with X or does not provide the same symbols/functions (this may happen with different versions of Boost).

then loading A then B can break A, and loading B then A will break B.

how do you handle such problems? is there some EasyBuild "trick" that allows building in a homogeneous environment? should we define our own modulefile to define the toolchain + major libraries to use?

lastly, some information about our eb setup, as requested in the doc:

bigmess:~ > python -V
Python 2.7.3
bigmess:~ > type module
module is a shell function
bigmess:~ > type -f module
module () {
        eval $($LMOD_CMD bash "$@")
        [ $? = 0 ] && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}
bigmess:~ > module --version

Modules based on Lua: Version 6.1  2016-02-05 16:31
    by Robert McLay [email protected]

bigmess:~ > module av EasyBuild

----------------------------------------- /soft/adm/easybuild/modules/all -----------------------------------------
   EasyBuild/2.9.0

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

bigmess:~ > which -a eb
/soft/adm/easybuild/software/EasyBuild/2.9.0/bin/eb
bigmess:~ > eb --version
This is EasyBuild 2.9.0 (framework: 2.9.0, easyblocks: 2.9.0) on host bigmess.it.pasteur.fr.

regards

Eric

EricDeveaud avatar Nov 16 '16 11:11 EricDeveaud

a more detailed example: R-3.3.1 requires zlib-1.2.8 and gmap uses zlib-1.2.7,

so loading R then gmap leads to a switch in some libraries.

loading R alone leads to libR.so being linked to the expected libraries.

bigmess:/soft/src > module list 
No modules loaded

bigmess:/soft/src > module load R
bigmess:/soft/src > ldd /soft/exe/R/3.3.1-foss-2016b/lib64/R/lib/libR.so       
       linux-vdso.so.1 =>  (0x00007f997dbdc000)
       libopenblas.so.0 => /soft/exe/OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1/lib/libopenblas.so.0 (0x00007f997cb7c000)
       libgfortran.so.3 => /soft/exe/GCCcore/5.4.0/lib64/libgfortran.so.3 (0x00007f997ca5b000)
       libreadline.so.6 => /soft/exe/libreadline/6.3-foss-2016b/lib/libreadline.so.6 (0x00007f997ca10000)
       libpcre.so.1 => /soft/exe/PCRE/8.38-foss-2016b/lib/libpcre.so.1 (0x00007f997c9cd000)
       liblzma.so.5 => /soft/exe/XZ/5.2.2-foss-2016b/lib/liblzma.so.5 (0x00007f997c9a8000)
       libbz2.so.1.0 => /soft/exe/bzip2/1.0.6-foss-2016b/lib/libbz2.so.1.0 (0x00007f997c996000)
       libz.so.1 => /soft/exe/zlib/1.2.8-foss-2016b/lib/libz.so.1 (0x00007f997c980000)
       librt.so.1 => /lib64/librt.so.1 (0x00007f997c76a000)
       libdl.so.2 => /lib64/libdl.so.2 (0x00007f997c565000)
       libm.so.6 => /lib64/libm.so.6 (0x00007f997c2e1000)
       libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f997c0c4000)
       libgomp.so.1 => /soft/exe/GCCcore/5.4.0/lib64/libgomp.so.1 (0x00007f997c0a2000)
       libc.so.6 => /lib64/libc.so.6 (0x00007f997bd0e000)
       /lib64/ld-linux-x86-64.so.2 (0x0000003cb7600000)
       libquadmath.so.0 => /soft/exe/GCCcore/5.4.0/lib64/libquadmath.so.0 (0x00007f997bccf000)
       libgcc_s.so.1 => /soft/exe/GCCcore/5.4.0/lib64/libgcc_s.so.1 (0x00007f997bcb7000)
       libncurses.so.5 => /lib64/libncurses.so.5 (0x00007f997ba95000)
       libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007f997b873000)

then loading gmap leads to a swap of some libraries

bigmess:/soft/src > module load GMAP-GSNAP

The following have been reloaded with a version change:
 1) FFTW/3.3.4-gompi-2016b => FFTW/3.3.3-gompi-1.4.10
 2) GCC/5.4.0-2.26 => GCC/4.7.2
 3) OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1 => OpenBLAS/0.2.6-gompi-1.4.10-LAPACK-3.4.2
 4) OpenMPI/1.10.3-GCC-5.4.0-2.26 => OpenMPI/1.6.4-GCC-4.7.2
 5) ScaLAPACK/2.0.2-gompi-2016b-OpenBLAS-0.2.18-LAPACK-3.6.1 => ScaLAPACK/2.0.2-gompi-1.4.10-OpenBLAS-0.2.6-LAPACK-3.4.2
 6) bzip2/1.0.6-foss-2016b => bzip2/1.0.6-goolf-1.4.10
 7) gompi/2016b => gompi/1.4.10
 8) hwloc/1.11.3-GCC-5.4.0-2.26 => hwloc/1.6.2-GCC-4.7.2
 9) zlib/1.2.8-foss-2016b => zlib/1.2.7-goolf-1.4.10

then libR.so shows a change in the libraries it refers to

bigmess:/soft/src > ldd /soft/exe/R/3.3.1-foss-2016b/lib64/R/lib/libR.so 
/soft/exe/R/3.3.1-foss-2016b/lib64/R/lib/libR.so: /soft/exe/GCC/4.7.2/lib64/libgomp.so.1: version `GOMP_4.0' not found (required by /soft/exe/R/3.3.1-foss-2016b/lib64/R/lib/libR.so)
       linux-vdso.so.1 =>  (0x00007f4b46544000)
       libopenblas.so.0 => /soft/exe/OpenBLAS/0.2.6-gompi-1.4.10-LAPACK-3.4.2/lib/libopenblas.so.0 (0x00007f4b4519a000)
       libgfortran.so.3 => /soft/exe/GCC/4.7.2/lib64/libgfortran.so.3 (0x00007f4b44e87000)
       libreadline.so.6 => /soft/exe/libreadline/6.3-foss-2016b/lib/libreadline.so.6 (0x00007f4b44e3c000)
       libpcre.so.1 => /soft/exe/PCRE/8.38-foss-2016b/lib/libpcre.so.1 (0x00007f4b44df9000)
       liblzma.so.5 => /soft/exe/XZ/5.2.2-foss-2016b/lib/liblzma.so.5 (0x00007f4b44dd4000)
       libbz2.so.1.0 => /soft/exe/bzip2/1.0.6-goolf-1.4.10/lib/libbz2.so.1.0 (0x00007f4b44bc3000)
       libz.so.1 => /soft/exe/zlib/1.2.7-goolf-1.4.10/lib/libz.so.1 (0x00007f4b449ae000)
       librt.so.1 => /lib64/librt.so.1 (0x00007f4b44798000)
       libdl.so.2 => /lib64/libdl.so.2 (0x00007f4b44593000)
       libm.so.6 => /lib64/libm.so.6 (0x00007f4b4430f000)
       libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4b440f2000)
       libgomp.so.1 => /soft/exe/GCC/4.7.2/lib64/libgomp.so.1 (0x00007f4b43ee3000)
       libc.so.6 => /lib64/libc.so.6 (0x00007f4b43b4f000)
       /lib64/ld-linux-x86-64.so.2 (0x0000003cb7600000)
       libquadmath.so.0 => /soft/exe/GCC/4.7.2/lib/../lib64/libquadmath.so.0 (0x00007f4b4391a000)
       libgcc_s.so.1 => /soft/exe/GCC/4.7.2/lib/../lib64/libgcc_s.so.1 (0x00007f4b43704000)
       libncurses.so.5 => /lib64/libncurses.so.5 (0x00007f4b434e2000)
       libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007f4b432c0000)

one can notice that libopenblas, libgfortran, libbz2, libz, libgomp, libquadmath, libgcc_s changed.
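The comparison of the two `ldd` listings above can be done mechanically. Below is a minimal Python sketch, assuming the usual `soname => /path (address)` line layout seen in those listings; the helper names `parse_ldd` and `changed_libs` are made up for illustration, and only two lines from each listing are reproduced.

```python
def parse_ldd(text):
    """Map each library soname to the path that ldd resolved it to."""
    resolved = {}
    for line in text.splitlines():
        parts = line.split()
        # Lines of interest look like: "libz.so.1 => /path/libz.so.1 (0x...)".
        # Entries without a resolved path (e.g. linux-vdso) and error lines are skipped.
        if len(parts) >= 3 and parts[1] == "=>" and parts[2].startswith("/"):
            resolved[parts[0]] = parts[2]
    return resolved


def changed_libs(before, after):
    """Return {soname: (old_path, new_path)} for libraries resolved differently."""
    old, new = parse_ldd(before), parse_ldd(after)
    return {
        name: (old[name], new[name])
        for name in sorted(old)
        if name in new and old[name] != new[name]
    }


# Two lines taken from each of the listings above.
before = """
    libz.so.1 => /soft/exe/zlib/1.2.8-foss-2016b/lib/libz.so.1 (0x00007f997c980000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f997c2e1000)
"""
after = """
    libz.so.1 => /soft/exe/zlib/1.2.7-goolf-1.4.10/lib/libz.so.1 (0x00007f4b449ae000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f4b4430f000)
"""

for name, (old_path, new_path) in changed_libs(before, after).items():
    print(f"{name}: {old_path} -> {new_path}")
```

Load addresses are ignored on purpose, since they differ between runs even when the resolved path is the same.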

I must admit that I have not dug deep enough to set up a functional test that demonstrates the problem.

my main concern is: how do you guys handle this kind of problem?

is this "best effort", i.e. as long as some modules can cohabit it's OK, and if "incompatible" modules are loaded at the same time, we teach users not to load those "incompatible" modules?

are there some tricks you have put in place to avoid that?

does eb have some mechanism that I have not (yet) read about in the doc ;-)

thanks for your feedback.

PS for sure I will check version 3.0. with rpath support.

best regards

Eric

EricDeveaud avatar Nov 17 '16 11:11 EricDeveaud

@EricDeveaud This is indeed a problem that you need to take into account.

In theory, you should stick to a single library version for a particular toolchain. In practice, this doesn't really work though, since different software packages have different requirements on dependencies, and you can not predict which two software packages will be used together.

We have tests in place in the easybuild-easyconfigs repository (https://github.com/hpcugent/easybuild-easyconfigs/blob/master/test/easyconfigs/easyconfigs.py#L140) to make sure only a single dependency version is used in the dependency tree for each easyconfig file separately, but we don't enforce single dependency versions across software packages using the same toolchain.

You can check a priori whether two easyconfigs are compatible with each other in terms of dependency versions before installing them both using eb --check-conflicts one.eb two.eb (this is not mentioned in the documentation yet, I'm trying to get to that).
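Conceptually, such a check boils down to comparing dependency versions by name. The following Python sketch illustrates the idea only; it is not EasyBuild's implementation, the function name `find_conflicts` is hypothetical, and the dependency dictionaries are abridged and hand-written from the module listings earlier in this thread.

```python
def find_conflicts(deps_a, deps_b):
    """Return {name: (version_a, version_b)} for shared dependencies whose versions differ."""
    conflicts = {}
    for name in sorted(set(deps_a) & set(deps_b)):
        if deps_a[name] != deps_b[name]:
            conflicts[name] = (deps_a[name], deps_b[name])
    return conflicts


# Abridged, hand-written dependency lists (assumption: not read from real easyconfigs).
blast_deps = {"Boost": "1.57.0", "zlib": "1.2.8", "bzip2": "1.0.6"}
abyss_deps = {"Boost": "1.53.0", "OpenMPI": "1.6.4"}

print(find_conflicts(blast_deps, abyss_deps))
```

A real check would have to walk the full dependency tree of each easyconfig rather than a flat dictionary.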

The auto-swapping on a dependency version conflict is done by the modules tool (Lmod), not by EasyBuild or the modules generated by EB themselves. In fact, EasyBuild injects 'conflict' statements in module files to ensure that only one version of each software package is loaded (which is not enforced by older module tools like http://modules.sourceforge.net/, at least not without 'conflict' statements). Lmod ensures only one version of each module can be loaded at the same time, the so-called 'one-name-rule', which makes it basically ignore the conflict statement and just unload the currently loaded version and load the new one instead.

We (at HPC-UGent) don't like this behaviour exactly because it can lead to surprises/problems due to incompatible APIs. Therefore, we have disabled the auto-swapping done by Lmod, and spit out an error instead, making it clear that two modules are incompatible if they depend on a different library/tool version. We do this by configuring Lmod with --with-disableNameAutoSwap=yes (see https://github.com/hpcugent/Lmod-UGent/blob/master/Lmod-UGent.spec#L51), and also customise the error message it spits out, see https://github.com/hpcugent/Lmod-UGent/blob/master/SitePackage.lua#L127.
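The difference between the two behaviours can be pictured with a toy model. The Python sketch below is not Lmod code; `load`, `NameConflict` and the `auto_swap` switch are invented names that only mimic the one-name rule with auto-swapping on (the default described above) and off (an error instead).

```python
class NameConflict(Exception):
    """Raised when a different version of an already loaded module is requested."""


def load(loaded, module, auto_swap=True):
    """Toy model: `loaded` maps a short name to the full 'name/version' string."""
    name = module.split("/", 1)[0]
    current = loaded.get(name)
    if current is not None and current != module:
        if not auto_swap:
            raise NameConflict(f"{current} is already loaded, refusing to swap to {module}")
        print(f"reloaded with a version change: {current} => {module}")
    loaded[name] = module
    return loaded


loaded = {}
load(loaded, "Boost/1.57.0-goolf-1.4.10")
load(loaded, "Boost/1.53.0-goolf-1.4.10")  # silently swapped, as in the session above
try:
    load(loaded, "Boost/1.57.0-goolf-1.4.10", auto_swap=False)
except NameConflict as err:
    print("error:", err)
```

With swapping disabled, the user sees immediately that the two requests are incompatible, instead of finding out later through a broken binary.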

For recent toolchains, we try to stick to a single library version for common libraries (like zlib, libreadline, etc.), but you'll certainly find exceptions to this. In general, we try to anticipate which modules may be used together, and check that they're using the same dependency versions in that case.

Suggestions or ideas to improve on our current way of dealing with this are very welcome!

boegel avatar Nov 20 '16 10:11 boegel

Hi Eric,

I understand very well where you are coming from! These challenges are what made bundles come about.

Good to bring up the matter because it may end up in a feature request.

In case anybody ever wondered why ABySS couldn't (and wouldn't) make it into this bundle, here we go: https://github.com/hpcugent/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/h/HPCBIOS_Bioinfo/HPCBIOS_Bioinfo-20130829-goolf-1.4.10.eb

After an exhaustive search, I concluded at that time that it was impossible to make that happen. Boost itself really doesn't follow semantic versioning (http://semver.org/), therefore replacing 1.53 with 1.(53+n) is an unsafe operation, at least in the generic case.
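To make the semantic-versioning argument concrete, here is a small Python sketch of the rule a tool could apply if a library did follow semver: a swap is acceptable when the major version is unchanged and the replacement is not older. The function name `semver_allows_swap` is hypothetical, and the sketch ignores pre-release tags and the special status of 0.x versions.

```python
def parse(version):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def semver_allows_swap(built_against, loaded):
    """True if `loaded` may replace `built_against` under semantic versioning."""
    built, new = parse(built_against), parse(loaded)
    return new[0] == built[0] and new >= built


print(semver_allows_swap("1.53.0", "1.57.0"))  # the Boost versions from this thread
print(semver_allows_swap("1.57.0", "1.53.0"))
```

The first call reports the swap as acceptable, which is exactly why such a rule cannot be trusted for Boost: the version numbers look compatible, but the library makes no such promise.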

The same applies for GCC btw, and some more software - luckily the minority.

That implies, that the "uncontrolled" swap done by Lmod would be clearly undesired in such a case, although the act might be tolerated for any software that actually follows semantic versioning (to be clear, that policy would be HPC site specific - as Kenneth's response implies)

Had we a flag to clearly demarcate software that is not semantic-version-safe, we could help this a bit. IMHO, if this feature sounds reasonable it may end up as a request for EasyBuild and Lmod, but I have to admit there are other priorities in between, e.g. to only rely on modules built with the same EB version.

Now, setting ABySS aside and assuming its dependency tree can be reconciled with the rest of the bioinformatics packages, what is missing is to draw up a common set of dependencies. The catch is that the versions of the common deps happen to be toolchain-specific and domain-specific (the center of gravity moves according to which packages are popular). Here is one example of such a debate: https://github.com/hpcugent/easybuild-easyconfigs/issues/1689 That discussion is itself a reincarnation of the biodeps collections going from goolf/1.4.10 to goolf/1.7.20: https://github.com/hpcugent/easybuild-easyconfigs/blob/master/easybuild/easyconfigs/b/biodeps/biodeps-1.6-goolf-1.4.10-extended.eb

The most tricky part is coming up with a list of common deps which work well for all software of interest (the usual suspects to be worried of are visible on the links above) and stick to it for a wide set of software builds which won't collide somewhere in the dependency tree. Easier said than done, because sometimes the collisions are in binaries provided via $PATH - tricky to detect.

In case you wish to invest time in this to come with wide bundles of software (and/or compatible sets), I'd be more than happy to help you with technical advice. If you are for goolf/1.7.20, I'd recommend picking up from Thekla's work visible here: https://github.com/hpcbios/HPCBIOS-easyconfigs If you are into other toolchains be prepared for exploration rounds - but fear not.

Biweekly technical calls might be a good moment to revive this topic, if it doesn't move on its own.

F.

fgeorgatos avatar Nov 20 '16 21:11 fgeorgatos

Thanks for those instructive answers.

managing a lot of software is really a world of headaches. ;-)

here we provide more than 400 scientific software packages + ≈100 development tools (compilers, interpreters, libraries and so on) to our users, covering more than 700 modulefiles with different versions. (you can have a look at what we provide over there: http://bioweb.pasteur.fr)

auto-swapping was really a surprise for us when we dug into the EasyBuild POC.

we managed to compile all software with rpath included, in order to allow software linked to different, incompatible library and/or compiler versions to be loaded at the same time. NB: our policy is to avoid exporting LD_LIBRARY_PATH as much as possible. some specific cases, mostly commercial binaries, require LD_LIBRARY_PATH; in those cases we wrap the binaries so that it is exported at exec time (the wrapper exports the variables and then execs the binary). this ensures that exposure to LD_LIBRARY_PATH changes is restricted to the execution of that binary.
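The wrapper idea can be sketched as follows. This is an illustration in Python rather than the actual wrappers used at our site; `wrapped_env` and `exec_wrapped` are hypothetical names, and `/opt/vendor/lib` is a placeholder path.

```python
import os


def wrapped_env(lib_dirs, base_env=None):
    """Return a copy of the environment with lib_dirs prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = list(lib_dirs)
    if env.get("LD_LIBRARY_PATH"):
        parts.append(env["LD_LIBRARY_PATH"])
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


def exec_wrapped(binary, args, lib_dirs):
    """Replace the current process with `binary`, running under the modified environment."""
    os.execve(binary, [binary, *args], wrapped_env(lib_dirs))


# Only the copy is modified; the caller's own environment is left untouched.
env = wrapped_env(["/opt/vendor/lib"], base_env={"PATH": "/usr/bin"})
print(env["LD_LIBRARY_PATH"])
```

Because the variable is set only in the environment handed to the wrapped binary, the user's shell and every other loaded module never see it.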

dealing with rpath is made (quite) easy by exporting LD_RUN_PATH and LIBRARY_PATH. using this approach, one can be sure to link to the relevant libraries. note that the builder must check the upstream build rules in order to track any upstream use of rpath that resets the ld search mechanism.

our goal is to minimize linking to system libraries in order to have the smallest possible system footprint on the compute nodes.

software builders should distinguish two kinds of dependencies: BUILD dependencies and RUN dependencies.
in my opinion, BUILD dependencies may be anything (gcc, cmake, libraries) for a given software, and the software must be, if needed, hard-linked to those dependencies. RUN dependencies are other software required at run time in order to run the target software or to compute results (e.g. OpenMPI, or some other third-party software of a given pipeline).

IMHO, from the user's point of view, loading a software package should be transparent and should not involve its BUILD dependencies. rpath allows that and ensures A and B can coexist even if they are built using different toolchains.

regards

Eric

EricDeveaud avatar Nov 21 '16 10:11 EricDeveaud

EasyBuild already has the runtime and buildtime dependency distinction: http://easybuild.readthedocs.io/en/latest/Writing_easyconfig_files.html#dependencies
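As an illustration, an easyconfig fragment (easyconfigs use Python syntax) separates the two kinds like this. The package name and the CMake version are made up for the example, and a complete easyconfig needs further parameters such as `homepage` and `description`.

```python
# Hypothetical easyconfig fragment; 'example-tool' is not a real package.
name = 'example-tool'
version = '1.0'

toolchain = {'name': 'foss', 'version': '2016b'}

# Only needed while building; not loaded by the generated module.
builddependencies = [('CMake', '3.5.2')]

# Needed at runtime; the generated module loads these as well.
dependencies = [
    ('zlib', '1.2.8'),
    ('bzip2', '1.0.6'),
]
```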

I wouldn't be including libraries as build dependencies though, since they need to be available at runtime for ld to find them. rpath-ing can solve this though and there has been discussion on introducing rpathdependencies as well as build dependencies.

LD_RUN_PATH can solve some problems (and is something we discussed), but LD_LIBRARY_PATH takes precedence, so you would need to be starting from a clean slate to make that work as expected (i.e., that approach is not backwards compatible). Using the (deprecated!) rpath method explicitly, without RUNPATH being set, is the only approach that gives you complete and guaranteed control.
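The precedence at play can be summarised with a small model of the search order documented in the glibc `ld.so(8)` man page. The Python sketch below is a simplification under that assumption: it ignores secure-execution mode, `LD_PRELOAD` and per-library details, and `search_order` is a hypothetical helper.

```python
def search_order(rpath, runpath, ld_library_path):
    """Return the directory search order for a binary, following ld.so(8)."""
    order = []
    if rpath and not runpath:
        # DT_RPATH is only honoured when no DT_RUNPATH is present,
        # and it is searched before LD_LIBRARY_PATH.
        order += rpath
    order += ld_library_path
    # DT_RUNPATH comes after LD_LIBRARY_PATH, so the environment can override it.
    order += runpath
    order += ["<ld.so.cache>", "<default dirs>"]
    return order


env_dirs = ["/soft/exe/zlib/1.2.7-goolf-1.4.10/lib"]
built_with = ["/soft/exe/zlib/1.2.8-foss-2016b/lib"]

print(search_order(rpath=built_with, runpath=[], ld_library_path=env_dirs)[0])
print(search_order(rpath=[], runpath=built_with, ld_library_path=env_dirs)[0])
```

In the first case the directory recorded at build time wins; in the second, whatever a later `module load` put in LD_LIBRARY_PATH is searched first.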

ocaisa avatar Nov 21 '16 10:11 ocaisa

Actually, LD_RUN_PATH is backwards compatible but probably isn't doing what you want unless you have a clean slate and no LD_LIBRARY_PATH usage

ocaisa avatar Nov 21 '16 10:11 ocaisa

Hi Eric,

just in case no one mentioned this already: even a compiler can be a runtime dependency, e.g. any GCC beyond 4.7 provides libquadmath, and several builds would break in its absence; R users are often the first to find that.

Truth be told: when you do rpath, you no longer have to load the underlying gcc module at all; however, when you go the rpath way you don't need to load any other dependency either, for that matter.

And when you do that, things work for what you wish to run next, but not necessarily for what you want to build next! E.g. invoking R.install() for R extensions without the proper environment will bring you into very risky territory, since a compilation step is eventually involved. If you use EB processes for the builds, exclusively, you are somewhat protected in this respect. But if not...

My take is, use rpath judiciously where it fits as a concept.

F.

fgeorgatos avatar Nov 21 '16 11:11 fgeorgatos

@ocaisa sorry I don't get the point about "LD_RUN_PATH is backwards compatible but probably isn't doing what you want unless you have a clean slate and no LD_LIBRARY_PATH usage"

@fgeorgatos good point about the R.install() stuff when compilation is required. one can also note that the same risk arises with Python modules that require compilation.

thanks for all the feedback and valuable information provided here, I really appreciate it!

EricDeveaud avatar Nov 21 '16 16:11 EricDeveaud

Sorry, I wasn't really clear. What I meant was: if you had existing software installed using EB and started using LD_RUN_PATH, the behaviour would be the same as before because LD_LIBRARY_PATH would take precedence... but that is not what you would really want if you were taking that approach. As an admin, if you ensure that LD_LIBRARY_PATH is not set then things work as expected, which would be the case if you were starting again from scratch.

As you said though, you would need to handle special cases where something internal to the package is happening with rpath, whereas the script approach allows you to handle this directly (and generically) within the script.

Also, it's not fully clear to me if LD_RUN_PATH can handle $ORIGIN, I guess it can.

ocaisa avatar Nov 23 '16 09:11 ocaisa

ok, point taken, we fully agree. this led me to open the following issue: https://github.com/hpcugent/easybuild/issues/282

EricDeveaud avatar Nov 23 '16 13:11 EricDeveaud