easybuild-easyconfigs icon indicating copy to clipboard operation
easybuild-easyconfigs copied to clipboard

{chem}[foss/2020a,foss/2020b] GULP v6.0

Open sassy-crick opened this issue 3 years ago • 18 comments

(created using eb --new-pr)

sassy-crick avatar Aug 01 '21 14:08 sassy-crick

I'd suggest to not include GULP-GCC and the FFTW-serial. Otherwise this will be a major pitfall for users: Loading the GULP-GCC module and any module from the foss level will mean that 2 different FFTWs are loaded.

Similar in HMNS: GULP now exists at the GCC and the foss level. Not sure how this gets presented to the user, but I guess at least loading foss at some point will switch the GULP version.

Flamefire avatar Aug 05 '21 13:08 Flamefire

@Flamefire The serial version of GULP is what most users in the community are using, so that is needed. I don't know if the serial code is different from the parallel one in the case of GULP. Either way, we need the serial version. The parallel one is just the 'freebee' so to speak. I don't know anybody who will load GULP with anything else when running a calculation. So a clash here is more hypothetical. The switch to the golf toolchain means I get the variables I need to for the patchfiles. For the serial code we don't need MPI so using foss will be an overkill. Besides, if somebody really needs to use the non-serial version of FFTW3, they still can load the parallel version of GULP and run it on a single core. So I don't see much, if any, problems here. Not everything is working better when run in parallel. Most GULP runs are actually rather short, compared to other Chemistry code. @Micket I have updated the patch-files as discussed. In order to have on file for the serial and parallel code, I switched from the GCC to the golf toolchain, so I get the variables I need.

sassy-crick avatar Aug 05 '21 13:08 sassy-crick

The serial version of GULP is what most users in the community are using, so that is needed. Not everything is working better when run in parallel.

I am not proposing to force anyone to use GULP running in parallel. GULP needs FFTW and we have FFTW (only) at a level where OpenMPI happens to be there too. Including another FFTW in the hierarchy creates actual problems so unless the GULP build at the gompi or foss level has any real downside which cannot be avoided in other ways I would rather say the adding yet another FFTW is what is overkill. There is no real downside in reusing what we already have.

I see that you build the foss GULP with ./mkgulp -f -m -c gfortran while the other is missing the -m flag. So I guess the -m enables the parallel version. So if users really only want the serial version, drop that flag and we are all good, aren't we?

We have other cases where we have a serial and parallel version of a software and we usually suffix the name there. So we would have GULP/6.0-... and GULP/6.0-...-parallel. This means users can load either the serial or the parallel version. With the current approach this is not really possible because in HMNS LMod will (IIRC) auto-switch from one to the other.

I don't know anybody who will load GULP with anything else when running a calculation. So a clash here is more hypothetical.

Fully disagree here. Users do what users do and that often is not what we expect. We usually take great care in EasyBuild to avoid such situations as much as possible and here I haven't seen a reason why we can't avoid it. But I'll let the maintainers decide on that one.

Flamefire avatar Aug 05 '21 14:08 Flamefire

GULP needs FFTW and we have FFTW (only) at a level where OpenMPI happens to be there too

Not quite. There is still the golf toolchain for exactly that purpose. That is the reason why I switched from GCCover to golf Your comment any my upload happened at more or less the same time.

sassy-crick avatar Aug 06 '21 15:08 sassy-crick

Ok I see. Got to admit that I'm not familiar with the golf toolchain and don't really see its purpose: I've just rebuild and checked the GCC-FFTW and the gompi-FFTW and besides the libfftw3_mpi* libs they are virtually the same. So for "serial" builds one could just use the gompi-FFTW and not (explicitly) link against fftw-mpi as GULP already does without the -m build flag

My suggestion:

  • Use a different name for serial and parallel GULP, e.g. GULP and GULP-parallel (you said the serial version is what people use by default) or at least use a version-suffix so users know what they get (IIRC in HMNS you don't see the toolchain and e.g. FFTW uses the same scheme)
  • Maybe put GULP (serial) on foss level to avoid having to add the other 2 ECs as the golf toolchain seemingly isn't really used. Just to keep things simple.

Maybe @Micket or @boegel could help clear things up here especially regarding the golf toolchain. Also ccing @akesandgren as he moved PLINK from foss to golf in #11552 which is basically the only golf EC we have (the other is Python-3.6.4-golf-2018a.eb) I'm not sure what the advantage is here.

@sassy-crick Please don't get my nagging wrong here. I think this is a prime example (serial+parallel version of the same software switchable via configure flag) to discuss how to do such things in general and in the future. So I want to make sure we choose the right path here so that future contributors have something as a reference. :)

Flamefire avatar Aug 09 '21 09:08 Flamefire

Thanks for the useful discussion earlier on. As discussed:

  • I have added the usempi: True to both of the foss EasyConfig files
  • I have removed all the serial builds as we can use the MPI build of gulp in serial as well
  • consequently I have also removed the fftw serial EC files
  • updated both patch files accordingly by removing the -lfftw3_mpi entry as that is now provided by EasyBuild. I think that looks much better now. Thanks for the help!

sassy-crick avatar Aug 17 '21 12:08 sassy-crick

@Flamefire Why does your build fail with the wrong checksum? I get:

$ sha256sum gulp-6.0.tgz b63cff96300356728ee641e7bac8f065a01f108a2289711c3d3d11d4050527f6 gulp-6.0.tgz

I hope that is not one of these annoying checksum problems.

sassy-crick avatar Aug 17 '21 13:08 sassy-crick

Because I had the wrong file there. Rerunning...

Flamefire avatar Aug 17 '21 14:08 Flamefire

Test report by @Flamefire FAILED Build succeeded for 0 out of 2 (2 easyconfigs in total) taurusi8017 - Linux centos linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), Python 2.7.5 See https://gist.github.com/3a73b59db3f00872db60483582b65b17 for a full test report.

Flamefire avatar Aug 17 '21 14:08 Flamefire

Test report by @Flamefire FAILED Build succeeded for 0 out of 2 (2 easyconfigs in total) taurusa5 - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz (broadwell), Python 2.7.5 See https://gist.github.com/bd18b8306bc58c4990bf56b3a6444a23 for a full test report.

Flamefire avatar Aug 17 '21 14:08 Flamefire

Test report by @Flamefire SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in total) taurusi8017 - Linux centos linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), Python 2.7.5 See https://gist.github.com/af6655573807039311ff87e66ee3c46d for a full test report.

Flamefire avatar Aug 17 '21 15:08 Flamefire

Test report by @Flamefire SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in total) taurusa4 - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz (broadwell), Python 2.7.5 See https://gist.github.com/56ff2e5c9d15dfdd0ef73839ccf74981 for a full test report.

Flamefire avatar Aug 17 '21 16:08 Flamefire

Just to catch up on that: it is all now ready to get merged, isn't it?

sassy-crick avatar Aug 22 '21 23:08 sassy-crick

Test report by @jfgrimm FAILED Build succeeded for 0 out of 2 (2 easyconfigs in total) himem06.pri.viking.alces.network - Linux CentOS Linux 7.9.2009, x86_64, Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz (skylake_avx512), Python 3.6.8 See https://gist.github.com/69fd3f016145f856389875ab6cd16e7d for a full test report.

jfgrimm avatar Jun 27 '22 11:06 jfgrimm

@sassy-crick I'm getting a different checksum for gulp 6.0: 7977fdd427430008097edcd17852e0de7ff37877b9717aaea2110127744913b0

jfgrimm avatar Jun 27 '22 11:06 jfgrimm

We both are right. As that dates some time back, they done changes to the tarball.

$ sha256sum gulp-6.0.tgz
b63cff96300356728ee641e7bac8f065a01f108a2289711c3d3d11d4050527f6  gulp-6.0.tgz

new:
$ sha256sum gulp-6.0.tgz
7977fdd427430008097edcd17852e0de7ff37877b9717aaea2110127744913b0  gulp-6.0.tgz

Attached is the diff-file between the two versions. Another case of how not to do things. :-(

I will update the EC. Thanks for finding that, I did not bother to download it again as I had a copy of the source code already from way back.

diff.txt

sassy-crick avatar Jun 28 '22 19:06 sassy-crick

GULP source tarballs keep changing, see also #15984

Time to reach out to the developers and ask to step up their game and use semver?

boegel avatar Aug 08 '22 14:08 boegel

I have updated the checksum to the source tarball I just have downloaded. See also 15984

sassy-crick avatar Oct 17 '22 09:10 sassy-crick

I am closing this PR as a newer, already merged version is out and even more newer versions are available on their website.

sassy-crick avatar Oct 23 '23 11:10 sassy-crick