easybuild-easyconfigs icon indicating copy to clipboard operation
easybuild-easyconfigs copied to clipboard

{bio}[foss/2019b,fosscuda/2019b] RELION v3.1.0

Open lexming opened this issue 4 years ago • 31 comments

(created using eb --new-pr)

Depends on ~~#10963~~, ~~#10964~~ and ~~https://github.com/easybuilders/easybuild-framework/pull/3386~~

I used #9096 as reference and made some changes to it

  • added templates to integrate with Torque
  • modified templates for Slurm to just rely on the extra variables support in RELION (this avoids the need to use additional patches, it's not as neat but it's easier to maintain and can be easily adapted by the users)
  • fixed configopts for CUDA capabilities in RELION-3.1.0-fosscuda-2019b.eb

I cannot test the integration with Slurm, somebody with it (@akesandgren ) please test.

Update: made an easyblock to ease the configuration of RELION. This PR depends on https://github.com/easybuilders/easybuild-easyblocks/pull/2274

lexming avatar Jul 12 '20 00:07 lexming

Test report by @lexming SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in this PR) node379.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5 See https://gist.github.com/f27340e6e6a7c72307a3b25a7f840ca6 for a full test report.

lexming avatar Jul 12 '20 00:07 lexming

Test report by @lexming SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in this PR) node125.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, Python 2.7.5 See https://gist.github.com/630ce08721f58b8e9bf8ac53363f2cc8 for a full test report.

lexming avatar Jul 12 '20 01:07 lexming

One drawback with separate cpu and gpu submit files is that the user will have to explicitly change the path to the submit file template in the gui when switching from one to the other. And if one is installed with foss and the other with fosscuda, they have to know how to do this with the full path. This is one of the major reasons for the patch that added the extra code for this.

Also, the removal of "which" around all the commands was done since they caused problems with the generated submit files. Don't quite remember what at the moment.

akesandgren avatar Jul 13 '20 09:07 akesandgren

@akesandgren I did not bother with the which commands because even though it's a dumb thing to do, it does not cause issues on my side. I propose to remove them with a single sed instead of a patch, which is a pain to maintain. See https://github.com/easybuilders/easybuild-easyconfigs/pull/10965/commits/f890a37b126a53e6265d934753163fdf86129eed

Regarding the job templates, switching between the non-gpu and gpu jobs is transparent for our users. Once they load the module, the gui of relion automatically takes the corresponding job template and users can directly access it from the command line through the env var $RELION_QSUB_TEMPLATE, regardless if they are using the foss or fosscuda modules. This has the advantage that users loading the foss module cannot submit jobs to GPU nodes, which we do not want.

lexming avatar Jul 13 '20 12:07 lexming

The template problem comes when using the GUI. The path to the template file is saved in the session files for the relion setup. So when they go from foss to fosscuda the info still points to the cpu-only submit file. (Or am I mixing this up with Scipion??)

akesandgren avatar Jul 13 '20 13:07 akesandgren

@akesandgren you are right, relion does not check the environment for existing projects and all parameters stay unchanged. In our case, if the job template would work in both modules, we would face the issue of users inadvertently submitting jobs with wrong parameters regarding number of nodes, threads, gpus, etc... So I personally prefer that switching modules does not allow submitting jobs out of the box.

However, the templates I submitted with this PR are just examples to be adapted on installation. So I do not mind using the gpu template on both easyconfigs and just set the gpu option to zero in foss. If that would be suitable for you let me know and I'll update the PR.

By the way, can anybody (eg @boegel ) clarify why the test do not like my perfectly working sed command? are regex with backreferences not allowed for some reason?

lexming avatar Jul 13 '20 14:07 lexming

Remember that the preconfigopts gets shipped to /bin/sh, you need to escape them correctly.

akesandgren avatar Jul 13 '20 14:07 akesandgren

The preconfigopts is properly escaped as far as I can tell and the config steps proceeds as intended in my local machine. I think that it has something to do with the tests, I have the impression that escaping a parenthesis \( is protected in the tests but something like \1 is not and gets parsed. But its just an impression.

lexming avatar Jul 13 '20 15:07 lexming

So, we will stick to using the add_hpc2n_submit_file_args.patch.

akesandgren avatar Jul 14 '20 07:07 akesandgren

Also note that the sed cmd you're using doesn't quite match what the patch is doing.

akesandgren avatar Jul 14 '20 18:07 akesandgren

@akesandgren thanks for pointing out the missing changes in sed, those got lost at some point. I also unified the templates for cpu and gpu to ease switching between foss and fosscuda modules as you suggested.

lexming avatar Jul 14 '20 21:07 lexming

The failing tests can be fixed with ~~https://github.com/easybuilders/easybuild-framework/pull/3386~~

lexming avatar Jul 20 '20 08:07 lexming

close/re-open to trigger new test runs now https://github.com/easybuilders/easybuild-framework/pull/3386 is merged...

boegel avatar Aug 14 '20 08:08 boegel

Oh wait, it's not merged yet... 🤦‍♂️

boegel avatar Aug 14 '20 08:08 boegel

@boegel What do you mean? https://github.com/easybuilders/easybuild-framework/pull/3386 is indeed merged

lexming avatar Aug 14 '20 08:08 lexming

@boegel All tests finally passed on this one. It is ready for review.

lexming avatar Aug 29 '20 07:08 lexming

Test report by @lexming SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in this PR) node312.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5 See https://gist.github.com/7b711308b0125c021bf56204544f4e16 for a full test report.

lexming avatar Aug 29 '20 08:08 lexming

Test report by @lexming SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in this PR) node115.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, Python 2.7.5 See https://gist.github.com/7640ef52f5e532746702a63524e4cb19 for a full test report.

lexming avatar Aug 29 '20 08:08 lexming

Test report by @boegel SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in this PR) node3404.kirlia.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), Python 2.7.5 See https://gist.github.com/7b6e38f760d4e9ace2dee3bab8a31940 for a full test report.

boegel avatar Sep 18 '20 08:09 boegel

Test report by @boegel SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in this PR) node2464.golett.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (haswell), Python 2.7.5 See https://gist.github.com/d952e160dc86e0ec01b5b338ab9ea9b8 for a full test report.

boegel avatar Sep 18 '20 08:09 boegel

Test report by @boegel SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in this PR) node3302.joltik.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), Python 3.6.8 See https://gist.github.com/ae2962cf12708fdc62de3c9f88b44d4e for a full test report.

boegel avatar Sep 18 '20 09:09 boegel

@lexming Since this requires customizations specific to the site, I think it makes sense to add a README in the RELION subdir to explain that?

Longer term (but not necessarily a blocker for this PR), it probably makes sense to create a custom easyblock for RELION that allows to "inject" the required info and avoid having to manually tweak the easyconfig file?

boegel avatar Oct 20 '20 14:10 boegel

@lexming Ping on adding some more documentation in a README?

boegel avatar Dec 07 '20 08:12 boegel

@boegel I explored the option with a README file but it turned out to be too long and still not very straightforward. Main issue is how the extra job parameters have to be declared as environment variables and also added in the job script template, but with different names. So, I went ahead and made an easyblock to simplify this installation. Please check again with https://github.com/easybuilders/easybuild-easyblocks/pull/2274

lexming avatar Dec 09 '20 10:12 lexming

@lexming: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/410373490 Output from first failing test suite run:

Failed to determine merge base (ec: 1, output: ''), falling back to specifying target branch develop
Failed to determine merge base (ec: 1, output: ''), falling back to specifying target branch develop

List of added easyconfig files in this PR: RELION-3.1.0-foss-2019b.eb
RELION-3.1.0-fosscuda-2019b.eb
(skipped conflicts test)
(skipped dep graph test)
======================================================================
ERROR: test__parse_easyconfig_RELION-3.1.0-foss-2019b.eb (test.easyconfigs.easyconfigs.EasyConfigTest)
Test for parsing of easyconfig RELION-3.1.0-foss-2019b.eb
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/easyconfigs/easyconfigs.py", line 1041, in innertest
    template_easyconfig_test(self, spec_path)
  File "test/easyconfigs/easyconfigs.py", line 837, in template_easyconfig_test
    ecs = process_easyconfig(spec)
  File "/opt/hostedtoolcache/Python/2.7.18/x64/lib/python2.7/site-packages/easybuild/framework/easyconfig/easyconfig.py", line 1977, in process_easyconfig
    raise EasyBuildError("Failed to process easyconfig %s: %s", spec, err.msg)
EasyBuildError: "Failed to process easyconfig /home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/easybuild/easyconfigs/r/RELION/RELION-3.1.0-foss-2019b.eb: No software-specific easyblock 'EB_RELION' found for RELION"

======================================================================
ERROR: test__parse_easyconfig_RELION-3.1.0-fosscuda-2019b.eb (test.easyconfigs.easyconfigs.EasyConfigTest)
Test for parsing of easyconfig RELION-3.1.0-fosscuda-2019b.eb
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/easyconfigs/easyconfigs.py", line 1041, in innertest
    template_easyconfig_test(self, spec_path)
  File "test/easyconfigs/easyconfigs.py", line 837, in template_easyconfig_test
    ecs = process_easyconfig(spec)
  File "/opt/hostedtoolcache/Python/2.7.18/x64/lib/python2.7/site-packages/easybuild/framework/easyconfig/easyconfig.py", line 1977, in process_easyconfig
    raise EasyBuildError("Failed to process easyconfig %s: %s", spec, err.msg)
EasyBuildError: "Failed to process easyconfig /home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/easybuild/easyconfigs/r/RELION/RELION-3.1.0-fosscuda-2019b.eb: No software-specific easyblock 'EB_RELION' found for RELION"

======================================================================
FAIL: test_changed_files_pull_request (test.easyconfigs.easyconfigs.EasyConfigTest)
Specific checks only done for the (easyconfig) files that were changed in a pull request.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/easyconfigs/easyconfigs.py", line 805, in test_changed_files_pull_request
    self.assertTrue(False, error_msg)
AssertionError: Failed to find parsed easyconfig for RELION-3.1.0-foss-2019b.eb (and could not isolate it in easyconfigs archive either)

----------------------------------------------------------------------
Ran 11049 tests in 766.607s

FAILED (failures=1, errors=2)
ERROR: Not all tests were successful.

bleep, bloop, I'm just a bot (boegelbot v20200716.01) Please talk to my owner @boegel if you notice you me acting stupid), or submit a pull request to https://github.com/boegel/boegelbot fix the problem.

boegelbot avatar Dec 09 '20 11:12 boegelbot

So... where are we on this one? We got a new toy[1] and I'm trying to get it loaded up for the users.

[1] https://hprc.tamu.edu/wiki/Grace:Intro

JackPerdue avatar Jan 06 '21 00:01 JackPerdue

@JackPerdue flexing with your new toys is not nice, now I'm jealous!

Regarding this PR, if you want to test it, you just need to use the easyblock for RELION in https://github.com/easybuilders/easybuild-easyblocks/pull/2274. I just opened https://github.com/easybuilders/easybuild-easyblocks/pull/2302 to fix the failing tests, but that is not really needed to use the easyblock.

lexming avatar Jan 06 '21 09:01 lexming

Test report by @lexming Using easyblocks from PR(s) https://github.com/easybuilders/easybuild-easyblocks/pull/2274 SUCCESS Build succeeded for 2 out of 2 (2 easyconfigs in total) node366.hydra.os - Linux centos linux 7.9.2009, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5 See https://gist.github.com/ae7233dbc840b39071270d2ae6fcae93 for a full test report.

lexming avatar Jan 06 '21 13:01 lexming

@lexming Thanks for the info. It installed without problem on the "old" cluster and the new. I'll leave for the user to test and provide us some feedback.

Any desire whatsoever to add support for IBM's LSF? Our "older" cluster which is due to head out the door in the next semester or two used it (it came with the IBM cluster... wasn't our choice). Looks like it would be pretty easy to add to your easyblock if there was any interest.

JackPerdue avatar Jan 07 '21 00:01 JackPerdue

@lexming any interest in pushing this one along?

verdurin avatar Jun 09 '21 17:06 verdurin