docker-galaxy-stable

Python Library for configure_slurm.py.

jmchilton opened this issue on Apr 29, 2017 · 6 comments

It'd be nice if we had a uniform set of variables and conventions for dealing with this inside and outside of Ansible, as well as inside and outside of Docker (the original place this script was developed was, I think, Pulsar testing years ago - https://github.com/galaxyproject/pulsar/blob/master/scripts/configure_test_slurm.py). And it'd be nice if pip install slurm_configure==<version> could be used for version handling across all these projects.
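To make the idea concrete, a hypothetical slurm_configure library (the package name, class, and methods below are all assumptions for illustration, not an existing interface) might expose one small config object that renders the same slurm.conf fragment whether it's called from Docker entrypoints, Ansible, or Pulsar test scripts:

```python
# Sketch of a hypothetical slurm_configure API; nothing here exists yet.
from dataclasses import dataclass


@dataclass
class SlurmConfig:
    """Minimal settings shared by Docker, Ansible, and Pulsar test setups."""
    hostname: str
    cpus: int = 1
    control_machine: str = "localhost"

    def render(self) -> str:
        """Render a slurm.conf fragment from the settings."""
        return "\n".join([
            f"ControlMachine={self.control_machine}",
            f"NodeName={self.hostname} CPUs={self.cpus} State=UNKNOWN",
            f"PartitionName=debug Nodes={self.hostname} Default=YES",
        ])


if __name__ == "__main__":
    print(SlurmConfig(hostname="js-56-78", cpus=4).render())
```

Pinning it with pip install slurm_configure==<version> would then give every consuming project the same rendering logic at a known version.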

jmchilton avatar Apr 29 '17 21:04 jmchilton

What about ephemeris or ansible-extras?

bgruening avatar Apr 29 '17 21:04 bgruening

@bgruening I don't really like either option - ansible-galaxy-extras isn't a library that can be readily used by Pulsar testing, for instance, and ephemeris should ultimately stay Galaxy-centric and admin-centric, I would think. This script is useful outside the context of Galaxy. I get the desire to keep things simple though.

jmchilton avatar May 01 '17 14:05 jmchilton

Ok, makes sense. Under galaxyproject or your account? That will answer the question of you or me ;)

bgruening avatar May 01 '17 14:05 bgruening

Either galaxyproject or my account works - I was thinking about this as a @jmchilton issue.

jmchilton avatar May 01 '17 14:05 jmchilton

Go for it!

bgruening avatar May 01 '17 14:05 bgruening

I noticed this issue and didn't know where else to post about configure_slurm.py, so I'll post here. On the Galaxy Jetstream image, SLURM can be pretty finicky about getting the hostname right. I've even seen the instance keep an old hostname after being redeployed, e.g.:

root@js-56-78:~# hostnamectl
   Static hostname: js-12-34
Transient hostname: js-56-78.jetstream-cloud.org

Should configure_slurm.py handle this kind of quirk, or does the Jetstream image itself need some additional hostname finagling?
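If configure_slurm.py were to handle it, one approach (a sketch only; the helper below is hypothetical, not part of the script) would be to compare the live hostname against the NodeName already written to slurm.conf and trigger a rewrite when they diverge:

```python
# Hypothetical stale-hostname check; not part of configure_slurm.py today.
import re
import socket
from typing import Optional


def stale_node_name(slurm_conf_text: str, hostname: Optional[str] = None) -> bool:
    """Return True if slurm.conf names a node other than this host."""
    hostname = hostname or socket.gethostname().split(".")[0]
    match = re.search(r"^NodeName=(\S+)", slurm_conf_text, re.MULTILINE)
    # A missing NodeName line also counts as needing a rewrite.
    return match is None or match.group(1) != hostname


if __name__ == "__main__":
    conf = "ControlMachine=localhost\nNodeName=js-12-34 CPUs=1 State=UNKNOWN\n"
    # On the redeployed instance js-56-78 this flags the stale config.
    print(stale_node_name(conf, hostname="js-56-78"))
```

Whether the fix belongs here or in the Jetstream image's boot scripts is the open question, since the stale transient hostname comes from the cloud side.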

Less Jetstream-specific: can I also recommend that configure_slurm.py set up SlurmDBD so job accounting survives reboots? It's frustrating to have a failed job counted as successful: if the instance crashes, Galaxy can't find the job in SLURM's history, so it assumes it completed successfully.
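Enabling that would roughly mean pointing SLURM's accounting at slurmdbd, which in turn persists to a database; the values below are illustrative placeholders, not tested settings for the Jetstream image:

```
# slurm.conf - illustrative accounting settings (placeholders, untested)
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=localhost
JobAcctGatherType=jobacct_gather/linux

# slurmdbd.conf - slurmdbd itself persists to MySQL/MariaDB
DbdHost=localhost
StorageType=accounting_storage/mysql
StorageUser=slurm
StoragePass=CHANGE_ME
```

The Docker image would also need the slurmdbd daemon and a MySQL/MariaDB instance running for this to work, which is extra moving parts for configure_slurm.py to manage.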

chambm avatar Jun 08 '17 18:06 chambm