Add a SGE CLI job runner

Open mvdbeek opened this issue 6 years ago • 7 comments

All the work has been done by @bring52405; the minor changes from https://help.galaxyproject.org/t/drmaa-library-threads-and-munge-invalid-credential-format/617/24?u=mvdbeek concern the XML parsing.

I haven't actually tested this.

mvdbeek avatar Apr 24 '19 08:04 mvdbeek

I'm missing a call to qacct, which is required on our UNIVA grid engine to get info on finished jobs. Is SGE different in this respect?

Wondering what this is doing: qstat -q IIHG, UI -g d ? It seems to make too specific assumptions?

bernt-matthias avatar May 13 '19 05:05 bernt-matthias

I suppose many places don't actually deploy qacct? It's not critical for Galaxy since we continuously monitor the jobs. For a first round this is fine; none of the cli runners use qacct or an equivalent.

mvdbeek avatar May 13 '19 06:05 mvdbeek

qstat -q IIHG, UI -g d

I guess that's monitoring some specific queue, you're right. I don't have any SGE to test against unfortunately. If you know of any that can be run in docker we could actually write some tests.

mvdbeek avatar May 13 '19 06:05 mvdbeek

kgutwin/simple-sge seems to work nicely, just need to add ssh to the container.

mvdbeek avatar May 13 '19 06:05 mvdbeek

I would assume that qacct is critical, since it's the only way to query a job's result once it has finished. qstat can only query running and queued jobs.

But I guess the only thing lost without it is the ability to differentiate jobs that were killed by the grid engine for exceeding the time limit from those that were killed for exceeding memory restrictions.
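To make the qacct point concrete, here is a minimal sketch of how a runner could read the accounting record of a finished job. It assumes the usual `qacct -j <job_id>` layout of one whitespace-separated key/value pair per line. The function names and the sample record are made up for illustration and are not taken from the PR.

```python
def parse_qacct(text):
    """Parse key/value lines as printed by `qacct -j <job_id>` into a dict."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "=====" separator that precedes each record.
        if not line or line.startswith("="):
            continue
        parts = line.split(None, 1)
        record[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return record


def job_succeeded(record):
    """Hypothetical helper: success means exit_status 0 and failed code 0."""
    # The failed field may carry a trailing description, so keep only the code.
    failed_code = (record.get("failed") or "0").split()[0]
    return record.get("exit_status") == "0" and failed_code == "0"


# Made-up sample record, for illustration only.
SAMPLE = """\
==============================================================
qname        all.q
jobnumber    42
exit_status  137
failed       100 : assumedly after job
"""

record = parse_qacct(SAMPLE)
print(record["exit_status"], job_succeeded(record))
```

Telling a time-limit kill from a memory kill would need a site-specific reading of the `failed` and `exit_status` values, which this sketch deliberately leaves out.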

In theory I could test this on our cluster, but I guess I need to switch from submitting jobs as real user for a start. Is there some example configuration that I could start with?

bernt-matthias avatar May 13 '19 06:05 bernt-matthias

qstat can only query running and queued jobs.

At least on the systems I worked with, the jobs stay around for a couple of minutes, and you can see the state then. kgutwin/simple-sge comes with qacct, so I can implement this.

In theory I could test this on our cluster, but I guess I need to switch from submitting jobs as real user for a start. Is there some example configuration that I could start with?

I have no experience with this, but running everything as the Galaxy user is the default and should be relatively simple to set up. For now though I can start with making sure this runs against the docker image, I'll ping you once it's not a draft anymore.

mvdbeek avatar May 13 '19 06:05 mvdbeek

Correct, qstat is different on Son of Grid Engine, so queues have to be specified to get correctly parsable output. This runner was created because SoGE uses the DRMAA runner, but the version we are on, 8.1.3, uses munge, which has some issues with the python-drmaa library.
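As a rough illustration of the queue-specific qstat call and the XML parsing mentioned at the top of the thread, the sketch below builds a qstat command restricted to given queues and extracts job states from `qstat -xml` style output. The function names are hypothetical, the sample XML is made up and trimmed to the `job_list`, `JB_job_number` and `state` elements, and the `-g d` option from the quoted command is left out. This is not the code from the PR.

```python
import xml.etree.ElementTree as ET


def build_qstat_command(queues):
    """Return a qstat argument list limited to the given queues, asking for XML."""
    return ["qstat", "-q", ",".join(queues), "-xml"]


def parse_qstat_xml(xml_text):
    """Map job number to state code (e.g. 'r', 'qw') from qstat XML output."""
    states = {}
    root = ET.fromstring(xml_text)
    # Running and pending jobs both appear as <job_list> elements.
    for job in root.iter("job_list"):
        job_id = job.findtext("JB_job_number")
        if job_id is not None:
            states[job_id] = job.findtext("state")
    return states


# Made-up sample output, for illustration only.
SAMPLE_XML = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>101</JB_job_number>
      <state>r</state>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>102</JB_job_number>
      <state>qw</state>
    </job_list>
  </job_info>
</job_info>"""

print(build_qstat_command(["IIHG", "UI"]))
print(parse_qstat_xml(SAMPLE_XML))
```

Taking the queue list as a parameter instead of hard-coding IIHG and UI would address the concern about site-specific assumptions.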

Wondering what this is doing: qstat -q IIHG, UI -g d ? It seems to make too specific assumptions?

bring52405 avatar May 13 '19 13:05 bring52405