
Add SLURM support

Open widdowquinn opened this issue 3 years ago • 10 comments

Summary:

pyani currently only supports SGE/OGE-like systems, but SLURM is very popular. It would be useful to support SLURM scheduling.

widdowquinn avatar Aug 03 '20 16:08 widdowquinn

Hi, I have been working on adding SLURM support to pyani, and it seems to be working now. I have forked pyani and worked in this repo: https://github.com/TeamMacLean/pyani. Basically, I made a copy of run_sge.py, renamed it run_slurm.py, and changed the job submission commands. I have also renamed some command options, e.g. sgegroupsize to groupsize. A rough sketch of the kind of change involved is below.
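As a minimal sketch (not the actual run_slurm.py code), the main difference from the SGE path is that jobs go through `sbatch` rather than `qsub`, and job dependencies are expressed with `--dependency=afterok:<ids>` rather than `-hold_jid`. The function and argument names here are illustrative only:

```python
import subprocess


def submit_slurm_job(script_path, job_name, dependency_ids=None):
    """Submit a job script with sbatch and return the SLURM job ID.

    Illustrative only; pyani's real submission code lives in run_slurm.py.
    """
    cmd = ["sbatch", "--parsable", f"--job-name={job_name}"]
    if dependency_ids:
        # Run only after all listed jobs have completed successfully
        cmd.append("--dependency=afterok:" + ":".join(dependency_ids))
    cmd.append(script_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # With --parsable, sbatch prints just the job ID (optionally ";cluster")
    return result.stdout.strip().split(";")[0]
```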

TSL-RamKrishna avatar Jan 20 '21 13:01 TSL-RamKrishna

Many thanks @TSL-RamKrishna - that's great!

As it happens I've just got access to a SLURM cluster, so I can actually try this out.

widdowquinn avatar Jan 20 '21 18:01 widdowquinn

Thanks. Let me know when you try it out. I have pushed updates to the repo today as well.

TSL-RamKrishna avatar Jan 21 '21 12:01 TSL-RamKrishna

The changes have now been pulled into pr_236 to be brought in line with existing tests and CLI expectations, and for further development.

@all-contributors please add @TSL-RamKrishna for code, ideas

widdowquinn avatar Jun 19 '21 12:06 widdowquinn

@widdowquinn

I've put up a pull request to add @TSL-RamKrishna! :tada:

allcontributors[bot] avatar Jun 19 '21 12:06 allcontributors[bot]

Is the branch https://github.com/widdowquinn/pyani/tree/pr_236 still in progress and the best bet if I urgently needed to run pyANI under SLURM? Is it worth making that into a new pull request to garner feedback on?

peterjc avatar Oct 19 '21 12:10 peterjc

Tagging @widdowquinn to make sure he sees this sooner.

baileythegreen avatar Oct 19 '21 12:10 baileythegreen

It is still in progress.

The current status is that the pr_236 branch will run on SLURM. However, we need to refactor how we aggregate jobs within pyani, because SLURM counts each task within a batch as a separate job. For example, a 200-genome comparison (200 × 200 = 40,000 pairwise comparisons) batched into lumps of 10,000 counts as 40,000 jobs, not 4 jobs (of 10,000 tasks each).

The maximum number of submittable jobs on a typical SLURM cluster is usually not much higher than this, which greatly limits the potential for scaling.

The changes required to how we handle jobs in the backend are significant and probably best handled in concert with other changes we have planned. It's a much bigger job than just a drop-in replacement for the old SGE code.
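For illustration only, one possible aggregation approach (an assumption on my part, not the refactoring pyani has planned) would be to submit the whole set of comparisons as a single SLURM job array, so that 40,000 commands count as one submitted array job rather than 40,000 individual jobs. The `joblist.txt` file and throttle value here are hypothetical:

```python
import subprocess

N_COMPARISONS = 40_000   # e.g. 200 x 200 pairwise comparisons, one command per line
MAX_CONCURRENT = 200     # throttle: at most this many array tasks run at once

# Each array task picks its own command line from joblist.txt by index.
batch_script = """#!/bin/bash
#SBATCH --job-name=pyani_ani
CMD=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" joblist.txt)
eval "$CMD"
"""

with open("run_array.sh", "w") as fh:
    fh.write(batch_script)

# Note: many clusters cap array size via MaxArraySize, so a very large range
# may still need to be split into several array submissions.
subprocess.run(
    ["sbatch", f"--array=0-{N_COMPARISONS - 1}%{MAX_CONCURRENT}", "run_array.sh"],
    check=True,
)
```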

widdowquinn avatar Oct 19 '21 12:10 widdowquinn

Thanks. I suspected the SLURM batching was still an issue from our discussion elsewhere. I'm hoping to use this on up to 500 genomes, so would hit this problem :(

peterjc avatar Oct 19 '21 13:10 peterjc

Aye - I need to make progress on this for my own stuff, but this time of year is unpleasantly busy.

widdowquinn avatar Oct 19 '21 14:10 widdowquinn