ClusterManagers.jl

Comprehensive tests

Open vchuravy opened this issue 6 years ago • 7 comments

One of the issues (and one that became even more apparent during the 1.0 transition) is that it is really hard to test this package. Without CI, development is slow, since we are likely to break use cases that we can't test.

Ideally, we could use Docker environments to instantiate a "minimal" cluster environment in which we can then run tests.

As an example see:

  • https://github.com/giovtorres/slurm-docker-cluster

vchuravy avatar Nov 05 '18 18:11 vchuravy

You may be interested in borrowing from the CI setup (based on docker-compose) that we have in dask-jobqueue (which deploys Dask on HPC clusters). For now, we have CI for SGE, PBS and SLURM.

lesteve avatar Feb 02 '19 16:02 lesteve

Oh, that is rather interesting! I see you are running it on Travis.

vchuravy avatar Apr 23 '19 19:04 vchuravy

I managed to set up Travis testing on SlurmTools.jl using the docker images built by PySlurm: https://github.com/simonbyrne/SlurmTools.jl/blob/master/.travis.yml

simonbyrne avatar Mar 18 '20 19:03 simonbyrne

I was also able to set up testing infrastructure on Travis for SlurmClusterManager, using docker-compose to create a small cluster. It seems to work pretty well.

kleinhenz avatar May 31 '20 01:05 kleinhenz

It would be nice to unite these efforts here. We all need cluster managers, and yet the current state of affairs is too fragmented to give a reliable experience with testing, etc.

juliohm avatar Oct 06 '20 19:10 juliohm

Status of this? Is there still only Slurm testing? It seems like PBS and SGE have been broken for a while: https://github.com/JuliaParallel/ClusterManagers.jl/issues/179

MilesCranmer avatar Aug 30 '23 12:08 MilesCranmer

WIP PR for this: https://github.com/JuliaParallel/ClusterManagers.jl/pull/193

MilesCranmer avatar Aug 30 '23 15:08 MilesCranmer