Support Slurm-over-ssh as job manager

Open eschnett opened this issue 5 years ago • 1 comments

I am interested in using Caliban to manage jobs on HPC systems. These typically have a job manager such as Slurm, and are accessed via ssh instead of a web API. Instead of Docker, they provide container runtimes such as Singularity.

This draft pull request is a proof of concept that demonstrates the mechanics of using ssh, Slurm, and Singularity. The code is not yet well structured, and several constants are hard-wired for our in-house HPC system "Symmetry". For now, I would like to discuss which functions dealing with Docker images (docker) and/or clusters (platform/cluster) should be generalized, and which should be rewritten in platform/slurm.

I can use this branch to submit code to our HPC system. Other cluster functionality is still missing.
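
As a rough illustration of the ssh + Slurm + Singularity mechanics described above, here is a minimal sketch. It is not the code in this branch: the function names, the image file name, and the login host are hypothetical placeholders. It assumes only that `sbatch` reads a batch script from standard input when no script file is given, and that `singularity exec <image> <command>` runs a command inside an image.

```python
import shlex
from typing import List


def render_sbatch_script(
    image: str,
    command: List[str],
    job_name: str = "caliban-job",   # hypothetical default
    time_limit: str = "00:10:00",    # hypothetical default
) -> str:
    """Return a Slurm batch script that runs `command` inside a Singularity image."""
    # Quote every argument so the command survives the remote shell unchanged.
    inner = " ".join(shlex.quote(arg) for arg in command)
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time_limit}",
        f"singularity exec {shlex.quote(image)} {inner}",
        "",  # trailing newline
    ])


def ssh_submit_argv(host: str) -> List[str]:
    """Return the argv that runs sbatch on `host`; the script is sent on stdin."""
    return ["ssh", host, "sbatch"]


script = render_sbatch_script("image.sif", ["python", "train.py"])
argv = ssh_submit_argv("login.example.org")  # placeholder host

# To actually submit (requires ssh access to a Slurm login node):
#   import subprocess
#   subprocess.run(argv, input=script, text=True, check=True)
```

Keeping script rendering separate from the ssh invocation means the Slurm-specific part can be tested without a cluster, which may help when deciding what belongs in platform/slurm.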

eschnett avatar Jul 20 '20 00:07 eschnett

Codecov Report

Merging #43 into master will decrease coverage by 0.89%. The diff coverage is 32.84%.

@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
- Coverage   55.56%   54.67%   -0.90%     
==========================================
  Files          31       32       +1     
  Lines        3180     3316     +136     
==========================================
+ Hits         1767     1813      +46     
- Misses       1413     1503      +90     
| Impacted Files | Coverage Δ |
|---|---|
| caliban/platform/slurm/cli.py | 31.81% <31.81%> (ø) |
| caliban/main.py | 27.17% <50.00%> (+1.03%) :arrow_up: |
| caliban/docker/build.py | 32.71% <100.00%> (ø) |
| caliban/util/auth.py | 76.19% <0.00%> (+9.52%) :arrow_up: |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4b09430...ce28c78.

codecov[bot] avatar Jul 20 '20 00:07 codecov[bot]