
Conventional schedulers (Slurm/PBS/Loadleveler) compatibility?

Open surak opened this issue 6 years ago • 4 comments

Is your feature request related to a problem? Please describe. Most supercomputers in the world use one of the few schedulers available, such as those mentioned in the title. These usually don't play well with other schedulers.

Describe the solution you'd like To be able to run Mars directly from a Slurm session.

Describe alternatives you've considered In many SPMD environments, one can submit a single program to run on a number of compute nodes. So, this program should be able to run as a master on one node, and as a worker on all the other nodes. Something like a front-end to Mars.
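To make the SPMD idea above concrete, here is a minimal sketch of how a single front-end program could decide its role on each node. It assumes the job is launched under Slurm with `srun`, which sets `SLURM_PROCID` and `SLURM_NTASKS` for every task. The function name `pick_role` is hypothetical, and how the master and workers would actually be started is left out, since that depends on Mars's own deployment commands.

```python
import os


def pick_role(env=None):
    """Decide whether this task acts as master or worker.

    Assumption: the scheduler exposes Slurm-style variables.
    SLURM_PROCID is the task rank, SLURM_NTASKS the total task count.
    Outside Slurm, fall back to a single-task "master" setup.
    """
    env = os.environ if env is None else env
    rank = int(env.get("SLURM_PROCID", "0"))
    world_size = int(env.get("SLURM_NTASKS", "1"))
    # Convention chosen for this sketch: rank 0 is the master,
    # every other rank is a worker.
    role = "master" if rank == 0 else "worker"
    return role, rank, world_size


if __name__ == "__main__":
    role, rank, world_size = pick_role()
    # A real front-end would start the matching service here.
    print(f"task {rank} of {world_size} starting as {role}")
```

The same script would be submitted once and executed by every task; only the environment differs between them, so no per-node configuration is needed.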

Additional context Supercomputer schedulers are quite simple in operation. The user submits a job to the batch system, which waits until the amount of resources requested is available. Then, it runs the code in all the processes. It's that simple.

surak avatar Jan 03 '19 19:01 surak

The current developers of Mars don't have a background in or experience with supercomputers, so we may need help from the community. It would be fantastic if you could try deploying Mars and its runtime in such an environment and contribute the code back; we would appreciate that very much.

qinxuye avatar Jan 04 '19 03:01 qinxuye

Worth taking a look at https://github.com/dask/dask-jobqueue

raybellwaves avatar Jan 06 '19 06:01 raybellwaves

Can we add other schedulers like Slurm here using the execution API? @fyrestone

chaokunyang avatar Apr 26 '22 14:04 chaokunyang

> Can we add other schedulers like Slurm here using the execution API? @fyrestone

It seems that this is a deployment problem, not an execution issue.

fyrestone avatar Apr 27 '22 01:04 fyrestone