ClusterManagers.jl

Split up this package

Open: andreasnoack opened this issue 7 years ago • 9 comments

There is not much shared code between the managers, and most of us use only a single workload/cluster manager, so it is difficult to review PRs that touch the other managers.

andreasnoack, Feb 17 '17 14:02

That's a good point. Any code that is actually shared should probably be submitted to Base rather than kept here.

azraq27, Feb 20 '17 21:02

Should the split packages live under individual contributors' accounts or under the JuliaParallel organization? The maintainers of each cluster manager ought to be users of that specific manager.

amitmurthy, May 19 '17 06:05

I just created SlurmClusterManager.jl, if anyone is interested in giving it a try.

kleinhenz, May 21 '20 01:05

And there's https://github.com/simonbyrne/SlurmTools.jl

kescobo, May 22 '20 13:05

I just created SlurmClusterManager.jl if anyone is interested in giving it a try.

SlurmClusterManager.jl requires that SlurmManager be constructed inside a Slurm allocation created by sbatch or salloc. Specifically, SLURM_JOBID and SLURM_NTASKS must be defined in order to construct a SlurmManager. This matches typical HPC workflows, where resources are requested with sbatch and then used by the application code. In contrast, ClusterManagers.jl dynamically requests resources when run outside an existing Slurm allocation. I found that this was basically never what I wanted, since it leaves the manager process running on a login node and makes the script wait until resources are granted, which is better handled by the Slurm queueing system itself.

Oh so much yes! ;)

vchuravy, May 22 '20 13:05
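The requirement described above (SLURM_JOBID and SLURM_NTASKS must be set before a SlurmManager can be constructed) can be checked up front, so a script fails fast instead of ending up on a login node. The sketch below is a minimal illustration under that assumption; `in_slurm_allocation` and `require_slurm_allocation` are hypothetical helpers, not part of SlurmClusterManager.jl or ClusterManagers.jl, and they only test for the two variables named in the comment.

```julia
# Hypothetical helper (not part of SlurmClusterManager.jl or ClusterManagers.jl).
# Returns true when both variables that SlurmManager needs are present in `env`,
# i.e. when the script appears to be running inside an sbatch/salloc allocation.
function in_slurm_allocation(env::AbstractDict = ENV)
    return haskey(env, "SLURM_JOBID") && haskey(env, "SLURM_NTASKS")
end

# Hypothetical guard: raise a clear error instead of silently falling back
# to requesting resources from outside an allocation.
function require_slurm_allocation(env::AbstractDict = ENV)
    in_slurm_allocation(env) ||
        error("SLURM_JOBID and SLURM_NTASKS are not set; submit this script with sbatch or run it inside salloc")
    return nothing
end
```

In a batch script, a call to `require_slurm_allocation()` would go right before `addprocs(SlurmManager())` (after `using Distributed, SlurmClusterManager`). Accepting the environment as an argument, defaulting to `ENV`, keeps the check testable without a real Slurm allocation.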

We are barely able to maintain a single repository with working versions of the managers. In my opinion, we should unite our efforts and gather people with the relevant skills here to look after improvements to particular managers. Also, from the user's point of view, it is annoying to need a different environment depending on where the script will run. Right now we can simply do ]add ClusterManagers and move on.

juliohm, Oct 06 '20 19:10

@juliohm I disagree, and so do many others, I think. My view is that ClusterManagers.jl works as is, so we should leave it be. If we want to make changes, I would prefer to split it up rather than unify the code base as you propose in #145. Reopening this issue.

bjarthur, Oct 15 '20 11:10

@bjarthur, do you mean you agree that we should split this package into separate packages for specific managers?

juliohm, Oct 15 '20 11:10

Perhaps a common abstract interface should be put in place that all managers can use? I was looking for a Slurm manager, but it is very confusing which one I should use.

mashu, Apr 16 '24 07:04