hpc-novice
Cluster manager specific examples
This is for the future, but in the long term it'd be great to see examples for all the common cluster managers/schedulers, e.g. SLURM, Condor, Moab, and PBS.
@r4space
Rather than examples in many different schedulers/clusters, the proposal is to have a single, central hpc-novice repository written for one specific scheduler/cluster, but general enough that a site may change and adapt the lesson easily and quickly.
For ~80% of the topics that makes complete sense, and definitely as a start. But in the future, why not have 3 or 4 specific flavours, in the same way that SWC has e.g. R and Python modules?
Conceptually they mostly do the same thing, so having one plus a translation table like https://slurm.schedmd.com/rosetta.pdf should give people what they need to prepare individual lessons. We just need to be careful not to stray outside the commonalities of the managers (which shouldn't be hard for an introductory lesson).
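To make the translation-table idea concrete, here is a minimal sketch of such a mapping for the job submission command only. The function name `submit_cmd` and the lowercase scheduler keys are assumptions made for illustration; the command names themselves are the standard submission commands of each scheduler.

```shell
# Hypothetical helper (not part of any lesson repo): map a scheduler name
# to the command normally used to submit a batch job to it.
submit_cmd() {
    case "$1" in
        slurm)  echo "sbatch" ;;
        pbs)    echo "qsub" ;;
        lsf)    echo "bsub" ;;
        moab)   echo "msub" ;;
        condor) echo "condor_submit" ;;
        *)      echo "unknown scheduler: $1" >&2; return 1 ;;
    esac
}
```

A site adapting the lesson could then use `"$(submit_cmd slurm)"` wherever the text says "submit the job". Options and job script directives differ more than the command names do, so this only covers the easy part.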
I am unclear what to say here as I think the opinions stated so far are all valid.
In my prototype material hpc-in-a-day, I set up the machinery to support SLURM and LSF. While this is all possible with Jekyll, it brings some weirdness into writing the episodes. For example, check out the code examples in this markdown: instead of writing the code out, you have to load/include a code snippet like so:
A first exercise would be to submit a job that does nothing else but print "Hello World!".
~~~
{% include /snippets/02/submit_hello_world_to_void.{{ site.workshop_scheduler }} %}
~~~
{: .bash}
~~~
Hello World
~~~
{: .output}
Here `site.workshop_scheduler` is a parameter that you can set in the project's `_config.yml`. But I must say that the schedulers differ in some detailed points of usage, which makes writing generic text around the code snippets more difficult. But it's possible.
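As a rough illustration of what the include above resolves to, the sketch below builds the file path for a given scheduler. It assumes Jekyll's default `_includes` directory and lowercase scheduler values such as `slurm` or `lsf`; the function name `snippet_path` is hypothetical, and the values actually used in hpc-in-a-day may differ.

```shell
# Hypothetical helper: build the path of the snippet file that the include
# would pick up, assuming snippets live under Jekyll's default _includes/.
snippet_path() {
    episode="$1"    # episode number, e.g. 02
    name="$2"       # snippet base name
    scheduler="$3"  # value of workshop_scheduler from _config.yml (assumed lowercase)
    printf '_includes/snippets/%s/%s.%s\n' "$episode" "$name" "$scheduler"
}

snippet_path 02 submit_hello_world_to_void slurm
# prints: _includes/snippets/02/submit_hello_world_to_void.slurm
```

Supporting another scheduler then means adding one more file per snippet with the matching extension, which is where the extra maintenance cost comes from.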
To conclude (apologies for the long post): I personally suggest that we discuss adding the machinery for loading scheduler-specific snippets to the repo, but focus on SLURM for now. With this, we would be ready for any future contributions of material for scheduler X. Of course, we would have to bite the bullet of more complicated markdown.
If there's interest, I put together a version of the lesson template that has sections that could be swapped in and out (https://github.com/ChristinaLK/large-scale-chtc)
But I agree with Ashwin -- building up with one example (probably SLURM) is good for focusing development and then we should be able to create some machinery to swap specific commands in and out.
Once this SLURM version is ready, we can re-visit the discussion about whether we want:
- One "official" SLURM lesson, and different sites adapting it as necessary
- Several "official" lessons for different schedulers/setups, similar to SWC's R/Python/MATLAB programming lessons (IMO, this can become unmanageable and lead to out-of-sync lessons)
- One "official" lesson with fancy Markdown to swap commands, and carefully worded text
My gut feeling tells me that there is a consensus towards going with SLURM for starters. Fine with me.
Just to clarify: I have no particular affinity for SLURM (I've never used it myself). But I'm fine with it too