
Cluster manager specific examples

Open r4space opened this issue 7 years ago • 8 comments

This is for the future, but in the long term it'd be great to see examples for all the common cluster managers/schedulers, e.g. Slurm, Condor, Moab, and PBS.

r4space avatar Aug 02 '17 19:08 r4space

@r4space

Rather than examples in many different schedulers/clusters, the proposal is to have a single, central hpc-novice repository written for one specific scheduler/cluster, but general enough that a site may change and adapt the lesson easily and quickly.

shwina avatar Aug 02 '17 19:08 shwina

For ~80% of the topics that makes complete sense, and definitely as a start. But in the future, why not have 3 or 4 specific flavours, in the same way that SWC has e.g. R and Python modules?

r4space avatar Aug 03 '17 02:08 r4space

Conceptually they mostly do the same thing, so having one plus a translation table like https://slurm.schedmd.com/rosetta.pdf should give people what they need to prepare individual lessons. We just need to be careful not to discuss anything outside the commonalities of the managers (which shouldn't be hard for an introductory lesson).
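To make the translation-table idea concrete, here is a minimal sketch in Python. It is not taken from the rosetta PDF or from the lesson; the `COMMANDS` table and the `translate` helper are hypothetical names, and the table only covers the three most basic actions for a handful of schedulers.

```python
# Illustrative, non-exhaustive translation table between schedulers.
# COMMANDS and translate() are hypothetical names used only for this sketch.
COMMANDS = {
    "submit": {"slurm": "sbatch", "pbs": "qsub", "lsf": "bsub", "condor": "condor_submit"},
    "status": {"slurm": "squeue", "pbs": "qstat", "lsf": "bjobs", "condor": "condor_q"},
    "cancel": {"slurm": "scancel", "pbs": "qdel", "lsf": "bkill", "condor": "condor_rm"},
}


def translate(command, source, target):
    """Return the `target` scheduler's equivalent of a `source` scheduler command."""
    for by_scheduler in COMMANDS.values():
        if by_scheduler.get(source) == command:
            return by_scheduler[target]
    raise KeyError(f"{command!r} is not a known {source} command in this table")


print(translate("sbatch", "slurm", "pbs"))  # prints "qsub"
```

A site adapting a SLURM-based lesson could use a table like this as a checklist of which commands need replacing; options and output formats differ between schedulers and are not covered here.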

ocaisa avatar Aug 03 '17 05:08 ocaisa

I am unclear what to say here as I think the opinions stated so far are all valid.

In my prototype material, hpc-in-a-day, I set up the machinery to support SLURM and LSF. While this is all possible with Jekyll, it brings some weirdness into writing the episodes (e.g. check out the code examples in this markdown): instead of writing a code example out, you have to load/include a code snippet like so:

A first exercise would be to submit a job that does nothing else but print "Hello World!".

~~~
{% include /snippets/02/submit_hello_world_to_void.{{ site.workshop_scheduler }} %}
~~~
{: .bash}

~~~
Hello World
~~~
{: .output}

Here site.workshop_scheduler is a parameter that you can set in the project's _config.yml. I must say, though, that the schedulers differ in some detailed points of usage, which makes writing generic text around the code snippets more difficult. But it's possible.
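As a rough illustration of what the include line above resolves to, the following Python sketch builds the same path by hand. The `site` dictionary and the `snippet_path` helper are hypothetical stand-ins for Jekyll's own machinery, and the value "slurm" is an assumed setting, not one confirmed by the repository.

```python
# Hypothetical stand-in for Jekyll's `site` object; in the real setup the
# value would come from _config.yml. "slurm" is an assumed example value.
site = {"workshop_scheduler": "slurm"}


def snippet_path(episode, name, scheduler):
    """Build the include path the way the Liquid expression above does."""
    return f"/snippets/{episode}/{name}.{scheduler}"


print(snippet_path("02", "submit_hello_world_to_void", site["workshop_scheduler"]))
# prints "/snippets/02/submit_hello_world_to_void.slurm"
```

Changing the one configuration value would point every such include at a different file, which is why each snippet needs one file per supported scheduler.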

To conclude (apologies for the long post): I personally suggest we discuss putting the technology for loading scheduler-specific snippets into the repo, but focus on SLURM for now. With this, we would be ready for any material contributions for scheduler X in the future. Of course, we would have to bite the bullet of more complicated markdown.

psteinb avatar Aug 03 '17 07:08 psteinb

If there's interest, I put together a version of the lesson template that has sections that could be swapped in and out (https://github.com/ChristinaLK/large-scale-chtc)

But I agree with Ashwin -- building up with one example (probably SLURM) is good for focusing development and then we should be able to create some machinery to swap specific commands in and out.

ChristinaLK avatar Aug 03 '17 13:08 ChristinaLK

Once this SLURM version is ready, we can re-visit the discussion about whether we want:

  1. One "official" SLURM lesson, and different sites adapting it as necessary
  2. Several "official" lessons for different schedulers/setups, similar to SWC's R/Python/MATLAB programming lessons (IMO, this can become unmanageable and lead to out-of-sync lessons)
  3. One "official" lesson with fancy Markdown to swap commands, and carefully worded text

shwina avatar Aug 03 '17 14:08 shwina

My gut feeling tells me that there is a consensus towards going with SLURM for starters. Fine with me.

psteinb avatar Aug 03 '17 15:08 psteinb

Just to clarify: I have no particular affinity for SLURM (I've never used it myself). But I'm fine with it too.

shwina avatar Aug 03 '17 15:08 shwina