dask-jobqueue

"Baseline" workers combination with submitted worker jobs

jbusecke opened this issue 5 years ago • 5 comments

I am working on the Princeton cluster tigercpu, which presents some challenges to an efficient workflow with dask-jobqueue: small, fast-starting jobs are somewhat discouraged. For example, most of the time the fastest-starting jobs use 2 nodes (each with 40 cores) when submitted as a batch script, while interactive jobs usually start faster.

My current workflow looks something like this:

  1. Request a single node as an interactive job (fast), start a Jupyter notebook there, start a SLURMCluster, and ssh to this node.
  2. Request another interactive job (more resources, usually fairly fast when requested as an interactive session) in which I basically execute the job script created by the SLURMCluster by hand, connecting the workers to the scheduler running in the notebook.
  3. Dask away.

This is quite cumbersome. If I replaced step 2 with the actual dask-jobqueue functionality this would be much cleaner, but wait times can be longer. Is there a way to start a few workers on the same node as the scheduler (created in step 1) directly from the SLURMCluster? I guess this is somewhat a combination of LocalCluster and SLURMCluster?

That would be ideal, since it would immediately provide a few baseline workers to explore data and run some preliminary analysis before the real compute power comes online.

jbusecke avatar Apr 10 '20 14:04 jbusecke

Yeah I feel your pain, this seems less than ideal.

Just to be sure I follow: in 1. and 2. you mention SLURMCluster, but you actually meant LocalCluster, right? In other words, you are currently creating your Dask cluster by hand, and ideally you would like to use Dask-Jobqueue?

About your particular question

Is there a way to start a few workers on the same node as the scheduler (created in 1)) directly from the SLURMcluster? I guess this is somewhat a combination of LocalCluster and SLURMcluster?

I have thought about this too. The use case I have on one cluster is GPU-only, so if you launch your main script (i.e. the one that creates the SLURMCluster) on a node, you will get a GPU and be billed for it; in an ideal world you would be able to also run a worker on that node so the GPU is not wasted.

The idea I had is to use client.run_on_scheduler (see this) and actually run the last line of your cluster.job_script() through it. I haven't had the time to try this idea out, but if you try it and it works (or you run into problems), let me know! I cannot find the issue right now where I mentioned this idea before ...
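An untested, minimal sketch of that idea, assuming the last non-empty line of cluster.job_script() is the worker launch command (that, and the SLURMCluster configuration itself, are assumptions about your setup, not something verified here):

```python
# Untested sketch: run the job script's worker command on the scheduler's
# node via client.run_on_scheduler.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster()  # your usual SLURMCluster configuration
client = Client(cluster)

# Assumption: the last non-empty line of the generated job script is the
# dask-worker command that would normally run inside a SLURM job.
worker_cmd = cluster.job_script().strip().splitlines()[-1]


def start_worker_here(cmd):
    # Runs inside the scheduler process, so the worker is spawned on the
    # same node as the scheduler (and the notebook, in the setup above).
    import subprocess

    return subprocess.Popen(cmd, shell=True).pid


pid = client.run_on_scheduler(start_worker_here, worker_cmd)
print("baseline worker started on the scheduler node, pid", pid)
```

Note that in your setup the notebook and scheduler already share a node, so spawning the process locally would work too; going through the scheduler mainly matters for cases like the GPU one above, where the scheduler node is the one you are billed for.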

It is quite hard to give generic instructions in the docs on how to "best" use Dask-Jobqueue ("best" is not well defined, but convenience and performance are both in the mix, with probably a bit more emphasis on convenience in my experience), and the answer is very much cluster-specific (e.g. the kinds of queues you have, how fast you are likely to get a job in each queue). A reasonable thing to do would be to explain your workflow to your cluster IT, but my feeling is that it takes time and energy to get them to move (probably also very cluster-specific). Finding one motivated sys-admin may help: collectively they may be quite conservative, but one person who sees your problem and is motivated to find work-arounds within the cluster constraints can make things move faster.

There is definitely some interesting activity going on in Pangeo and JupyterHub, with sys-admins from some US clusters involved (NERSC, NCAR, etc.). Reaching out to them is also an option, because they may have cluster tips and tricks from the trenches that I don't know about. They may also be able to help you talk to your local cluster IT in words that they understand. Edit: I just saw that you are involved in Pangeo, so you probably know this already; oh well, I'll leave this info here anyway because it may be useful for someone else.

lesteve avatar Apr 11 '20 05:04 lesteve

Hi,

Is there a way to start a few workers on the same node as the scheduler (created in 1)) directly from the SLURMcluster?

Just to be sure, in your particular use case of working on only one node, why not use only LocalCluster?

This would look something like:

  1. Request a single node as an interactive job (a full node if possible) and start a Jupyter notebook,
  2. Open a web browser to the notebook and create a LocalCluster; scale or adapt it to the single node's resources (see the sketch after this list).
  3. Dask away.
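A minimal sketch of step 2, assuming a 40-core node like tigercpu; the 8 workers × 5 threads split is just an illustrative choice, not something from this thread:

```python
# Sketch: fill one 40-core node with a LocalCluster for "baseline" work.
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=8, threads_per_worker=5)  # 8 x 5 = 40 cores
client = Client(cluster)
```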

guillaumeeb avatar Apr 11 '20 21:04 guillaumeeb

Oh yes. I was just wondering if I can attach a lot more workers to that existing setup on demand. This would let me test things out with e.g. smaller slices of datasets (and also start quickly), but scale up once I need some oomph. In other words, is there a straightforward way to attach workers on other nodes to this existing setup?

jbusecke avatar Apr 12 '20 00:04 jbusecke

If I had to do it, I would try the SLURMCluster + client.run_on_scheduler approach. If this approach actually works, I think it would be interesting for a variety of use cases (e.g. the GPU case mentioned above).

There will probably be more people with the relevant expertise in the dask/distributed issue tracker who can comment on whether this idea may work or not ...

May I suggest:

  • you open an issue in https://github.com/dask/distributed with a link to this issue
  • ping me on that issue so I can follow the discussion and potentially comment if I can make the problem clearer (HPC has its quirks and not everyone is aware of them)
  • suggest the client.run_on_scheduler idea and see whether they think it could work

lesteve avatar Apr 12 '20 01:04 lesteve

Thanks for all the advice! This sounds like a great plan. Will try that out next week and report back!

jbusecke avatar Apr 12 '20 14:04 jbusecke

Closing this issue as stale; this should be fixed once we implement #419.

Another solution would be to manually start a worker from inside the notebook where the SLURMCluster has been created.
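A hedged sketch of that manual-worker idea (not tested here), assuming the notebook node may run computations and the dask-worker CLI is on the PATH:

```python
# Sketch: one baseline worker on the notebook's node, attached to the
# scheduler created by SLURMCluster; the heavy workers still come from SLURM.
import subprocess

from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster()
client = Client(cluster)

baseline_worker = subprocess.Popen(
    ["dask-worker", cluster.scheduler_address, "--nthreads", "4"]
)

# Later, cluster.scale(jobs=...) adds the SLURM-backed workers on other nodes.
```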

guillaumeeb avatar Aug 30 '22 06:08 guillaumeeb

Awesome. Thanks. And sorry for the radio silence. I am actually not at Princeton anymore, but will keep these tips in mind the next time I work on an HPC!

jbusecke avatar Aug 30 '22 19:08 jbusecke