
coordinate with JupyterHub/BatchSpawner

mbmilligan opened this issue 8 years ago • 3 comments

The launcher module here and JupyterHub's BatchSpawner contain strongly, but not completely, overlapping knowledge about how to interface with existing job submission systems. I'd like to explore the possibility and desirability of creating a common mechanism to capture this knowledge and reduce duplication of effort.

mbmilligan commented Jul 11 '17 22:07

I've thought quite a bit about this! I think it's a great idea. JupyterHub's Spawner design is based largely on lessons learned from these Launchers, and I think it is a good deal nicer to work with. In particular, baking in the assumption that it's going to launch a notebook server (or, somewhat more generally, a web service) has been a big benefit over the much more general "starts a process" abstraction that the IPP launchers were designed on, which has gotten in the way a bit.

The batch templating stuff here is probably useful, though, so a tiny library wrapping the common aspects of job submission that we could use in both projects would be valuable (a rough sketch follows the list below):

Basic functionality we use in both:

  1. generate templates
  2. select queue, limits, nodes, etc.
  3. specify job array / number of tasks
  4. specify command to launch
  5. start/stop a job (array)
  6. check if a job is running
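
To make this concrete, here is a minimal sketch of what such a wrapper might look like. Everything here is hypothetical: the BatchSystem class, its method names, and the Torque commands/directives are assumptions used for illustration, not an existing API.

import subprocess
from string import Template

class BatchSystem:
    # Hypothetical wrapper for one job submission system (Torque here;
    # a Slurm variant would swap sbatch/qdel/qstat for its equivalents
    # and use a different template).
    submit_cmd = "qsub"
    delete_cmd = "qdel"
    status_cmd = "qstat"
    batch_template = Template(
        "#!/bin/sh\n"
        "#PBS -q $queue\n"        # select queue
        "#PBS -l nodes=$nodes\n"  # limits / nodes
        "#PBS -t 1-$ntasks\n"     # job array / number of tasks
        "$command\n"              # command to launch
    )

    def __init__(self, queue, nodes, ntasks, command):
        self.options = dict(queue=queue, nodes=nodes,
                            ntasks=ntasks, command=command)
        self.job_id = None

    def start(self):
        # Generate the batch script from the template and submit it,
        # keeping the job id the scheduler prints on stdout.
        script = self.batch_template.substitute(self.options)
        out = subprocess.run([self.submit_cmd], input=script,
                             capture_output=True, text=True, check=True)
        self.job_id = out.stdout.strip()

    def stop(self):
        subprocess.run([self.delete_cmd, self.job_id], check=True)

    def running(self):
        # Check whether the job is still known to the scheduler.
        out = subprocess.run([self.status_cmd, self.job_id],
                             capture_output=True, text=True)
        return out.returncode == 0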

minrk commented Jul 12 '17 08:07

Brilliant. To that list I'd add:

  7. report key items about running job state (at minimum identify the destination node, but things like remaining walltime might be nice too); a sketch of such a method follows
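
Continuing the hypothetical BatchSystem sketch above, this could be one more method on the wrapper. The qstat fields and the parse_qstat_output helper are assumptions for illustration only:

    def get_state(self):
        # Report key facts about the running job: at minimum the
        # destination node, plus remaining walltime if available.
        out = subprocess.run([self.status_cmd, "-f", self.job_id],
                             capture_output=True, text=True, check=True)
        info = parse_qstat_output(out.stdout)  # hypothetical helper
        return {
            "node": info.get("exec_host"),
            "walltime_remaining": info.get("walltime_remaining"),
        }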

Thinking about the structure of such a thing, it makes sense for this library to still be based on traits, since much of the utility of BatchSpawner comes from the ability to apply site-specific tweaks without having to fork the code and maintain a site-local module. But it would be more broadly useful if there were some reasonable way to use it in applications that are not otherwise using traitlets. (E.g. I've now seen a couple of examples in the wild of people building a Jupyter widget that dispatches jobs in a way that is utterly tied to the particular cluster they work on.)
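
For reference, this is roughly what the traits-based approach buys us. The BatchSystemThing class and its trait names are hypothetical, but the override pattern is standard traitlets:

from traitlets import Unicode
from traitlets.config import Configurable

class BatchSystemThing(Configurable):
    # Sites can override these from a config file, no fork required.
    submit_cmd = Unicode("qsub",
        help="command used to submit a batch script").tag(config=True)
    batch_template = Unicode("",
        help="site-specific batch script template").tag(config=True)

# in a site config file, e.g. jupyterhub_config.py:
# c.BatchSystemThing.submit_cmd = "/opt/torque/bin/qsub"
# c.BatchSystemThing.batch_template = "...template with site-required directives..."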

It seems there are two basic approaches we could take.

Option 1:

# BatchThings.py
class TorqueThing:
    ...  # define some common interface

class SlurmThing:
    ...  # define some common interface

# BatchSpawner
class TorqueSpawner(BatchSpawnerBase, TorqueThing):
    pass

# IPP/launcher.py
class TorqueLauncher(BatchLauncher, TorqueThing):
    pass

Option 2: no inheritance tricks; instead, BatchSpawner or launcher.py would do something like

B = BatchThing.get_thing(which_one, config_parameters)
B.run_cmd()
B.check_status()

Thoughts?

mbmilligan commented Jul 12 '17 22:07

I think option 2 is probably better in the long run, especially if it is to be used outside these two specific applications.

It's still okay to use traitlets / Configurable in the classes if you like, but I'd preserve a "has a" relationship in the Spawner/Launcher rather than an "is a" relationship via inheritance. If traitlets don't seem like a good idea, we can always handle the config -> options transform in our Launcher/Spawner implementations.
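
A minimal sketch of that "has a" shape, reusing the hypothetical Option 2 API from above (BatchThing.get_thing, run_cmd, check_status, and config_parameters are all assumptions):

class TorqueSpawner(BatchSpawnerBase):
    # Composition: the Spawner owns a batch-system object rather than
    # inheriting from one, so the batch library itself does not need
    # to be traitlets-based.
    def start(self):
        self.batch = BatchThing.get_thing("torque", self.config_parameters)
        self.batch.run_cmd()

    def poll(self):
        return self.batch.check_status()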

minrk commented Jul 14 '17 08:07