adaptive Blocking behavior of the runner

(original issue on GitLab)

opened by Anton Akhmerov (@anton-akhmerov) at 2017-11-13T14:39:40.396Z

Right now the runner is stopped when the user launches anything that blocks the kernel, and that is dangerous in the context of HPC. Say, a single %debug will halt the cluster computations until the user finishes debugging. We should revisit this behavior and think about providing safeguards.

Dec 19 '18 17:12 jbweston

originally posted by Joseph Weston (@jbweston) at 2017-11-13T17:25:12.648Z on GitLab

Getting rid of this restriction is going to be tough.

At the moment the runner and the kernel are able to run in the same thread through use of cooperative multitasking (i.e. coroutines). This makes it trivial to access the learner from the kernel while the runner is doing its job, because we know that the kernel may only run when the runner is awaiting something, at which time the learner is in a well-defined state.

As you mention above, the disadvantage of cooperative multitasking is that if one coroutine refuses to yield control (a blocking kernel, say) then no other coroutines can work (the runner cannot advance). If you want to lift this restriction, then you have to use another mechanism for controlling access to the shared resource (the learner). Experience tells us that this needs to be done carefully.
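To make the starvation problem concrete, here is a minimal sketch using plain asyncio. It is not adaptive's runner: `runner`, `polite_kernel`, `blocking_kernel` and `runner_steps_before_kernel_done` are hypothetical stand-ins, and the sleeps only imitate work.

```python
import asyncio
import time


async def runner(log, n_steps=5):
    # Stand-in for the runner: do one unit of work, then yield control.
    for i in range(n_steps):
        log.append(("runner", i))
        await asyncio.sleep(0.01)


async def polite_kernel(log):
    # Awaits while it waits, so the event loop can keep advancing the runner.
    await asyncio.sleep(0.2)
    log.append(("kernel", "done"))


async def blocking_kernel(log):
    # time.sleep() never yields to the event loop, so the runner is stalled
    # for the whole duration (this imitates something like %debug).
    time.sleep(0.2)
    log.append(("kernel", "done"))


async def main(kernel):
    log = []
    await asyncio.gather(runner(log), kernel(log))
    return log


def runner_steps_before_kernel_done(kernel):
    # Number of runner steps logged before the kernel finished.
    log = asyncio.run(main(kernel))
    return log.index(("kernel", "done"))
```

With `blocking_kernel` the runner only gets its first step in before the kernel takes over the thread; with `polite_kernel` the runner keeps stepping while the kernel waits.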

Dec 19 '18 17:12 jbweston

originally posted by Anton Akhmerov (@anton-akhmerov) at 2017-11-14T08:08:32.017Z on GitLab

I was rather thinking along the lines of reducing the communication channels to the runner by offloading it to a separate process. This would of course restrict our capacity to interact with the learner.

Dec 19 '18 17:12 jbweston

originally posted by Joseph Weston (@jbweston) at 2018-02-19T12:07:53.750Z on GitLab

We now have a BlockingRunner that blocks the kernel. Is this good enough?

Dec 19 '18 17:12 jbweston

originally posted by Bas Nijholt (@basnijholt) at 2018-02-19T12:10:58.076Z on GitLab

I would say yes.

However, you do mention an issue that is still there. I think it's wise to add some more explanation about what happens where: the function being learned is executed in the executor, while the learner methods are called in the same thread as the notebook. So blocking the notebook thread means the learner can't suggest new points to the executor.
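The split described above can be illustrated with a toy example. This is only a sketch under assumptions: `toy_runner` is a hypothetical stand-in rather than adaptive's runner, a `ThreadPoolExecutor` is used as the executor, and the "ask"/"tell" steps are reduced to a loop and a list append.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def f(x):
    # The function being learned: evaluated inside an executor worker.
    return x * x, threading.current_thread().name


async def toy_runner(points):
    loop = asyncio.get_running_loop()
    told = []
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as executor:
        for x in points:  # "ask": the point is chosen in the event-loop thread
            y, where_f_ran = await loop.run_in_executor(executor, f, x)
            # "tell": the result is also recorded in the event-loop thread
            told.append((x, y, where_f_ran, threading.current_thread().name))
    return told


results = asyncio.run(toy_runner([1, 2, 3]))
for x, y, where_f_ran, where_told in results:
    print(f"f({x}) = {y} ran in {where_f_ran}; told in {where_told}")
```

Each evaluation of `f` reports a worker thread, while every "tell" reports the thread that drives the event loop. If that thread is blocked, no new points reach the executor even though the workers are idle.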

Dec 19 '18 17:12 jbweston

originally posted by Joseph Weston (@jbweston) at 2018-02-19T12:34:22.159Z on GitLab

For now we can do something like:

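import asyncio
from functools import partial
from adaptive import BlockingRunner

def _run(learner, *args, **kwargs):
    BlockingRunner(learner, *args, **kwargs)  # blocks until the runner has finished
    return learner

def run_in_background(learner, *args, executor=None, ioloop=None, **kwargs):
    ioloop = ioloop or asyncio.get_event_loop()
    # run_in_executor() forwards positional arguments only, so bind kwargs with partial
    return ioloop.run_in_executor(executor, partial(_run, learner, *args, **kwargs))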

We won't be able to interact with the learner; we'll only be able to cancel and check/get the result, but this is fine for v0.1.

Dec 19 '18 17:12 jbweston

originally posted by Joseph Weston (@jbweston) at 2018-02-19T12:41:28.540Z on GitLab

Ah, but this won't quite work.

The executor in which we want to run _run is completely independent from the executor in which we want to run the BlockingRunner.

It is not clear to me how we can get this context into a subprocess without resorting to hacks like passing a string to _run that will then be exec'd (importing the necessary modules and instantiating an executor).
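For illustration, the kind of hack meant here might look roughly like the sketch below. Everything in it is hypothetical: `CHILD_SOURCE` and `run_in_subprocess` are made-up names, the child uses a `ThreadPoolExecutor` purely as a placeholder, and no adaptive code is involved. The point is only that the child interpreter has to rebuild its own imports and executor from a string, because nothing from the parent is carried over.

```python
import subprocess
import sys

# Source that the child interpreter executes. It must import what it needs
# and instantiate its own executor; none of the parent's objects exist there.
CHILD_SOURCE = """
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return x * x

with ThreadPoolExecutor(max_workers=2) as executor:
    print(sum(executor.map(f, range(4))))
"""


def run_in_subprocess(source):
    # Start a fresh Python interpreter and run the source string in it.
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(run_in_subprocess(CHILD_SOURCE))
```

This version waits for the child to finish; a background variant would need `subprocess.Popen` or similar, and the result still comes back only as text, which is part of why the approach is unattractive.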

Dec 19 '18 17:12 jbweston

originally posted by Joseph Weston (@jbweston) at 2018-02-20T15:31:33.320Z on GitLab

demoting this issue from milestone 0.1

Dec 19 '18 17:12 jbweston
