Blocking behavior of the runner

Open jbweston opened this issue 5 years ago • 7 comments

(original issue on GitLab)

opened by Anton Akhmerov (@anton-akhmerov) at 2017-11-13T14:39:40.396Z

Right now the runner is stopped when the user launches anything that blocks the kernel, and that is dangerous in the context of HPC. For example, a single %debug will halt the cluster computations until the user finishes debugging. We should revisit this behavior and think about providing safeguards.

jbweston avatar Dec 19 '18 17:12 jbweston

originally posted by Joseph Weston (@jbweston) at 2017-11-13T17:25:12.648Z on GitLab

Getting rid of this restriction is going to be tough.

At the moment the runner and the kernel run in the same thread through the use of cooperative multitasking (i.e. coroutines). This makes it trivial to access the learner from the kernel while the runner is doing its job, because we know that the kernel may only run when the runner is awaiting something, at which time the learner is in a well-defined state.
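
The property described above can be illustrated with a toy `asyncio` sketch: the kernel only gets a chance to run while the runner is suspended at an `await`. `ToyLearner`, `runner` and `kernel` are hypothetical stand-ins, not adaptive's API. `tell` briefly breaks the learner's bookkeeping invariant between two statements. Because it contains no `await`, the "kernel" coroutine can never observe that intermediate state.

```python
import asyncio


class ToyLearner:
    """Hypothetical stand-in for a learner (not adaptive's API)."""

    def __init__(self):
        self.pending = set()  # points handed out but not yet evaluated
        self.data = {}        # evaluated points
        self.n_asked = 0

    def ask(self):
        x = self.n_asked
        self.n_asked += 1
        self.pending.add(x)
        return x

    def tell(self, x, y):
        # Between these two statements the invariant below is briefly broken,
        # but no other coroutine can run here because there is no `await`.
        self.pending.discard(x)
        self.data[x] = y

    def consistent(self):
        # Every point ever asked for is either pending or evaluated.
        return len(self.pending) + len(self.data) == self.n_asked


async def runner(learner, n):
    for _ in range(n):
        x = learner.ask()
        await asyncio.sleep(0)  # yield control, as if awaiting a function evaluation
        learner.tell(x, x**2)


async def kernel(learner, checks):
    # Only runs while the runner is suspended at an `await`.
    for _ in range(10):
        checks.append(learner.consistent())
        await asyncio.sleep(0)


async def main():
    learner, checks = ToyLearner(), []
    await asyncio.gather(runner(learner, 5), kernel(learner, checks))
    return learner, checks


learner, checks = asyncio.run(main())
print(all(checks), len(learner.data))  # True 5
```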

As you mention above, the disadvantage of cooperative multitasking is that if one coroutine refuses to yield control (a blocking kernel, say), then no other coroutines can run (the runner cannot advance). If you want to lift this restriction, you have to use another mechanism for controlling access to the shared resource (the learner). Experience tells us that this needs to be done carefully.
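
The flip side can be sketched in the same toy setting (all names are hypothetical). A coroutine that makes a blocking call instead of awaiting stops the runner from taking any step until that call returns. Here `time.sleep` stands in for sitting at a `%debug` prompt.

```python
import asyncio
import time


async def runner(progress, n):
    # Hypothetical runner: records a timestamp every time it makes progress.
    for _ in range(n):
        progress.append(time.monotonic())
        await asyncio.sleep(0.01)  # stands in for awaiting a function evaluation


async def blocking_kernel(duration):
    await asyncio.sleep(0.025)  # let the runner make some progress first
    time.sleep(duration)  # blocking call: never yields to the event loop


async def main():
    progress = []
    await asyncio.gather(runner(progress, 10), blocking_kernel(0.3))
    return progress


progress = asyncio.run(main())
gaps = [b - a for a, b in zip(progress, progress[1:])]
print(f"largest gap between runner steps: {max(gaps):.2f} s")
```

The largest gap between runner steps is roughly the duration of the blocking call. During that time the runner neither submits nor collects any work.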

originally posted by Anton Akhmerov (@anton-akhmerov) at 2017-11-14T08:08:32.017Z on GitLab

I was rather thinking along the lines of reducing the communication channels to the runner by offloading it to a separate process. This would of course restrict our capacity to interact with the learner.

originally posted by Joseph Weston (@jbweston) at 2018-02-19T12:07:53.750Z on GitLab

We now have a BlockingRunner that blocks the kernel. Is this good enough?

originally posted by Bas Nijholt (@basnijholt) at 2018-02-19T12:10:58.076Z on GitLab

I would say yes.

However, you do mention an issue that is still there. I think it's wise to add some more explanation about what happens where: the function being learned is executed in the executor, while the learner methods are called in the same thread as the notebook. So blocking the notebook thread means the learner can't suggest new points to the executor.
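
The division of labour described above can be sketched with a simplified runner loop built on `loop.run_in_executor`. This is a hypothetical illustration assuming a thread-pool executor, not adaptive's actual runner. Choosing points ("ask") and recording results ("tell") happen on the event-loop thread, while only the function evaluations run in the executor's worker threads. If the event-loop thread is blocked, no new points are submitted.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def f(x):
    # The function being learned; runs in an executor worker thread.
    return x**2, threading.current_thread().name


async def toy_runner(points, executor, ntasks=2):
    # Hypothetical, simplified runner loop; not adaptive's implementation.
    loop = asyncio.get_running_loop()
    todo, data, pending = list(points), {}, {}
    where = {"learner": set(), "function": set()}
    while todo or pending:
        # "ask" for new points: happens on the event-loop (notebook) thread.
        while todo and len(pending) < ntasks:
            x = todo.pop(0)
            where["learner"].add(threading.current_thread().name)
            pending[loop.run_in_executor(executor, f, x)] = x
        done, _ = await asyncio.wait(set(pending), return_when=asyncio.FIRST_COMPLETED)
        for fut in done:
            x = pending.pop(fut)
            y, worker = fut.result()
            data[x] = y  # "tell": also on the event-loop thread
            where["function"].add(worker)
    return data, where


with ThreadPoolExecutor(max_workers=2) as ex:
    data, where = asyncio.run(toy_runner(range(6), ex))
print(where)
```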

originally posted by Joseph Weston (@jbweston) at 2018-02-19T12:34:22.159Z on GitLab

For now we can do something like:

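import asyncio
import functools
from adaptive import BlockingRunner

def _run(learner, *args, **kwargs):
    BlockingRunner(learner, *args, **kwargs)  # blocks this thread until the goal is reached
    return learner

def run_in_background(learner, *args, executor=None, ioloop=None, **kwargs):
    ioloop = ioloop or asyncio.get_event_loop()
    # run_in_executor() does not forward keyword arguments, so bind them with partial
    return ioloop.run_in_executor(executor, functools.partial(_run, learner, *args, **kwargs))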

We won't be able to interact with the learner; we'll only be able to cancel the job and check for/get the result, but this is fine for v0.1.
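
The remaining capabilities can be sketched with a stand-in for `_run`. The returned asyncio future can be polled with `done()` and awaited for the result while the event loop stays responsive. `run_to_completion` is a hypothetical placeholder for running a `BlockingRunner`, and a thread pool is assumed. Calling `cancel()` on such a future cannot interrupt a job that has already started in a worker thread; it can only prevent a job that has not started yet.

```python
import asyncio
import functools
import time
from concurrent.futures import ThreadPoolExecutor


def run_to_completion(n, delay):
    # Hypothetical stand-in for `_run`/`BlockingRunner`: blocks its thread, then returns.
    data = {}
    for x in range(n):
        time.sleep(delay)  # pretend each point takes a while to learn
        data[x] = x**2
    return data


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=1) as ex:
        job = functools.partial(run_to_completion, 5, delay=0.01)
        fut = loop.run_in_executor(ex, job)
        done_at_start = fut.done()  # check: not finished yet
        await asyncio.sleep(0)  # the event loop (the kernel) stays free meanwhile
        result = await fut  # get: wait for and retrieve the final data
    return done_at_start, result


done_at_start, result = asyncio.run(main())
print(done_at_start, len(result))  # False 5
```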

originally posted by Joseph Weston (@jbweston) at 2018-02-19T12:41:28.540Z on GitLab

Ah, but this won't quite work.

The executor in which we want to run _run is completely independent of the executor that we want the BlockingRunner to use.

It is not clear to me how we can get this context into a subprocess without resorting to hacks like passing a string to _run that will then be exec'd (importing the necessary modules and instantiating an executor).
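
One possible way around exec'ing a string is offered here as an assumption, not something settled in this discussion. Instead of the executor itself, send a picklable *description* of it (a factory) and build the executor inside the background worker. The sketch shows that a live `ThreadPoolExecutor` cannot be pickled, whereas `functools.partial(ThreadPoolExecutor, max_workers=2)` can. `_run` here is a hypothetical variant that takes the factory, and the pickle round trip only simulates the trip to a subprocess.

```python
import functools
import pickle
from concurrent.futures import ThreadPoolExecutor


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):  # e.g. locks and queues inside an executor
        return False


# A live executor holds locks and a work queue, so it cannot cross a process boundary.
executor = ThreadPoolExecutor(max_workers=2)
executor_picklable = can_pickle(executor)
executor.shutdown()

# A factory describing *how* to build the executor pickles fine (class by reference).
factory = functools.partial(ThreadPoolExecutor, max_workers=2)
factory_picklable = can_pickle(factory)


def _run(job, points, executor_factory):
    # Hypothetical background entry point: builds its own executor where it runs.
    with executor_factory() as ex:
        return list(ex.map(job, points))


# Simulate sending the factory to a subprocess by round-tripping it through pickle.
restored = pickle.loads(pickle.dumps(factory))
results = _run(abs, [-1, -2, 3], restored)
print(executor_picklable, factory_picklable, results)  # False True [1, 2, 3]
```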

originally posted by Joseph Weston (@jbweston) at 2018-02-20T15:31:33.320Z on GitLab

Demoting this issue from milestone 0.1.
