Kernel nanny proposal

Open takluyver opened this issue 9 years ago • 95 comments

As discussed at the dev meeting. There are a few TODOs which have not yet been decided. We can bikeshed about them, or whoever gets to implementing the relevant bits first can try their preferred option ;-).

Pinging @JanSchulz, who was interested in this for IRkernel logging.

takluyver avatar Apr 18 '16 16:04 takluyver

I'm not sure I understand the details, but currently the notebook isn't very good at shutting down the R kernel on Windows, because the R kernel is not a single process but a chain, roughly R.exe -> cmd -> rterm.exe [see https://github.com/jupyter/jupyter_client/issues/104]. I'm not sure whether the "nanny" can detect such a thing without a heartbeat.

[Such things might happen even for python kernels, if you use a batch file with activate <env> & python <kernel startup line>, which is needed to get the correct PATH in a kernel...]

jankatins avatar Apr 18 '16 16:04 jankatins

I suspect it won't make much of a difference either way in that situation. Both currently and with the kernel nanny, it will send shutdown_request to ask the kernel to shut itself down, and if it doesn't shut down within some time period, it will terminate it more forcefully. I'd guess that second bit is where it goes wrong, since it only knows about the top-level process that it started.
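A minimal sketch of the escalation described above, using only the Python standard library. The function `shut_down`, its `ask_to_exit` callback (standing in for sending `shutdown_request` over the messaging channels) and the `grace` timeout are hypothetical names for illustration, not part of jupyter_client.

```python
import subprocess
import sys


def shut_down(proc, ask_to_exit, grace=5.0):
    """Ask a child process to exit, escalating if it does not comply.

    Returns "clean", "terminated" or "killed" depending on how far
    the escalation had to go.
    """
    ask_to_exit()  # polite request, e.g. a shutdown_request message
    try:
        proc.wait(timeout=grace)
        return "clean"
    except subprocess.TimeoutExpired:
        pass

    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort
        proc.wait()
        return "killed"
```

Note that `terminate()` and `kill()` act only on the process that was started directly. Any processes that child has spawned in turn are not touched, which would match the R.exe -> cmd -> rterm.exe situation described earlier.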

Besides fiddling with the time we wait for the kernel to shut itself down, I'm not sure what we could do to improve that.

takluyver avatar Apr 18 '16 16:04 takluyver

Thanks for writing this up, @takluyver!

minrk avatar Apr 19 '16 09:04 minrk

@takluyver Nicely designed and clearly written. I made a drawing by hand of the frontends, nanny, kernel, and channels; let me know if you would like a copy. 😄

willingc avatar Apr 19 '16 21:04 willingc

Thanks all!

@willingc, yes, it would be good to see your drawing, to check if the explanation conveyed what I was thinking clearly.

takluyver avatar Apr 20 '16 10:04 takluyver

@takluyver Here's the link to the drawing's folder on Dropbox: https://www.dropbox.com/sh/kzc9bom60c9e57x/AAAWcdlGo8RZB9cklEv7jC2ua?dl=0

willingc avatar Apr 20 '16 16:04 willingc

Thanks, that looks good.

takluyver avatar Apr 20 '16 17:04 takluyver

@takluyver Great. You detailed things out very clearly :key:

willingc avatar Apr 20 '16 17:04 willingc

I would suggest that this proposal is split into four proposals:

  1. a proposal to replace SIGINT.
  2. a proposal for kernels to capture the low level stderr and stdout streams and forward them to the frontend.
  3. a proposal to introduce the command jupyter kernel --kernel x
  4. a proposal (specific to IPython) that will use a nanny process to implement the first two proposals.

n-riesco avatar Apr 21 '16 16:04 n-riesco

> a proposal (specific to IPython) that will use a nanny process to implement the first two proposals.

This is absolutely not specific to IPython - the proposal is for the nanny process to be used for all kernels.

Your 1 & 2 don't really make sense without a nanny process. 3 is doable, but it's a more incidental benefit. I don't see the benefit of splitting this up into smaller pieces: it's one change to the architecture that lets us do a number of useful things, which seems like exactly the right scope for a JEP.

This was also discussed at the in-person dev meeting, and while I don't want to suggest that it's closed for discussion, we did spend some time hashing out what we wanted, and I'd really hope that the remaining issues to work out are details, not the fundamental nature of the proposal.

takluyver avatar Apr 21 '16 16:04 takluyver

On 21/04/16 17:54, Thomas Kluyver wrote:

Your 1 & 2 don't really make sense without a nanny process.

I think the kernel is in a better position to handle 1 and especially 2 than an agnostic nanny process:

  • a kernel, if really needed, can implement its own nanny process to handle 1 and 2
  • the nanny process cannot determine the origin of stdout and stderr without the kernel's help
  • a kernel can always capture the low level stdout/stderr

n-riesco avatar Apr 21 '16 17:04 n-riesco

@takluyver How about for kernels that are remote? I know this is not officially supported by the notebook server, but it is something that we've experimented with and could be a requirement in certain deployments.

lbustelo avatar Apr 21 '16 18:04 lbustelo

@n-riesco - the idea in the proposal is that rather than having every kernel implement the capturing and signal/interrupt logic, we'd implement it once outside of the kernel and everyone automatically benefits. As for capturing output, that's opt-in for a kernel, so a kernel absolutely can do its own input/output instead of having the nanny handle it. The nanny just makes it much easier to have this taken care of automatically.

jasongrout avatar Apr 21 '16 19:04 jasongrout

Another concern brought up in the meeting was the latency introduced in forwarding messages through the nanny. Can you mention that in the proposal? I thought @minrk said he might run some tests to get some idea about how much the latency on messages would be impacted by this proposal.

jasongrout avatar Apr 21 '16 19:04 jasongrout

@takluyver I'm sorry for suggesting a proposal split.

Here are 2 suggestions to the current proposal:

  • make the nanny an opt-in feature declared in the kernel spec (currently, IJavascript can run in the official docker images for Node.js; if the nanny was to be made compulsory, then the nanny (and all the dependencies) would have to be installed in the docker container).
  • consider extending the jupyter protocol with a message for the kernel to record log messages (so that frontends have access to them).

n-riesco avatar Apr 21 '16 22:04 n-riesco

@minrk - how heavy-weight do you see the nanny being? I imagined either a python file, or a lightweight OS-specific C program with zeromq as a dependency.

jasongrout avatar Apr 21 '16 22:04 jasongrout

@lbustelo The idea is that the nanny and the kernel are always running together on the same system. They may both be remote from the frontend (e.g. the notebook server), and this will work much like it already does - zmq messages sent over the network. One of the key advantages of this is that it will allow interrupting remote kernels, which is currently impossible.

@n-riesco @jasongrout I definitely see the nanny as being a lightweight thing with few dependencies. In the first instance, it will likely be written in Python, because that's what we can write and debug most effectively, but I may later use it as an excuse to brush up on a language like Rust or Go, which will make it even lighter.

> consider extending the jupyter protocol with a message for the kernel to record log messages (so that frontends have access to them).

The logging system is what you'll want to rely on to debug problems with the messaging, so I want it to be a) a separate mechanism, and b) as simple as possible, like 'open this file and write to it'. We can still arrange things so that the frontend can get the kernel logs by making the log file a named pipe, and having the frontend read it. I like this Unix-y approach here, because it provides a lot of flexibility while requiring very little complexity in the kernels.
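A rough sketch of the named-pipe idea, for POSIX systems only (`os.mkfifo` is not available on Windows). The writer side does nothing more than open a path and write to it, while a reader on the other end of the FIFO collects the lines. The function names and the `kernel.log` file name are assumptions made for illustration; the proposal does not specify them.

```python
import os
import tempfile
import threading


def read_log(path, sink):
    """Frontend side: read lines from the FIFO until the writer closes it."""
    # Opening a FIFO for reading blocks until a writer opens it too.
    with open(path) as pipe:
        for line in pipe:
            sink.append(line.rstrip("\n"))


def forward_log_through_fifo(messages):
    """Write messages to a named pipe and return what the reader received."""
    log_dir = tempfile.mkdtemp()
    log_path = os.path.join(log_dir, "kernel.log")
    os.mkfifo(log_path)  # POSIX only

    received = []
    reader = threading.Thread(target=read_log, args=(log_path, received))
    reader.start()

    # Kernel side: "open this file and write to it".
    with open(log_path, "w") as log:
        for message in messages:
            log.write(message + "\n")

    reader.join()  # the reader sees EOF once the writer has closed
    os.remove(log_path)
    os.rmdir(log_dir)
    return received
```

The kernel-side code would be identical if `log_path` pointed at a regular file, which is the flexibility being argued for here.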

takluyver avatar Apr 22 '16 13:04 takluyver

On 22/04/16 14:32, Thomas Kluyver wrote:

[...] We can still arrange things so that the frontend can get the kernel logs by making the log file a named pipe, and having the frontend read it.

How would that work in the case of remote kernels?

n-riesco avatar Apr 22 '16 16:04 n-riesco

The frontend could ask the nanny to set up a named pipe and forward the data over the network. But let's deal with that when we come to it.

takluyver avatar Apr 22 '16 17:04 takluyver

One glaring thing that worries me from a purely ideological standpoint (and possibly a complexity standpoint): this changes the dynamic of a fairly simple kernel level messaging spec to a "kernel + nanny" level messaging spec.

rgbkrk avatar Apr 22 '16 19:04 rgbkrk

I don't think it's too big a change ideologically. The main effect is that there will be some extra messages to document between the frontend and the nanny, which the kernel won't directly send/receive.

takluyver avatar Apr 22 '16 19:04 takluyver

Can frontends still communicate with the kernel without using the nanny? I think that's where @n-riesco is curious (as am I), as the lead of the IJavaScript kernel and a Hydrogen dev.

rgbkrk avatar Apr 22 '16 23:04 rgbkrk

Yeah, this needs to be optional. Otherwise we are imposing a requirement that the system hosting the kernels needs to support Python.

lbustelo avatar Apr 23 '16 14:04 lbustelo

I think KernelNanny should not be optional. Adding it as an option only increases the complexity of everything, because capabilities are now fragmented across use cases.

minrk avatar Apr 25 '16 08:04 minrk

Arguably, it'd be simpler not to have a nanny at all, and let kernels handle themselves.

There's however a solution that would make things simple for everyone.

If a kernel nanny was implemented as an optional proxy (that interfaces with frontends in the usual manner), then no changes to the current frontends or kernels (or containers running these kernels) would be required, while it would also let kernels have a nanny if they really needed.

n-riesco avatar Apr 25 '16 09:04 n-riesco

As discussed, the functions of a nanny are not something kernels can readily handle themselves:

  • Capturing a process' own std streams at the OS level is fiddly on Unix, and we haven't even worked out how on Windows (if it's possible, it's likely to be something completely different than Unix). Capturing it from the parent process is trivial.
  • Interrupting blocked execution: even in languages that do lots of async stuff, unless you can guarantee that nothing will ever block execution, signals need to be sent from another thread or process.
  • Heartbeat: similarly, this should never be blocked by user code executing - though I know that it is in IRkernel, for instance, because I do not want to mess around with threads in R.
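To illustrate the first point, here is a minimal sketch of a parent process capturing a child's standard streams through pipes, using only the Python standard library. The function name `run_and_capture` is hypothetical and this is not the nanny's actual design. Because the pipes replace the child's file descriptors 1 and 2, even writes that bypass the language-level `sys.stdout` are captured.

```python
import subprocess
import sys
import threading


def run_and_capture(argv):
    """Run argv; return ([(stream_name, line), ...], exit_code)."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    captured = []
    lock = threading.Lock()

    def pump(name, pipe):
        # One thread per stream, so a full pipe on one stream
        # cannot stall reading from the other.
        for line in pipe:
            with lock:
                captured.append((name, line.rstrip("\n")))
        pipe.close()

    threads = [
        threading.Thread(target=pump, args=("stdout", proc.stdout)),
        threading.Thread(target=pump, args=("stderr", proc.stderr)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return captured, proc.wait()
```

The parent can label each line with the stream it arrived on, but it has no way of knowing which request produced it. That attribution would still need cooperation from the kernel, as raised earlier in the thread.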

The simplest way to achieve all of these things is with a separate process. And I'd rather have one implementation of that that we control and standardise than 50 different ones with their own bugs. Providing this as part of our infrastructure means we're asking less of kernels.
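As an illustration of the interrupt case, the sketch below has a parent process interrupt a child that is blocked in `time.sleep`, which stands in for blocking user code. It is POSIX-only (sending `SIGINT` this way does not work on Windows), and the names `CHILD` and `interrupt_blocked_child` are made up for this example.

```python
import signal
import subprocess
import sys

CHILD = r"""
import signal
import time

# Make sure SIGINT raises KeyboardInterrupt even if the parent
# environment had it set to be ignored.
signal.signal(signal.SIGINT, signal.default_int_handler)
try:
    print("ready", flush=True)
    time.sleep(60)  # stands in for user code that blocks
except KeyboardInterrupt:
    print("interrupted", flush=True)
"""


def interrupt_blocked_child():
    """Start a blocked child, interrupt it, return (output, exit_code)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    # Wait until the child reports that it is inside the try block.
    assert proc.stdout.readline().strip() == "ready"
    proc.send_signal(signal.SIGINT)  # delivered from outside the child
    rest = proc.stdout.read()  # returns once the child exits
    return rest.strip(), proc.wait(timeout=10)
```

The child cannot interrupt itself here, because its only thread is stuck in the sleep. The signal has to come from elsewhere, which is the role a separate process fills.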

I don't think requiring Python on the server is that big a hardship - most traditional servers will have it anyway, and it's an extra 30MB or so on Docker images, even if it's not already there. However, if it is a problem, we can look into rewriting the nanny in a lower level language.

takluyver avatar Apr 25 '16 09:04 takluyver

I should clarify what I meant about it not being optional. I didn't mean that it should never be possible to run a kernel without a nanny, but that the nanny should be a strict requirement for use with Jupyter Notebook.

There are two problems that the nanny solves:

  1. remote signaling, restarting, process management
  2. (optional) output capture

The first is a need of the application, the second is a need of the kernel. So some applications may require there to be a Nanny (e.g. the jupyter notebook application), while some kernels may require it (e.g. hypothetical kernels that rely on output capturing). This proposal requires no changes at all to kernels that don't opt-in to output capture, and applications would also have to opt-in to the nanny for the monitoring/signaling support. Hydrogen and nteract would still be able to talk to IJavascript without a nanny if they choose to, but Jupyter Notebook would not, and IJavascript would not need any awareness that the nanny is present or not.

minrk avatar Apr 25 '16 09:04 minrk

On 25/04/16 10:48, Min RK wrote:

Hydrogen and nteract would still be able to talk to IJavascript without a nanny, but Jupyter Notebook would not, and IJavascript would not need any awareness that the nanny is present or not.

I don't think that's what would happen. Currently, the standard to launch a kernel is its kernel spec.

If a nanny becomes the standard way to launch a kernel, nteract and Hydrogen will be forced to use the nanny, because there would be no other standard to launch the kernel.

n-riesco avatar Apr 25 '16 09:04 n-riesco

This would make one change, as I intend it: by deprecating the heartbeat as something kernels are required to support, applications using a remote kernel will need the nanny to detect kernels dying unexpectedly. Applications could still check for local kernels by monitoring the process. In practice, there are already kernels that don't properly implement the heartbeat (e.g. IRkernel).

@n-riesco the proposal is that the nanny uses the kernelspec, so it will still be there, and other things can still use it.
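For context, a kernelspec is a `kernel.json` file whose `argv` list contains a `{connection_file}` placeholder. The sketch below shows how any launcher, whether a frontend or a nanny, could turn that file into a command line. The example spec contents and the function name `build_command` are illustrative assumptions; how the nanny would actually wrap the launch is not specified in this thread.

```python
import json

# An example kernel.json, shown inline for illustration.
EXAMPLE_KERNEL_JSON = """
{
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python 3",
  "language": "python"
}
"""


def build_command(kernel_json_text, connection_file):
    """Substitute the connection file path into the kernelspec's argv."""
    spec = json.loads(kernel_json_text)
    return [
        arg.replace("{connection_file}", connection_file)
        for arg in spec["argv"]
    ]
```

Since the kernelspec itself is unchanged, a frontend that prefers to launch the kernel directly could keep doing so with the same file.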

takluyver avatar Apr 25 '16 09:04 takluyver

This doesn't change kernelspecs, so that should be unaffected. However, if kernels begin to rely on the nanny features (e.g. new kernels not implementing output capture), you would need the nanny for it to work properly. I would certainly encourage all frontends to use the nanny, but they would not be forced.

minrk avatar Apr 25 '16 10:04 minrk