enhancement-proposals
Kernel nanny proposal
As discussed at the dev meeting. There are a few TODOs which have not yet been decided. We can bikeshed about them, or whoever gets to implementing the relevant bits first can try their preferred option ;-).
Pinging @JanSchulz, who was interested in this for IRkernel logging.
Not sure if I understand the details, but currently the notebook isn't very good at shutting down the R kernel on Windows, because the R kernel is not a single process but a chain like R.exe -> cmd -> rterm.exe [see https://github.com/jupyter/jupyter_client/issues/104]. Could the "nanny" detect such a thing without a heartbeat?
[Such things might happen even for Python kernels, if you use a batch file with activate <env> & python <kernel startup line>, which is needed to get the correct PATH in a kernel...]
I suspect it won't make much of a difference either way in that situation. Both currently and with the kernel nanny, it will send a shutdown_request to ask the kernel to shut itself down, and if the kernel doesn't shut down within some time period, it will terminate it more forcefully. I'd guess that second bit is where it goes wrong, since it only knows about the top-level process that it started.
Besides fiddling with the time we wait for the kernel to shut itself down, I'm not sure what we could do to improve that.
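A minimal Python sketch of that "ask politely, then escalate" sequence, assuming a Unix host. The polite shutdown_request step is stubbed out as a callable, the kernel is started in its own session so it leads a new process group, and the forceful steps signal the whole group rather than only the top-level process. The function names are hypothetical. On Windows, the closest analogue is taskkill /PID <pid> /T /F, which also terminates child processes.

```python
import os
import signal
import subprocess
import sys
from typing import Callable


def launch_kernel(argv: list[str]) -> subprocess.Popen:
    # start_new_session=True runs setsid() in the child, so the kernel becomes
    # the leader of a new process group; helpers it spawns inherit that group.
    return subprocess.Popen(argv, start_new_session=True)


def _signal_group(proc: subprocess.Popen, sig: int) -> None:
    # After setsid() the process group ID equals the kernel's PID.
    try:
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the whole group has already exited


def shutdown_kernel(
    proc: subprocess.Popen,
    request_shutdown: Callable[[], None],
    grace: float = 5.0,
) -> int:
    """Ask the kernel to exit, then escalate to SIGTERM and finally SIGKILL."""
    request_shutdown()  # hypothetical stand-in for sending shutdown_request
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    _signal_group(proc, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        _signal_group(proc, signal.SIGKILL)
        return proc.wait()


if __name__ == "__main__":
    # A stand-in "kernel" that ignores the polite request and just sleeps.
    kernel = launch_kernel([sys.executable, "-c", "import time; time.sleep(60)"])
    print(shutdown_kernel(kernel, request_shutdown=lambda: None, grace=1.0))
```

Because the kernel leads its own process group, a wrapper script and the interpreter it launches receive the signal together on Unix, unless a child deliberately moves itself into another group.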
Thanks for writing this up, @takluyver!
@takluyver Nicely designed and clearly written. I made a drawing by hand of the frontends, nanny, kernel, and channels; let me know if you would like a copy. 😄
Thanks all!
@willingc, yes, it would be good to see your drawing, to check if the explanation conveyed what I was thinking clearly.
@takluyver Here's the link to the drawing's folder on Dropbox: https://www.dropbox.com/sh/kzc9bom60c9e57x/AAAWcdlGo8RZB9cklEv7jC2ua?dl=0
Thanks, that looks good.
@takluyver Great. You detailed things out very clearly 🔑
I would suggest that this proposal is split into four proposals:
- a proposal to replace SIGINT.
- a proposal for kernels to capture the low level stderr and stdout streams and forward them to the frontend.
- a proposal to introduce the command jupyter kernel --kernel x.
- a proposal (specific to IPython) that will use a nanny process to implement the first two proposals.
a proposal (specific to IPython) that will use a nanny process to implement the first two proposals.
This is absolutely not specific to IPython - the proposal is for the nanny process to be used for all kernels.
Your 1 & 2 don't really make sense without a nanny process. 3 is doable, but it's a more incidental benefit. I don't see the benefit of splitting this up into smaller pieces: it's one change to the architecture that lets us do a number of useful things, which seems like exactly the right scope for a JEP.
This was also discussed at the in-person dev meeting, and while I don't want to suggest that it's closed for discussion, we did spend some time hashing out what we wanted, and I'd really hope that the remaining issues to work out are details, not the fundamental nature of the proposal.
On 21/04/16 17:54, Thomas Kluyver wrote:
Your 1 & 2 don't really make sense without a nanny process.
I think the kernel is in a better position to handle 1 and especially 2 than an agnostic nanny process:
- a kernel, if really needed, can implement its own nanny process to handle 1 and 2
- the nanny process cannot determine the origin of stdout and stderr without the kernel's help
- a kernel can always capture the low level stdout/stderr
@takluyver How about for kernels that are remote? I know this is not officially supported by the notebook server, but it is something that we've experimented with and could be a requirement in certain deployments.
@n-riesco - the idea in the proposal is that rather than having every kernel implement the capturing and signal/interrupt logic, we'd implement it once outside of the kernel and everyone automatically benefits. As for capturing output, that's opt-in for a kernel, so a kernel absolutely can do its own input/output instead of having the nanny handle it. The nanny just makes it much easier to have this taken care of automatically.
Another concern brought up in the meeting was the latency introduced in forwarding messages through the nanny. Can you mention that in the proposal? I thought @minrk said he might run some tests to get some idea about how much the latency on messages would be impacted by this proposal.
@takluyver I'm sorry for suggesting a proposal split.
Here are 2 suggestions to the current proposal:
- make the nanny an opt-in feature declared in the kernel spec (currently, IJavascript can run in the official Docker images for Node.js; if the nanny were made compulsory, then the nanny (and all its dependencies) would have to be installed in the Docker container).
- consider extending the jupyter protocol with a message for the kernel to record log messages (so that frontends have access to them).
@minrk - how heavy-weight do you see the nanny being? I imagined either a python file, or a lightweight OS-specific C program with zeromq as a dependency.
@lbustelo The idea is that the nanny and the kernel are always running together on the same system. They may both be remote from the frontend (e.g. the notebook server), and this will work much like it already does: zmq messages sent over the network. One of the key advantages of this is that it will allow interrupting remote kernels, which is currently impossible.
@n-riesco @jasongrout I definitely see the nanny as being a lightweight thing with few dependencies. In the first instance, it will likely be written in Python, because that's what we can write and debug most effectively, but I may later use it as an excuse to brush up on a language like Rust or Go, which will make it even lighter.
consider extending the jupyter protocol with a message for the kernel to record log messages (so that frontends have access to them).
The logging system is what you'll want to rely on to debug problems with the messaging, so I want it to be a) a separate mechanism, and b) as simple as possible, like 'open this file and write to it'. We can still arrange things so that the frontend can get the kernel logs by making the log file a named pipe, and having the frontend read it. I like this Unix-y approach here, because it provides a lot of flexibility while requiring very little complexity in the kernels.
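A minimal sketch of the named-pipe idea in Python, assuming a Unix host (os.mkfifo is not available on Windows, where named pipes work differently). The kernel side really is just "open this file and write to it"; the frontend reads the same path. The function names are hypothetical. Opening a FIFO blocks until both ends are open, so in this demo the "kernel" writes from a separate thread.

```python
import os
import tempfile
import threading


def make_log_pipe(directory: str) -> str:
    """Create a named pipe (FIFO) that is handed to the kernel as its log file."""
    path = os.path.join(directory, "kernel.log")
    os.mkfifo(path)
    return path


def kernel_side_logging(path: str, lines: list[str]) -> None:
    """What the kernel does: open the path and write to it like any file."""
    with open(path, "w") as log:
        for line in lines:
            log.write(line + "\n")


def frontend_read_logs(path: str) -> list[str]:
    """The frontend reads the pipe until the kernel closes its end."""
    with open(path) as log:
        return [line.rstrip("\n") for line in log]


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as directory:
        path = make_log_pipe(directory)
        writer = threading.Thread(
            target=kernel_side_logging, args=(path, ["starting", "ready"])
        )
        writer.start()
        received = frontend_read_logs(path)
        writer.join()
        return received
```

The kernel never needs to know whether its log path is a regular file or a pipe, which is what keeps this mechanism independent of the messaging it is meant to debug.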
On 22/04/16 14:32, Thomas Kluyver wrote:
[...] We can still arrange things so that the frontend can get the kernel logs by making the log file a named pipe, and having the frontend read it.
How would that work in the case of remote kernels?
The frontend could ask the nanny to set up a named pipe and forward the data over the network. But let's deal with that when we come to it.
One glaring thing worries me from a purely ideological standpoint (and possibly a complexity standpoint): this changes the dynamic of a fairly simple kernel-level messaging spec to a "kernel + nanny"-level messaging spec.
I don't think it's too big a change ideologically. The main effect is that there will be some extra messages to document between the frontend and the nanny, which the kernel won't directly send/receive.
Can frontends still communicate with the kernel without using the nanny? I think that's where @n-riesco is curious (as am I), as the lead of the IJavaScript kernel and a Hydrogen dev.
Yeah, this needs to be optional. Otherwise we are requiring that the system hosting the kernels supports Python.
I think KernelNanny should not be optional. Adding it as an option only increases the complexity of everything, because capabilities are now fragmented across use cases.
Arguably, it'd be simpler not to have a nanny at all, and let kernels handle themselves.
There's however a solution that would make things simple for everyone.
If a kernel nanny was implemented as an optional proxy (that interfaces with frontends in the usual manner), then no changes to the current frontends or kernels (or containers running these kernels) would be required, while it would also let kernels have a nanny if they really needed.
As discussed, the functions of a nanny are not something kernels can readily handle themselves:
- Capturing a process' own std streams at the OS level is fiddly on Unix, and we haven't even worked out how on Windows (if it's possible, it's likely to be something completely different than Unix). Capturing it from the parent process is trivial.
- Interrupting blocked execution: even in languages that do lots of async stuff, unless you can guarantee that nothing will ever block execution, signals need to be sent from another thread or process.
- Heartbeat: similarly, this should never be blocked by user code executing - though I know that it is in IRkernel, for instance, because I do not want to mess around with threads in R.
The simplest way to achieve all of these things is with a separate process. And I'd rather have one implementation of that that we control and standardise than 50 different ones with their own bugs. Providing this as part of our infrastructure means we're asking less of kernels.
I don't think requiring Python on the server is that big a hardship - most traditional servers will have it anyway, and it's an extra 30MB or so on Docker images, even if it's not already there. However, if it is a problem, we can look into rewriting the nanny in a lower level language.
I should clarify what I meant about optional. I didn't mean that it should be impossible to run a kernel without a nanny ever, but that it should be a strict requirement for use with jupyter notebook.
There are two problems that the nanny solves:
- remote signaling, restarting, process management
- (optional) output capture
The first is a need of the application, the second is a need of the kernel. So some applications may require there to be a Nanny (e.g. the jupyter notebook application), while some kernels may require it (e.g. hypothetical kernels that rely on output capturing). This proposal requires no changes at all to kernels that don't opt-in to output capture, and applications would also have to opt-in to the nanny for the monitoring/signaling support. Hydrogen and nteract would still be able to talk to IJavascript without a nanny if they choose to, but Jupyter Notebook would not, and IJavascript would not need any awareness that the nanny is present or not.
On 25/04/16 10:48, Min RK wrote:
Hydrogen and nteract would still be able to talk to IJavascript without a nanny, but Jupyter Notebook would not, and IJavascript would not need any awareness that the nanny is present or not.
I don't think that's what would happen. Currently, the standard to launch a kernel is its kernel spec.
If a nanny becomes the standard way to launch a kernel, nteract and Hydrogen will be forced to use the nanny, because there would be no other standard to launch the kernel.
This would make one change, as I intend it: by deprecating the heartbeat as something kernels are required to support, applications using a remote kernel will need the nanny to detect kernels dying unexpectedly. Applications could still detect local kernels dying by monitoring the process. In practice, there are already kernels that don't properly implement the heartbeat (e.g. IRkernel).
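For the local case, a minimal sketch (Python, hypothetical names) of noticing a kernel dying without any heartbeat: the process that launched the kernel blocks on it in a background thread and reports the exit status. Only the parent can wait on a child this way, which is why a remote application would need something on the kernel's machine, such as the nanny, to relay the information.

```python
import subprocess
import sys
import threading
from typing import Callable


def watch_kernel(proc: subprocess.Popen, on_exit: Callable[[int], None]) -> threading.Thread:
    """Call on_exit(returncode) as soon as the kernel process ends, for any reason."""

    def _wait() -> None:
        # Popen.wait() returns when the OS reports that the child has exited,
        # so a crash is noticed immediately, with no ping/pong round trips.
        on_exit(proc.wait())

    watcher = threading.Thread(target=_wait, daemon=True)
    watcher.start()
    return watcher


if __name__ == "__main__":
    # A stand-in kernel that "crashes" with exit status 3.
    kernel = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    watch_kernel(kernel, lambda code: print(f"kernel died with status {code}")).join()
```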
@n-riesco the proposal is that the nanny uses the kernelspec, so it will still be there, and other things can still use it.
This doesn't change kernelspecs, so that should be unaffected. However, if kernels begin to rely on the nanny features (e.g. new kernels not implementing output capture), you would need the nanny for it to work properly. I would certainly encourage all frontends to use the nanny, but they would not be forced.