ipykernel
ipykernel copied to clipboard
IPKernelApp configs capture_fd_output and quiet conflict and break ipykernel
When c.IPKernelApp.capture_fd_output is True (default True) and c.IPKernelApp.quiet if False (default True), printing anything causes an infinite output in the cell. quiet = True causes outputs to get directed to fd, which then gets picked up by the capture_fd_output logic and re-printed. This loop continues until the process is killed.
Adding context to link PRs together: the infinite output problem when IPKernelApp.quiet is False seems to have started in the initial commit of https://github.com/ipython/ipykernel/pull/630, and the capture_fd_output was introduced in #752.
From my findings, when capture_fd_output=True and quiet=False, we echo the text back to sys._ _ stdout _ _ which was overwritten (in addition to sys.stdout) since we change the underlying file descriptor. Therefore we are echoing text by writing it back to the pipe instead of the terminal. So the watcher thread then reads that same text again in the pipe, writes to the terminal and then echos. reads, writes to terminal, echos.
Not that this is a solution but setting echo=None prevents the infinite loop. Although that might be the same as just setting quiet=True.
One proposal for solution: in the case that quiet=False, capture=True, the watcher threads are only able to send zmq messages. That is, directly calling write() like sys.stdout.write() would actually echo the data onto the stdout pipe, which will then be picked up by the watcher thread and then write() to zmq but this time with echo disabled.

appreciate any feedback!
another solution alternative.
Hi @Carreau or @minrk, do you mind taking a look at these proposed solutions?
I haven't touched this file in a long time, and it's complicated enough that I don't think I can properly judge this.
In general I believe the watcher thread is complex, and a real fix that would fix this and other things, would be to really implement the kernel nanny.
The kernel nanny have a real way to actually capture output, and would allow to process signals, and metrics without having to worry about the kernel being stuck in a compiled library.
How much effort would it take to implement the kernel nanny? This also seems non-trivial and would probably require a major release to ship.
For context:
Another experiment; https://github.com/carreau/inplace_restarter, this intercept commands and when it see "%restart" just restart the subprocess.
It's not hard to implement, it's just the occasion to update the jupyter protocol and document it.
Technically we can implement it in a completely transparent manner, but I believe it would be nice to agree on a way to tell the kernel "you are under nanny" supervision, you do not need to do X, Y, Z. It would also be a good way to implement the handshake protocol, as the nany could do the handshake transparently for older kernels.
@Carreau thanks for your insight! im not too sure how long the process to shipping new code in ipykernel takes. if you had to give an estimate, how long would the kernel nanny take to be shipped in ipykernel? it also seems determining the handshake pattern is a blocker then to implementing the nanny (using handshake protocol) since that JEP is still in discussion.
The longest will be implementing it. Making a new release assuming purely nanny that does not change the protocol should not be long. We can even make it opt-in or in beta state. The hard thing is finding time to write the code.
While I still think the nanny is a good idea (I ended up implementing a version of it in ipython parallel, though notably it does not include output capture), I think this specific self-looping issue can be solved in a much smaller way, since we already have a solution for that for the logger. We can do the same thing for the echo and solve this specific bug (i.e. echo FD should be _original_stdstream_copy, not __stdout__)
@minrk just to double check my understanding, is this mental model correct that we are going from

to this (aka move the arrow to the terminal output):
if so then thats a really great idea and simple to implement. was trying to come up with solutions to break the cycle in different ways 😅
Yes, I think that's right. I believe #1111 fixes this, but I haven't quite got the tests to not mess with pytest's output capture.
can confirm this fix works as expected. i copied your line change and the loop issue is gone. thanks for doing this @minrk!! lmk if there is something i can help with to drive this to completion