ipykernel icon indicating copy to clipboard operation
ipykernel copied to clipboard

IPKernelApp configs capture_fd_output and quiet conflict and break ipykernel

Open MrBago opened this issue 3 years ago • 1 comments

When c.IPKernelApp.capture_fd_output is True (default True) and c.IPKernelApp.quiet if False (default True), printing anything causes an infinite output in the cell. quiet = True causes outputs to get directed to fd, which then gets picked up by the capture_fd_output logic and re-printed. This loop continues until the process is killed.

MrBago avatar Feb 07 '22 17:02 MrBago

Adding context to link PRs together: the infinite output problem when IPKernelApp.quiet is False seems to have started in the initial commit of https://github.com/ipython/ipykernel/pull/630, and the capture_fd_output was introduced in #752.

jasongrout-db avatar Feb 07 '22 18:02 jasongrout-db

From my findings, when capture_fd_output=True and quiet=False, we echo the text back to sys._ _ stdout _ _ which was overwritten (in addition to sys.stdout) since we change the underlying file descriptor. Therefore we are echoing text by writing it back to the pipe instead of the terminal. So the watcher thread then reads that same text again in the pipe, writes to the terminal and then echos. reads, writes to terminal, echos.

Not that this is a solution but setting echo=None prevents the infinite loop. Although that might be the same as just setting quiet=True.

garlandz-db avatar Mar 25 '23 00:03 garlandz-db

One proposal for solution: in the case that quiet=False, capture=True, the watcher threads are only able to send zmq messages. That is, directly calling write() like sys.stdout.write() would actually echo the data onto the stdout pipe, which will then be picked up by the watcher thread and then write() to zmq but this time with echo disabled. image

appreciate any feedback!

garlandz-db avatar Apr 03 '23 18:04 garlandz-db

another solution alternative.

image

garlandz-db avatar Apr 06 '23 15:04 garlandz-db

Hi @Carreau or @minrk, do you mind taking a look at these proposed solutions?

Zsailer avatar Apr 06 '23 15:04 Zsailer

I haven't touched this file in a long time, and it's complicated enough that I don't think I can properly judge this.

In general I believe the watcher thread is complex, and a real fix that would fix this and other things, would be to really implement the kernel nanny.

The kernel nanny have a real way to actually capture output, and would allow to process signals, and metrics without having to worry about the kernel being stuck in a compiled library.

Carreau avatar Apr 12 '23 07:04 Carreau

How much effort would it take to implement the kernel nanny? This also seems non-trivial and would probably require a major release to ship.

garlandz-db avatar Apr 13 '23 15:04 garlandz-db

Another experiment; https://github.com/carreau/inplace_restarter, this intercept commands and when it see "%restart" just restart the subprocess.

It's not hard to implement, it's just the occasion to update the jupyter protocol and document it.

Technically we can implement it in a completely transparent manner, but I believe it would be nice to agree on a way to tell the kernel "you are under nanny" supervision, you do not need to do X, Y, Z. It would also be a good way to implement the handshake protocol, as the nany could do the handshake transparently for older kernels.

Carreau avatar Apr 17 '23 08:04 Carreau

@Carreau thanks for your insight! im not too sure how long the process to shipping new code in ipykernel takes. if you had to give an estimate, how long would the kernel nanny take to be shipped in ipykernel? it also seems determining the handshake pattern is a blocker then to implementing the nanny (using handshake protocol) since that JEP is still in discussion.

garlandz-db avatar Apr 20 '23 17:04 garlandz-db

The longest will be implementing it. Making a new release assuming purely nanny that does not change the protocol should not be long. We can even make it opt-in or in beta state. The hard thing is finding time to write the code.

Carreau avatar Apr 21 '23 08:04 Carreau

While I still think the nanny is a good idea (I ended up implementing a version of it in ipython parallel, though notably it does not include output capture), I think this specific self-looping issue can be solved in a much smaller way, since we already have a solution for that for the logger. We can do the same thing for the echo and solve this specific bug (i.e. echo FD should be _original_stdstream_copy, not __stdout__)

minrk avatar Apr 21 '23 09:04 minrk

@minrk just to double check my understanding, is this mental model correct that we are going from image

to this (aka move the arrow to the terminal output):

image

if so then thats a really great idea and simple to implement. was trying to come up with solutions to break the cycle in different ways 😅

garlandz-db avatar Apr 21 '23 18:04 garlandz-db

Yes, I think that's right. I believe #1111 fixes this, but I haven't quite got the tests to not mess with pytest's output capture.

minrk avatar Apr 24 '23 07:04 minrk

can confirm this fix works as expected. i copied your line change and the loop issue is gone. thanks for doing this @minrk!! lmk if there is something i can help with to drive this to completion

garlandz-db avatar Apr 26 '23 19:04 garlandz-db