ipykernel
ipykernel copied to clipboard
Replace Tornado with AnyIO
References
Supersedes #876. Closes #656.
IPykernel should not depend on a web server. It uses Tornado as an async framework, but modern Python has better solutions, including standard asyncio. Tornado's use in ipykernel involves using a lot of queues and creates complexity that makes maintenance or improvements to the code base harder.
Structured concurrency as introduced by Trio is gaining a lot of traction. AnyIO brings structured concurrency on top of asyncio and acts as an abstraction layer which allows to choose a concurrency backend, such as asyncio or trio. Major projects have switched to AnyIO, including FastAPI.
This PR is still a work in progress. I have disabled some tests, like the ones for in-process kernels. Also, I think we should re-think event-loop integration (maybe using Trio's approach). But I would like to get feedback from the community and discuss how we can move forward with these changes.
cc @SylvainCorlay @JohanMabille @minrk @ccordoba12
Code changes
The dependency to Tornado is dropped, in favor of AnyIO. This means that instead of using ZMQStream with on_recv callbacks, we use async/await everywhere.
Using AnyIO in the main thread event loop gives us Trio support for free in the user code. It is currently disabled until we get AnyIO support in pyzmq, or decide to process shell messages in a separate thread, which might be needed for sub-shells anyway.
User-facing changes
The goal is to have no user-facing change. However, I have identified the following ones:
- handling of standard streams (
stdout,stderr) is improved as there is no delay anymore. - interrupting async user code now works (see #881).
- traceback when interrupting blocking user code leaks one frame into the kernel source code (might be fixable).
Backwards-incompatible changes
- The stream traits (e.g. shell_stream) are replaced with their socket equivalent (e.g.
shell_socket). - The API is now mostly
async, including theKernel.startmethod. - There is a new blocking
Kernel.stop(thread-safe) method.
+1 on AnyIO, they seem to have all the missing primitives needed for doing asyncio+threaded work, seems like a solid project.
As a trio user I'm +:100: for using AnyIO :heart:
I must point out that using blocking portals in this manner will block the original event loop, preventing any other tasks from running until the blocking call is complete. What should also be taken into account is any Protocols and tasks launched from the nested loop, which will also stop functioning once the blocking call completes.
As a former Java programmer, I recall that Java used a similar approach to solve this problem, but there the nested event loop was able to process events from the parent loop.
Thanks for the feedback @agronholm. Actually this PR doesn't use blocking portals. Instead, I'm running a new event loop in a separate thread (for the control thread). So there is no nested loops, just two separate (parallel) loops, each in their own thread. There shouldn't be any issue with that, right? Except if there is a global state in AnyIO's event loop, which I doubt.
I'm running a new event loop in a separate thread (for the control thread). So there is no nested loops, just two separate (parallel) loops, each in their own thread. There shouldn't be any issue with that, right?
Yeah, sorry, it's a rather large and complex PR so I haven't read through it, so I just made assumptions based on your other posts. So are you creating one-off event loops for running async callables from sync functions? Or do you have a lingering, shared event loop thread running in the background? Either way, thread safety is the biggest problem. One of the main selling points of async is that you can reliably determine where you can mutate shared objects safely (without explicit locks) since all other code is suspended until the next yield point. Those guarantees go out the window when two simultaneously running event loops share memory.
Except if there is a global state in AnyIO's event loop, which I doubt.
Technically there is no such thing as an AnyIO event loop :smile:
Or do you have a lingering, shared event loop thread running in the background?
Yes, the control thread runs for the whole duration of the kernel process, so we're not creating lots of event loops. The control channel running in its own thread was already there before this PR, and we're taking care of thread safety. We need parallelism as the control channel should work even when executing user blocking code. At least this PR should not break things with regards to that.
Just a quick note to say that this seems significant enough to mark it for IPykernel 7.
Yes, I agree. It also requires pyzmq v25.0 and jupyter-client v8.0, so that will be a good occasion.
I'd like to merge this PR and update downstream projects in other PRs. Any feedback?
Let's move forward with this. I'll merge soon if there are no objections.
I think at least the local tests should all be passing before merging.
I think at least the local tests should all be passing before merging.
There is a bit of a chicken and egg situation though with respect to the downstream projects that are tested in the CI.
There is a bit of a chicken and egg situation though with respect to the downstream projects that are tested in the CI.
No I mean explicitly the unit tests in ipykernel
Thanks so much for working on this PR Steve!
Okay, unit tests are passing (except Windows). I think we should wait for #1210 and #1212 to land and be released, before considering main to be ipykernel 7. In the mean time, I'll write some migration docs here and try to address this portion:
"traceback when interrupting blocking user code leaks one frame into the kernel source code".
The fact that async interrupt isn't working on Windows, and we are failing on interrupt-related tests on the jupyter_client downstream test, leads me to think there is something not quite working about the new interrupt logic (the section in do_execute with execution.interrupt = await to_thread.run_sync(self.shell_interrupt.get). I don't know enough about AnyIO yet to debug properly.
Yes, the AnyIO documentation says that "Windows does not natively support signals so do not rely on this in a cross platform application". This is about signal handlers in general, not even related to AnyIO, so I don't think we can have async cell interrupt support on Windows. But in current ipykernel this is not even possible, on any platform, right? So this is still an improvement.
No, it did work previously, this test was not skipped on Windows: https://github.com/ipython/ipykernel/blob/eddd3e666a82ebec287168b0da7cfa03639a3772/tests/test_async.py#L32. Additionally, the Jupyter Client test needs to pass before we merge, since it is treating the kernel as a process, and nothing we do here should break that contract. Also note that this downstream test is running on Ubuntu.
I haven't looked at this test, but with a fresh install of JupyterLab I cannot interrupt a top-level await on Linux:
https://github.com/ipython/ipykernel/assets/4711805/9331be71-2a0d-4f2c-8d4c-5509e03446e1
The CancelledError arrives after the cell has finished executing.
Indeed, the test only tests that there is a CancelledError in the reply, but it doesn't test that it arrives before the cell has finished executing.
Great, we can skip that test on Windows, and concentrate on getting client to pass.
Aha! The issue is that we need to update SignalTestKernel in jupyter_client, which is directly subclassing BaseKernel. I'd like to be able to update that class to prove it it still working, but I can't yet figure out why the interrupt is not working the same way.
More progress, I see now that we have to handle self.kernel.shell_interrupt.get now in SignalTestKernel.do_execute.
FWIW, I don't think we can use anyio on anything other than asyncio until they support polling on FDs, and pyzmq will continue to require tornado to run on asyncio on Windows Proactor for the foreseeable future, until/unless someone figures out how to poll on edge-triggered FDs for these event loops that have decided to remove basic functionality on Windows.
FWIW, I don't think we can use anyio on anything other than asyncio until they support polling on FDs,
Are you sure you linked to the correct issue? I'm having some trouble understanding the connection. What is lacking FD polling?
Are you sure you linked to the correct issue
ha, not at all, sorry. Fixed link: https://github.com/agronholm/anyio/issues/386
That issue has a simple workaround: calling socket.fromfd(). What do you need that can't use that?
That issue has a simple workaround: calling socket.fromfd(). What do you need that can't use that?
Unfortunately, that doesn't help here. The socket in question cannot be read from, it can only be polled for readability. It is an edge-triggered FD, used for signaling events to process on zmq sockets. trio's wait_readable works for this using IOCTL_AFD_POLL. You can also see https://github.com/python-trio/trio/issues/52 for loads of background.
I'm afraid I don't know how to proceed. It seems as if the changes from #1210 are incompatible with the changes in this PR. I am getting deadlocked on reading the control loop.
What's really confusing to me is why nest_asyncio would show up in the stack trace:
async with asyncio.Lock():
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/pluggy/_manager.py", line 80, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/pluggy/_callers.py", line 39, in _multicall
res = hook_impl.function(*args)
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/_pytest/runner.py", line 179, in pytest_runtest_teardown
item.session._setupstate.teardown_exact(nextitem)
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/_pytest/runner.py", line 522, in teardown_exact
fin()
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/_pytest/fixtures.py", line 677, in <lambda>
subrequest.node.addfinalizer(lambda: fixturedef.finish(request=subrequest))
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/_pytest/fixtures.py", line 1022, in finish
func()
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/_pytest/fixtures.py", line 909, in _teardown_yield_fixture
next(it)
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/anyio/pytest_plugin.py", line 75, in wrapper
yield from runner.run_asyncgen_fixture(func, kwargs)
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1940, in run_asyncgen_fixture
self.get_loop().run_until_complete(
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/nest_asyncio.py", line 84, in run_until_complete
self._run_once()
File "/Users/silvester/workspace/.venvs/ipykernel/lib/python3.10/site-packages/nest_asyncio.py", line 107, in _run_once
event_list = self._selector.select(timeout)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/selectors.py", line 562, in select
kev_list = self._selector.control(None, max_ev, timeout)
+++++++++++++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++++++++++++