Assertion errors causing exceptions that kill Jupyter kernels

Open stepsbystep opened this issue 2 years ago • 3 comments

I am getting an error that shuts down Jupyter (running a Python 3 kernel) when using plotly, solara, or ipyleaflet to display maps on a Windows 10 machine. With ipyleaflet, a marker can be moved around and the position data updated a few times before the exception is thrown. I checked and there are no memory constraints. I have not otherwise had any problems with Jupyter. I updated Jupyter and all my packages, but this did not help. Checking the Jupyter log, I found a common failure sequence that involves tornado at the start and end of the traceback:

[E 16:21:15.377 NotebookApp] Exception in callback functools.partial(<function ZMQStream._update_handler.<locals>.<lambda> at 0x0000029A73E555E0>)
    Traceback (most recent call last):
      File "C:\Users\howno\anaconda3\lib\site-packages\tornado\ioloop.py", line 738, in _run_callback
        ret = callback()
      File "C:\Users\howno\anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 718, in <lambda>
        self.io_loop.add_callback(lambda: self._handle_events(self.socket, 0))
      File "C:\Users\howno\anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 634, in _handle_events
        self._handle_recv()
      File "C:\Users\howno\anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 663, in _handle_recv
        self._run_callback(callback, msg)
      File "C:\Users\howno\anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 584, in _run_callback
        f = callback(*args, **kwargs)
      File "C:\Users\howno\anaconda3\lib\site-packages\zmq\eventloop\zmqstream.py", line 308, in stream_callback
        return callback(self, msg)
      File "C:\Users\howno\anaconda3\lib\site-packages\notebook\services\kernels\handlers.py", line 547, in _on_zmq_reply
        super()._on_zmq_reply(stream, msg)
      File "C:\Users\howno\anaconda3\lib\site-packages\notebook\base\zmqhandlers.py", line 251, in _on_zmq_reply
        self.write_message(msg, binary=isinstance(msg, bytes))
      File "C:\Users\howno\anaconda3\lib\site-packages\tornado\websocket.py", line 334, in write_message
        return self.ws_connection.write_message(message, binary=binary)
      File "C:\Users\howno\anaconda3\lib\site-packages\tornado\websocket.py", line 1081, in write_message
        fut = self._write_frame(True, opcode, message, flags=flags)
      File "C:\Users\howno\anaconda3\lib\site-packages\tornado\websocket.py", line 1056, in _write_frame
        return self.stream.write(frame)
      File "C:\Users\howno\anaconda3\lib\site-packages\tornado\iostream.py", line 539, in write
        self._handle_write()
      File "C:\Users\howno\anaconda3\lib\site-packages\tornado\iostream.py", line 965, in _handle_write
        self._write_buffer.advance(num_bytes)
      File "C:\Users\howno\anaconda3\lib\site-packages\tornado\iostream.py", line 182, in advance
        assert 0 < size <= self._size
    AssertionError

stepsbystep Jun 15 '23 21:06

The only time I've seen this assertion fire was in #2871 where it was caused by incorrect usage of threads. You must not call any Tornado methods except IOLoop.add_callback from any thread other than the event loop thread. If you're sure you're not doing anything with threads, then I'd need more information about what exactly you are doing to provide any guidance here.
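As an illustration of the threading rule above, here is a minimal sketch of handing work from a worker thread to the event loop thread. It uses only the standard library: `loop.call_soon_threadsafe` stands in for `IOLoop.add_callback`, on the assumption that Tornado is running on asyncio (Tornado 5 and later). The names `run_demo`, `worker`, and `on_loop_thread` are made up for this example and do not come from Tornado or Jupyter.

```python
import asyncio
import threading


def run_demo():
    handled = []  # (value, thread id) pairs recorded by the loop thread

    async def main():
        loop = asyncio.get_running_loop()
        done = asyncio.Event()

        def on_loop_thread(value):
            # Runs on the event loop thread, so it may safely touch
            # loop-owned objects (streams, websockets, futures).
            handled.append((value, threading.get_ident()))
            if value == 2:
                done.set()

        def worker():
            # Runs on a separate thread: it only *schedules* work and never
            # calls loop-owned objects directly. With Tornado the analogous
            # call would be io_loop.add_callback(on_loop_thread, value).
            for value in range(3):
                loop.call_soon_threadsafe(on_loop_thread, value)

        thread = threading.Thread(target=worker)
        thread.start()
        await done.wait()
        thread.join()
        return threading.get_ident()

    loop_thread_id = asyncio.run(main())
    return handled, loop_thread_id


if __name__ == "__main__":
    handled, loop_thread_id = run_demo()
    print([value for value, _ in handled])
    print(all(tid == loop_thread_id for _, tid in handled))
```

Every callback is recorded with the loop thread's id, which is the property the assertion in the traceback depends on: the write buffer is only ever mutated from one thread.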

bdarnell Jun 16 '23 01:06

I've run into the same issue. There's a thread here that might give more info: https://discourse.jupyter.org/t/jupyter-notebook-zmq-message-arrived-on-closed-channel-error/17869/5

wrunn Jun 20 '23 18:06

OK, so it looks like the canonical issue for this on the jupyter side is https://github.com/jupyter/notebook/issues/6721. As of this writing, upgrading the ipyflow package and/or downgrading jupyter_client are reported to fix the problem.

My understanding is that the jupyter project has kind of gotten themselves stuck in a strange place. Originally they used Tornado and they called the IOLoop reentrantly (e.g. calling IOLoop.run_sync from within a callback running on the IOLoop thread). This is surprising to me and has never been supported, but apparently it worked for the original Tornado IOLoop. Then Tornado adopted asyncio, which has stricter protections against reentrant use of the IOLoop. Jupyter responded by adopting nest_asyncio, which does some hacks to bypass those protections. This has its own compatibility problems as python and asyncio have evolved, so Jupyter recently moved away from nest_asyncio to start using a separate thread. This is the correct long-term solution IMHO, but introducing threading is tricky and bugs like this may occur.
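To make the reentrancy point concrete, here is a rough sketch using plain asyncio. It shows the loop refusing a nested run from code that is already on the loop thread, and then the same coroutine completing on a fresh loop owned by a helper thread. This is not Jupyter's actual implementation; `compute`, `reentrant_attempt`, and `run_in_helper_thread` are hypothetical names chosen for the example.

```python
import asyncio
import threading


async def compute():
    await asyncio.sleep(0)
    return 42


def reentrant_attempt():
    """Try to start a nested loop run from code already on the loop thread."""
    coro = compute()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        # asyncio refuses to run a loop while another one is running here.
        return exc
    finally:
        coro.close()  # avoids a "never awaited" warning when the call is refused


def run_in_helper_thread():
    """Run the coroutine on a fresh event loop owned by a separate thread."""
    box = {}

    def target():
        box["value"] = asyncio.run(compute())

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()  # blocks the caller until the helper loop has finished
    return box["value"]


async def main():
    # Both helpers are called synchronously from inside a running loop.
    return reentrant_attempt(), run_in_helper_thread()


def demo():
    return asyncio.run(main())


if __name__ == "__main__":
    refused, value = demo()
    print(type(refused).__name__, value)
```

The helper-thread version works because each loop stays confined to its own thread. The difficulty described above arises when objects owned by one loop (for example a Tornado stream) end up being used from the other thread.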

As far as I can tell there's nothing Tornado could change to fix this; it's going to have to come from the Jupyter side. I'm going to leave this issue open to direct people to https://github.com/jupyter/notebook/issues/6721

bdarnell Jun 21 '23 01:06

I'm going to close this since it sounds like things have been improved on the Jupyter side (although jupyter/notebook#6721 is still open?). It sounds like the fix is now to move forward to jupyter_core >= 5.3.2 instead of pinning old versions.
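A quick way to check an environment against that version floor is sketched below, using only the standard library. The helper names are invented for this example, and the parser is deliberately simplified: it looks only at the leading numeric components and ignores pre-release tags, so treat it as a convenience check and not a full version comparison.

```python
from importlib import metadata

MINIMUM = (5, 3, 2)  # jupyter_core floor mentioned in the comment above


def parse_release(version):
    """Leading numeric components of a version string, e.g. '5.3.2' -> (5, 3, 2)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum=MINIMUM):
    release = parse_release(version)
    # Pad short versions so that "5.4" compares as (5, 4, 0).
    release += (0,) * (len(minimum) - len(release))
    return release >= minimum


def installed_jupyter_core_ok():
    """True/False for the installed jupyter_core, or None if it is not installed."""
    try:
        return meets_minimum(metadata.version("jupyter_core"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_jupyter_core_ok())
```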

bdarnell Jun 13 '24 14:06