nbclient

Errors after upgrading to 0.3.0

Open palewire opened this issue 5 years ago • 12 comments

Some users following a pattern similar to the one described in #48 are getting new errors after upgrading to 0.3. When they downgrade to 0.2, the errors are gone. Here's a sample traceback.

Traceback (most recent call last):
  File "update.py", line 154, in <module>
    cli()
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "update.py", line 85, in process
    _execute_notebook(
  File "update.py", line 29, in _execute_notebook
    client.execute()
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/nbclient/util.py", line 37, in wrapped
    result = loop.run_until_complete(coro(self, *args, **kwargs))
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 612, in run_until_complete
    return future.result()
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/nbclient/client.py", line 471, in async_execute
    async with self.async_setup_kernel(**kwargs):
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/async_generator/_util.py", line 34, in __aenter__
    return await self._agen.asend(None)
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/nbclient/client.py", line 440, in async_setup_kernel
    await self.async_start_new_kernel_client(**kwargs)
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/nbclient/client.py", line 393, in async_start_new_kernel_client
    await ensure_async(self.kc.start_channels())
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/jupyter_client/client.py", line 116, in start_channels
    self.hb_channel.start()
  File "/Users/apesce/.local/share/virtualenvs/coronavirus-tracker-dljAgG8F/lib/python3.8/site-packages/jupyter_client/asynchronous/client.py", line 97, in hb_channel
    loop = asyncio.new_event_loop()
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/events.py", line 758, in new_event_loop
    return get_event_loop_policy().new_event_loop()
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/events.py", line 656, in new_event_loop
    return self._loop_factory()
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/unix_events.py", line 54, in __init__
    super().__init__(selector)
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/selector_events.py", line 61, in __init__
    self._make_self_pipe()
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/selector_events.py", line 108, in _make_self_pipe
    self._ssock, self._csock = socket.socketpair()
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 571, in socketpair
    a, b = _socket.socketpair(family, type, proto)
OSError: [Errno 24] Too many open files
Exception ignored in: <function BaseEventLoop.__del__ at 0x1065fec10>
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/base_events.py", line 652, in __del__
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/unix_events.py", line 58, in close
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/selector_events.py", line 92, in close
  File "/usr/local/Cellar/python@3.8/3.8.1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/asyncio/selector_events.py", line 99, in _close_self_pipe
AttributeError: '_UnixSelectorEventLoop' object has no attribute '_ssock'
make[1]: *** [process] Error 1
make: *** [update] Error 2
(coronavirus-tracker) apesce@anthonys-mbp coronavirus-tracker % [IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.

palewire avatar May 17 '20 22:05 palewire

Hmm I wonder if this is related to some of the async stuff that @davidbrochart had worked on in the last few PRs?

choldgraf avatar May 17 '20 22:05 choldgraf

@palewire could you provide an example of a notebook or code snippet that reliably creates the problem?

choldgraf avatar May 17 '20 22:05 choldgraf

I am unable to replicate the bug, which so far we've only seen on MacBooks running Python 3.8 installed via Homebrew. I'm a neckbeard who uses Ubuntu 16.04, and I'm not seeing any bugs.

Here's basically what's happening in the code. We have this function to run notebooks via nbclient:

import nbformat
from nbclient import NotebookClient
from nbclient.exceptions import CellExecutionError


def _execute_notebook(name, path):
    """
    Private method to execute the provided notebook and handle errors.
    """
    input_path = f"{name}.ipynb"
    output_path = f"{name}-output.ipynb"
    with open(input_path) as f:
        nb = nbformat.read(f, as_version=4)
    client = NotebookClient(
        nb,
        timeout=600,
        kernel_name='python3',
        allow_errors=False,
        force_raise_errors=True,
        resources={'metadata': {'path': path}}
    )
    try:
        client.execute()
    except CellExecutionError:
        msg = f'Error executing the notebook "{input_path}".\n\n'
        # The traceback ends up in the output notebook written below.
        msg += f'See notebook "{output_path}" for the traceback.'
        print(msg)
        raise
    finally:
        # Always save the notebook, even when a cell fails.
        with open(output_path, mode='w', encoding='utf-8') as f:
            nbformat.write(nb, f)

Then we feed notebook paths into the function from a list. It's something like this:

def run():
    print("Running notebooks")
    notebook_list = [
        "notebook-1",
        "notebook-2",
        "notebook-3",
        "notebook-4",
    ]
    for notebook in notebook_list:
        notebook_filename = f'./_notebooks/{notebook}'
        print(f"- {notebook_filename}.ipynb")
        _execute_notebook(
            notebook_filename,
            path='_notebooks/'
        )

palewire avatar May 18 '20 02:05 palewire

Hmm, I'm unlikely to be able to help because I only have an older Ubuntu 18 machine and a newer Ubuntu 20 one. Though setting the ulimit lower and running might reproduce it, since OSX has a much lower default limit. I can give it a try later in the week if someone else hasn't narrowed it down. Is the error consistent or sporadic? Does it show up after a few notebooks have run, or on the first one? It might be that we're not cleaning up file handles somewhere in the dependency chain.

MSeal avatar May 18 '20 06:05 MSeal
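
One way to check the file-handle theory is to count the process's open descriptors before and after each run. This is only a minimal sketch, not part of nbclient: `open_fd_count` and `fd_delta` are hypothetical helper names, and it assumes a Unix-like system that exposes `/dev/fd` (macOS and Linux do). The event-loop functions are a stand-in for a notebook run, chosen because the traceback fails inside `asyncio.new_event_loop()`.

```python
import asyncio
import os


def open_fd_count():
    """Return how many file descriptors this process currently has open.

    Assumes /dev/fd exists (macOS, Linux). The listing itself briefly uses
    one descriptor, which is counted the same way on every call.
    """
    return len(os.listdir("/dev/fd"))


def fd_delta(func, *args, **kwargs):
    """Call func and return how many more descriptors are open afterwards."""
    before = open_fd_count()
    func(*args, **kwargs)
    return open_fd_count() - before


# Stand-ins for a notebook run: an event loop owns a selector plus a socket
# pair, so a loop that is never closed keeps those descriptors open.
leaked = []


def make_loop_without_closing():
    # Keep a reference so the loop is not garbage-collected (and closed).
    leaked.append(asyncio.new_event_loop())


def make_loop_and_close():
    loop = asyncio.new_event_loop()
    loop.close()


if __name__ == "__main__":
    print("leaky:", fd_delta(make_loop_without_closing))
    print("clean:", fd_delta(make_loop_and_close))
    for loop in leaked:
        loop.close()
```

Wrapping each call in the loop from the earlier comment, e.g. `fd_delta(_execute_notebook, notebook_filename, path='_notebooks/')`, would show whether the count keeps growing from one notebook to the next.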

As @MSeal suggested, I could reproduce the bug on Ubuntu 20.04 with Python 3.8 by setting ulimit -n 16, although it crashed in the shell channel creation (in ZMQ) instead of the heartbeat channel (in asyncio.new_event_loop). Both failures are caused by creating a new socket, so increasing the ulimit might solve the issue, provided that we are not leaking sockets.

davidbrochart avatar May 18 '20 07:05 davidbrochart
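
The same low-limit condition can be set up from inside Python with the standard `resource` module, which may be handy for a test. The sketch below is an illustration rather than the reproduction used above: `sockets_until_emfile` is a hypothetical helper, and it opens plain socket pairs (the call that fails at the bottom of the traceback) instead of starting kernels.

```python
import errno
import resource
import socket


def sockets_until_emfile(soft_limit=16):
    """Lower the soft RLIMIT_NOFILE, then open socket pairs until refused.

    Returns (pairs_opened, errno_of_failure). Every socket is closed and the
    original limit is restored before returning.
    """
    original = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only the soft limit is lowered; the hard limit is left untouched so
    # the original value can be restored without extra privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, original[1]))
    pairs = []
    try:
        while True:
            pairs.append(socket.socketpair())
    except OSError as exc:
        return len(pairs), exc.errno
    finally:
        for a, b in pairs:
            a.close()
            b.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, original)


if __name__ == "__main__":
    opened, err = sockets_until_emfile(16)
    print(opened, errno.errorcode.get(err, err))
```

The failure carries `errno.EMFILE`, the same "[Errno 24] Too many open files" seen in the report.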

One thing we could do is start running tests on an OSX (or even Windows) build on GitHub Actions. At least once we've got the bug reproducible in test form.

choldgraf avatar May 18 '20 14:05 choldgraf

Good point, sounds like a good idea in any case.

davidbrochart avatar May 18 '20 14:05 davidbrochart

This error was encountered by three different users on three different MacBooks, all running the same code.

palewire avatar May 18 '20 15:05 palewire

I am guessing a ulimit of 256, 512, or 1024 is more accurate for a given Mac. From what I've read, the default and max ulimits have shifted around in this range on Mac depending on the OS version and the available RAM.

MSeal avatar May 18 '20 16:05 MSeal
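
To see what a given machine actually allows, and to raise the soft limit for the current process as a workaround, something along these lines may help. It is a sketch using the standard `resource` module: `raise_nofile_soft_limit` and the `4096` default are hypothetical choices, and on macOS the kernel may reject values above its own per-process maximum, so treat the target as an assumption to verify.

```python
import resource


def raise_nofile_soft_limit(target=4096):
    """Raise the soft open-file limit toward target, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit", so only cap when the hard limit is finite.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit())
```

This only hides the problem if descriptors are in fact being leaked; a larger limit just delays the failure.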

This is a recurrent issue, with e.g. multiple kernels in JLab.

SylvainCorlay avatar May 26 '20 14:05 SylvainCorlay

Likely related to the comments near the end of this thread: https://github.com/jupyter/jupyter_client/pull/548

MSeal avatar May 26 '20 17:05 MSeal

Yep, got another occurrence in executablebooks/jupyter-book#867

chrisjsewell avatar Aug 11 '20 15:08 chrisjsewell