
Runtime error raised by kernel dying in wait_for_ready

Open willingc opened this issue 8 years ago • 32 comments

I've seen this error with conda builds on RTD several times. The latest was a recent build of the notebook docs.

Documenting it here in case it is not related to random kernel deaths and is more relevant to this repo (or nbconvert) than to the notebook issue tracker.

Error occurs:

  • after converting several Sphinx documents
  • when preprocessing a notebook file with nbconvert
  • in jupyter_client wait_for_ready

Relevant lines from Traceback

nbconvert/preprocessors/execute.py", line 141, in preprocess
jupyter_client/manager.py", line 433, in start_new_kernel
kc.wait_for_ready(timeout=startup_timeout)
jupyter_client/blocking/client.py", line 59, in wait_for_ready
    raise RuntimeError('Kernel died before replying to kernel_info')
RuntimeError: Kernel died before replying to kernel_info

Full Traceback

reading sources... [ 22%] examples/Notebook/Connecting with the Qt Console

Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/sphinx/cmdline.py", line 244, in main
    app.build(opts.force_all, filenames)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/sphinx/application.py", line 267, in build
    self.builder.build_update()
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 251, in build_update
    'out of date' % len(to_build))
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/sphinx/builders/__init__.py", line 265, in build
    self.doctreedir, self.app))
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/sphinx/environment.py", line 547, in update
    self._read_serial(docnames, app)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/sphinx/environment.py", line 567, in _read_serial
    self.read_doc(docname, app)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/sphinx/environment.py", line 720, in read_doc
    pub.publish()
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/docutils/core.py", line 217, in publish
    self.settings)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/sphinx/io.py", line 46, in read
    self.parse()
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/docutils/readers/__init__.py", line 78, in parse
    self.parser.parse(self.input, document)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/nbsphinx.py", line 406, in parse
    rststring, resources = exporter.from_notebook_node(nb, resources)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/nbsphinx.py", line 360, in from_notebook_node
    nb, resources = pp.preprocess(nb, resources)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 141, in preprocess
    cwd=path)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/jupyter_client/manager.py", line 433, in start_new_kernel
    kc.wait_for_ready(timeout=startup_timeout)
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/jupyter_client/blocking/client.py", line 59, in wait_for_ready
    raise RuntimeError('Kernel died before replying to kernel_info')
RuntimeError: Kernel died before replying to kernel_info

Exception occurred:
  File "/home/docs/checkouts/readthedocs.org/user_builds/jupyter-notebook/conda/latest/lib/python3.5/site-packages/jupyter_client/blocking/client.py", line 59, in wait_for_ready
    raise RuntimeError('Kernel died before replying to kernel_info')
RuntimeError: Kernel died before replying to kernel_info
The full traceback has been saved in /tmp/sphinx-err-u6hd8ohp.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!

willingc avatar Apr 09 '16 02:04 willingc

Can you see what the jupyter_client and nbconvert versions are?

minrk avatar Apr 09 '16 05:04 minrk

It looks as if nbconvert is 4.2.0 from the build logs. I'm not seeing a version for jupyter_client. [Note: If I create a local conda env with the environment.yml, nbconvert is 4.2.0 and jupyter_client is 4.2.2.]

Collecting nbconvert (from nbsphinx)
  Downloading nbconvert-4.2.0-py2.py3-none-any.whl (319kB)

Also, nbsphinx-0.2.5

willingc avatar Apr 09 '16 06:04 willingc

Thanks, I'll see if I can track it down. I really thought I had fixed this one.

minrk avatar Apr 09 '16 20:04 minrk

It's an intermittent failure: minutes later, a re-run of the same docs build didn't error. It's likely a timing issue. If I see it again on RTD builds, I'll let you know. Travis builds may offer clues.

willingc avatar Apr 09 '16 20:04 willingc

Yeah, the bug's definitely a timing issue, I just need to take another careful look at where the timing can get messed up.

minrk avatar Apr 09 '16 20:04 minrk

Any update on this? I'm seeing this fail reliably when using runipy in a Docker container, which seems to work fine locally with the exact same image.

Any workaround I could try?

michael-erasmus avatar Jun 01 '16 23:06 michael-erasmus

(@michael-erasmus I ran into the same problem combining docker and ipython, and fixed it by stuffing my ipython invocation in sh -c like docker run ... sh -c "cmd arg ...", as per https://github.com/ipython/ipython/issues/7062#issuecomment-223809024. More detail at https://github.com/jupyter-attic/docker-notebook/pull/6.)

jdanbrown avatar Jul 12 '16 01:07 jdanbrown

@jdanbrown thank you! that worked (when I got around to testing it)

michael-erasmus avatar Sep 07 '16 01:09 michael-erasmus

Is there any resolution for this issue yet? I also intermittently encounter this error in the CI builds for my python application that uses nbconvert.exporters.HTMLExporter to programmatically render a jupyter notebook to HTML.

desilinguist avatar Jul 11 '17 18:07 desilinguist

Hi @desilinguist,

I thought that the latest stable versions of jupyter_client and nbconvert had resolved this issue. However, as you are still seeing intermittent issues, there may be a timing issue particularly if it is happening in the CI builds. Can you share what versions you are running? Also if you have a CI error log that would be helpful too. Thanks!

  • nbconvert 5.2.1 is the latest stable version
    • This release made ExecutePreprocessor.iopub_timeout (an Int) user-configurable; the default timeout is 4 seconds. You could try configuring it to be larger and see if that helps when running in CI.
  • jupyter_client 5.1 was released a few weeks ago.

cc/ @minrk @mpacer

willingc avatar Jul 11 '17 19:07 willingc
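
The timeout suggestion above can also be set from a Jupyter config file. A minimal sketch, assuming the traits described in this thread (ExecutePreprocessor.iopub_timeout and the startup_timeout used by wait_for_ready); the 60-second values are arbitrary examples, not recommendations:

```python
# Sketch of a jupyter_nbconvert_config.py raising nbconvert's execution
# timeouts. `get_config()` is injected by Jupyter's config loader when this
# file is read; the 60-second values are arbitrary examples.
c = get_config()  # noqa: F821

# Time to wait for output on the IOPub channel (default 4 seconds).
c.ExecutePreprocessor.iopub_timeout = 60

# Time allowed for the kernel to answer kernel_info during wait_for_ready.
c.ExecutePreprocessor.startup_timeout = 60
```

The same traits can be passed on the command line, e.g. jupyter nbconvert --ExecutePreprocessor.iopub_timeout=60.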

It looks like I'm still using versions older than those. I'll update and see if I still see the issue. Thanks for the reply.

desilinguist avatar Jul 11 '17 21:07 desilinguist

We are still seeing this issue with Pweave (https://github.com/mpastell/Pweave/issues/61) while using jupyter_client 5.1: https://github.com/mpastell/Pweave/blob/master/pweave/processors/jupyter.py

mpastell avatar Aug 21 '17 10:08 mpastell

I am still seeing this issue in my CI builds on CircleCI and I am now using the latest versions of nbconvert (v5.3.1) and jupyter_client (v5.1.0).

@willingc any further suggestions/help you have will be much appreciated!

desilinguist avatar Jan 22 '18 18:01 desilinguist

I have a similar issue, here's my setup:

OS: macOS High Sierra 10.13.6
Python version: Python 3.6.3 :: Anaconda custom (64-bit)

Python packages:

nbconvert                                       5.3.1
jupyter-client                                  5.2.3
jupyterlab                                      0.32.1

Starting Jupyter Lab in one terminal:

$ jupyter lab --port=8888 --ip=0.0.0.0 --no-browser --allow-root --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.kernel_manager_class=extipy.ExternalIPythonKernelManager

[W 01:26:48.642 LabApp] All authentication is disabled.  Anyone who can connect to this server will be able to run code.
[I 01:26:48.724 LabApp] Loading IPython parallel extension
[I 01:26:48.738 LabApp] JupyterLab beta preview extension loaded from /Users/andrewssobral/anaconda3/lib/python3.6/site-packages/jupyterlab
[I 01:26:48.738 LabApp] JupyterLab application directory is /Users/andrewssobral/anaconda3/share/jupyter/lab
[I 01:26:48.829 LabApp] Serving notebooks from local directory: /Users/andrewssobral/
[I 01:26:48.829 LabApp] 0 active kernels
[I 01:26:48.829 LabApp] The Jupyter Notebook is running at:
[I 01:26:48.829 LabApp] http://localhost:8888/
[I 01:26:48.829 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Running the jupyter client in a second terminal:

import jupyter_client as jc
from jupyter_client import KernelManager
from jupyter_client.kernelspec import KernelSpecManager, NoSuchKernel, NATIVE_KERNEL_NAME
from jupyter_client.manager import start_new_kernel
from IPython.utils.capture import capture_output

#km, kc = start_new_kernel(kernel_name=NATIVE_KERNEL_NAME)
km = KernelManager(kernel_name='python3')
kc = km.client()
info = {
  "base_url": "/",
  "hostname": "127.0.0.1",
  "notebook_dir": "/root",
  "password": False,
  "pid": 1,
  "port": 8888,
  "secure": False,
  "token": "",
  "url": "http://127.0.0.1:8888/"
}
kc.load_connection_info(info)
kc.start_channels()
try:
  kc.wait_for_ready()
except RuntimeError:
  kc.stop_channels()
  raise

def run_cell(client, code, timeout=15):
  # now we can run code.  This is done on the shell channel
  #shell = client.shell_channel
  print("\nrunning: ", code)

  # execution is immediate and async, returning a UUID
  #msg_id = client.execute_interactive(code)
  # get_msg can block for a reply
  #reply = shell.get_msg()
  reply = client.execute_interactive(code, timeout=timeout)

  status = reply['content']['status']
  if status == 'ok':
    print('succeeded!')
  elif status == 'error':
    print('failed!')
    for line in reply['content']['traceback']:
      print(line)

run_cell(kc, "result='Hello world!'")
run_cell(kc, "print(result)")

output of the python script:

Traceback (most recent call last):
  File "client_juplab.py", line 28, in <module>
    kc.wait_for_ready()
  File "/Users/andrewssobral/anaconda3/lib/python3.6/site-packages/jupyter_client/blocking/client.py", line 120, in wait_for_ready
    raise RuntimeError('Kernel died before replying to kernel_info')
RuntimeError: Kernel died before replying to kernel_info

However, if I stop the local Jupyter Lab and do:

from jupyter_client.kernelspec import NATIVE_KERNEL_NAME
from jupyter_client.manager import start_new_kernel

km, kc = start_new_kernel(kernel_name=NATIVE_KERNEL_NAME)

def run_cell(client, code, timeout=15):
  print("\nrunning: ", code)
  reply = client.execute_interactive(code, timeout=timeout)
  status = reply['content']['status']
  if status == 'ok':
    print('succeeded!')
  elif status == 'error':
    print('failed!')
    for line in reply['content']['traceback']:
      print(line)

run_cell(kc, "result='Hello world!'")
run_cell(kc, "print(result)")

kc.shutdown()

It works fine, and the output is:

running:  result='Hello world!'
succeeded!

running:  print(result)
Hello world!
succeeded!

Am I doing something wrong? Thanks in advance.

andrewssobral avatar Sep 22 '18 23:09 andrewssobral

I am also seeing this problem occur intermittently in my GitLab Runner CI environment. FWIW, I saw this occur on a job that consisted of running 3 notebooks via nbconvert and saving the executed notebooks to ipynb. The first two notebooks execute just fine. Then I see this problem on the third notebook.

Versions:

ipython==7.3.0
ipython-genutils==0.2.0
jupyter==1.0.0
jupyter-client==5.2.3
jupyter-console==6.0.0
jupyter-contrib-core==0.3.3
jupyter-contrib-nbextensions==0.5.1
jupyter-core==4.4.0
jupyter-highlight-selected-word==0.2.0
jupyter-latex-envs==1.4.6
jupyter-nbextensions-configurator==0.4.1
nb-pdf-template==2.0.3
nbconvert==5.4.1
nbformat==4.4.0
widgetsnbextension==3.2.1

The error message:

Downloading artifacts from coordinator... ok        id=970660 responseStatus=200 OK token=t3wy_P-n
$ export PATH=$PATH:/opt/teradata/client/15.10/bin:/app/local/anaconda3/bin:/app/texlive/2018/bin/x86_64-linux
$ source activate BAM_py35
$ jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --execute $NOTEBOOK_DIR/The_First_Notebook.ipynb
[NbConvertApp] WARNING | Config option `template_path` not recognized by `NotebookExporter`.
[NbConvertApp] Converting notebook notebooks/my_subdir/The_First_Notebook.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python3
[NbConvertApp] Writing 13918 bytes to notebooks/my_subdir/The_First_Notebook.nbconvert.ipynb
$ jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --execute $NOTEBOOK_DIR/The_Second_Notebook.ipynb
[NbConvertApp] WARNING | Config option `template_path` not recognized by `NotebookExporter`.
[NbConvertApp] Converting notebook notebooks/my_subdir/The_Second_Notebook.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python3
[NbConvertApp] Writing 353750 bytes to notebooks/my_subdir/The_Second_Notebook.nbconvert.ipynb
$ jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --execute $NOTEBOOK_DIR/The_Third_Notebook.ipynb
[NbConvertApp] WARNING | Config option `template_path` not recognized by `NotebookExporter`.
[NbConvertApp] Converting notebook notebooks/my_subdir/The_Third_Notebook.ipynb to notebook
Traceback (most recent call last):
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/traitlets/config/application.py", line 657, in launch_instance
    app.initialize(argv)
  File "</app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/decorator.py:decorator-gen-124>", line 2, in initialize
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 467, in initialize
    self.init_sockets()
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 260, in init_sockets
    self.init_iopub(context)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 265, in init_iopub
    self.iopub_port = self._bind_socket(self.iopub_socket, self.iopub_port)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 181, in _bind_socket
    s.bind("tcp://%s:%i" % (self.ip, port))
  File "zmq/backend/cython/socket.pyx", line 547, in zmq.backend.cython.socket.Socket.bind
  File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use
Traceback (most recent call last):
  File "/app/local/anaconda3/envs/myenv_py35/bin/jupyter-nbconvert", line 11, in <module>
    sys.exit(main())
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/nbconvertapp.py", line 337, in start
    self.convert_notebooks()
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/nbconvertapp.py", line 507, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/nbconvertapp.py", line 478, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/nbconvertapp.py", line 407, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/exporters/exporter.py", line 178, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/exporters/exporter.py", line 196, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/exporters/notebook.py", line 32, in from_notebook_node
    nb_copy, resources = super(NotebookExporter, self).from_notebook_node(nb, resources, **kw)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/exporters/exporter.py", line 138, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/exporters/exporter.py", line 315, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 352, in preprocess
    with self.setup_preprocessor(nb, resources, km=km):
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 297, in setup_preprocessor
    self.km, self.kc = self.start_new_kernel(cwd=path)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 251, in start_new_kernel
    kc.wait_for_ready(timeout=self.startup_timeout)
  File "/app/local/anaconda3/envs/myenv_py35/lib/python3.5/site-packages/jupyter_client/blocking/client.py", line 120, in wait_for_ready
    raise RuntimeError('Kernel died before replying to kernel_info')
RuntimeError: Kernel died before replying to kernel_info
ERROR: Job failed: exit status 1

tpanza avatar Apr 09 '19 01:04 tpanza

  File "zmq/backend/cython/socket.pyx", line 547, in zmq.backend.cython.socket.Socket.bind
  File "zmq/backend/cython/checkrc.pxd", line 25, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use

@tpanza you may wish to double check your ZMQ installation (pyzmq and zeromq) as it looks like cython is conflicting.

willingc avatar Apr 09 '19 05:04 willingc

I suspect this is a case of the classic timing issue inherent in Jupyter kernel startup. Ports are determined and closed, connection file built, kernel launched, and by the time the ports are used by the kernel, at least one is in use by something else. This window can be exacerbated if the kernel startup time takes a bit (e.g., Spark or some larger "carrier" of the kernel) and/or the system is shared by other applications.

kevin-bates avatar Apr 09 '19 18:04 kevin-bates
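
The window described above can be demonstrated with plain sockets. A minimal stdlib-only sketch (no jupyter_client involved) of why "find a free port, close it, use it later" can lose the race:

```python
# Sketch: a "pick a free port, release it, bind it later" race.
# This mirrors the launcher/kernel timing problem only conceptually;
# it does not use jupyter_client or ZMQ.
import socket

def pick_free_port():
    """Bind to port 0 so the OS assigns a free port, then release it."""
    s = socket.socket()
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()  # from here on, nothing reserves the port
    return port

port = pick_free_port()

# Simulate another process grabbing the port inside the window.
intruder = socket.socket()
intruder.bind(("127.0.0.1", port))

# The "kernel" now tries to bind the port it was told to use.
kernel_side = socket.socket()
race_error = None
try:
    kernel_side.bind(("127.0.0.1", port))
except OSError as exc:  # pyzmq surfaces this as "Address already in use"
    race_error = exc
    print("kernel would fail with:", exc)

intruder.close()
kernel_side.close()
```

In the real flow the window spans connection-file writing plus kernel process startup, which is why slow-starting kernels and busy shared hosts make the failure more likely.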

I am getting the same issue. In my code I want to read multiple files, but I get this error:

[NbConvertApp] Executing notebook with kernel: python3
[NbConvertApp] ERROR | Kernel died while waiting for execute reply.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 471, in _wait_for_reply
    msg = self.kc.shell_channel.get_msg(timeout=timeout_interval)
  File "/usr/local/lib/python3.6/dist-packages/jupyter_client/blocking/channels.py", line 57, in get_msg
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/jupyter-nbconvert", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/jupyter_core/application.py", line 267, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/nbconvertapp.py", line 338, in start
    self.convert_notebooks()
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/nbconvertapp.py", line 508, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/nbconvertapp.py", line 479, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/nbconvertapp.py", line 408, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/exporters/exporter.py", line 179, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/exporters/exporter.py", line 197, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/exporters/html.py", line 90, in from_notebook_node
    return super(HTMLExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/exporters/templateexporter.py", line 299, in from_notebook_node
    nb_copy, resources = super(TemplateExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/exporters/exporter.py", line 139, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/exporters/exporter.py", line 316, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 381, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 414, in preprocess_cell
    reply, outputs = self.run_cell(cell, cell_index)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 491, in run_cell
    exec_reply = self._wait_for_reply(parent_msg_id, cell)
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 473, in _wait_for_reply
    self._check_alive()
  File "/usr/local/lib/python3.6/dist-packages/nbconvert/preprocessors/execute.py", line 456, in _check_alive
    raise DeadKernelError("Kernel died")
nbconvert.preprocessors.execute.DeadKernelError: Kernel died

Khushbu04in avatar Aug 07 '19 16:08 Khushbu04in

I am getting the same issue when I run notebooks with pytest. This problem does not occur every time I run, but it occurs frequently.

nbconvert version = 5.6.0
jupyter_client version = 5.3.3

The error:


self = <jupyter_client.blocking.client.BlockingKernelClient object at 0x7f9e103e7588>
timeout = 60

def wait_for_ready(self, timeout=None):
    """Waits for a response when a client is blocked

    - Sets future time for timeout
    - Blocks on shell channel until a message is received
    - Exit if the kernel has died
    - If client times out before receiving a message from the kernel, send RuntimeError
    - Flush the IOPub channel
    """
    if timeout is None:
        abs_timeout = float('inf')
    else:
        abs_timeout = time.time() + timeout

    from ..manager import KernelManager
    if not isinstance(self.parent, KernelManager):
        # This Client was not created by a KernelManager,
        # so wait for kernel to become responsive to heartbeats
        # before checking for kernel_info reply
        while not self.is_alive():
            if time.time() > abs_timeout:
                raise RuntimeError("Kernel didn't respond to heartbeats in %d seconds and timed out" % timeout)
            time.sleep(0.2)

    # Wait for kernel info reply on shell channel
    while True:
        try:
            msg = self.shell_channel.get_msg(block=True, timeout=1)
        except Empty:
            pass
        else:
            if msg['msg_type'] == 'kernel_info_reply':
                self._handle_kernel_info_reply(msg)
                break

        if not self.is_alive():
>           raise RuntimeError('Kernel died before replying to kernel_info')
E           RuntimeError: Kernel died before replying to kernel_info

/usr/local/lib/python3.5/dist-packages/jupyter_client/blocking/client.py:120: RuntimeError

snmhas avatar Sep 27 '19 12:09 snmhas

We were getting this error on papermill when we tried running the tests in a virtual environment created with venv. The only thing that solved it was installing ipykernel and then running python -m ipykernel install --user before executing a notebook.

ammarasmro avatar Jan 30 '20 15:01 ammarasmro

We probably need more information on what other applications are running when these issues appear. The ipykernel install helping is surprising to me: the venv should already have done this during the install of ipython, and isolating the kernel shouldn't directly help with socket race conditions on the machine.

Is there anything else running when these exceptions are raised? Also are you performing operations in parallel with jupyter_client? I do occasionally see socket issues when running jupyter_client in a threaded environment in python 3.5 but never when it's being used in isolation without other contenders for the ZMQ sockets.

MSeal avatar Feb 04 '20 07:02 MSeal

We were getting this error on papermill when we tried running tests in an AWS Lambda.

uvlavi avatar Feb 04 '20 10:02 uvlavi

@uvlavi Are the tests running multiple papermill calls at once within the lambda? Are you near lambda resource or time limits?

MSeal avatar Feb 04 '20 18:02 MSeal

This is the setup that precedes the failure:

python -m venv v_env
. $VIRTUAL_ENV_ACTIVATE_PATH
pip install wheel
pip install -e .

Any call to nbconvert or papermill was triggering this error.

Adding this at the end fixed it

python -m ipykernel install --user

This is all run in a docker container on CircleCI

ammarasmro avatar Feb 04 '20 20:02 ammarasmro

So pip install -e . doesn't install the ipython kernel, since it's not a requirement of jupyter_client. If you did pip install -e .[test] or pip install ipykernel, it would provide the particular kernel you are using. Kernel installation is independent of the kernel manager and client code here, so it makes sense that it would fail. That said, I would expect the error message to be that no kernel was found rather than RuntimeError: Kernel died before replying to kernel_info.

MSeal avatar Feb 04 '20 22:02 MSeal
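
To make the missing-kernelspec case fail loudly in CI, you can check for installed kernel specs before launching anything. A stdlib-only sketch approximating what jupyter_client's KernelSpecManager does; the two search paths below are assumptions (Linux-style venv and per-user locations) and the real manager searches more places:

```python
# Sketch: stdlib-only scan for installed Jupyter kernel specs. This only
# approximates jupyter_client.kernelspec.KernelSpecManager; the search
# paths are assumptions and incomplete (e.g. macOS uses ~/Library/Jupyter).
import os
import sys

def find_kernel_specs():
    """Return {name: path-to-kernel.json} for specs in common locations."""
    candidates = [
        os.path.join(sys.prefix, "share", "jupyter", "kernels"),
        os.path.expanduser("~/.local/share/jupyter/kernels"),
    ]
    specs = {}
    for root in candidates:
        if not os.path.isdir(root):
            continue
        for name in os.listdir(root):
            spec = os.path.join(root, name, "kernel.json")
            if os.path.isfile(spec):
                # kernel names are case-insensitive; first hit wins
                specs.setdefault(name.lower(), spec)
    return specs

print(find_kernel_specs())  # empty dict => `python -m ipykernel install` needed
```

If "python3" is missing from the result, running python -m ipykernel install --user (or installing ipykernel into the environment) is the usual fix, matching the workaround reported above.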

We have also been running into this error on and off while running tests that use nbconvert's HTMLExporter via PyTest; it is not consistently reproducible.

nbconvert==5.6.0
jupyter-client==5.3.1

vatsan avatar Feb 21 '20 23:02 vatsan

Hi all! I came here via Google and I'm not 100% sure the issue I'm tracking down is related to this one, but it looks close, at least.

I'm trying to understand why some builds (example) at Read the Docs that are using nbsphinx end with:

nbsphinx.NotebookError: DeadKernelError in examples/Using NEOS package.md:
Kernel died

Notebook error:
DeadKernelError in examples/Using NEOS package.md:
Kernel died

There are cases where the exact same commit hash works (the build succeeds) and others where it fails. I read here that you were talking about a "timing issue" but I'm not sure I understand what that means. Does it mean that it's waiting for an IPython kernel that was never started, and fails due to a timeout?

In any case, is there anything we can do from the Read the Docs side as a workaround? (I'm not even sure whether this is a bug on our side or in jupyter-client, nbsphinx, or another package. :sweat_smile:)

I'll keep taking a look at this and share whatever I find here, though.

humitos avatar Jan 11 '21 15:01 humitos

We are seeing a similar issue; it can be somewhat easily reproduced by starting a lot of jupyter clients in parallel:

for i in {1..100}
do
    papermill --no-progress-bar /path/to/my/notebook.ipynb /path/to/output/nb.$i.ipynb &
done

You'll see some pairings of zmq.error.ZMQError: Address already in use and RuntimeError: Kernel died before replying to kernel_info.

mlucool avatar Nov 05 '21 20:11 mlucool

I believe I understand this now. This code finds free ports and then writes the set of ports to the connection file so that a kernel knows where to join. This is a race condition as the ports can be used up before the kernel tries to bind them. I'm not sure of a good way to fix this.

mlucool avatar Nov 10 '21 23:11 mlucool
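
Until the race is fixed upstream, one client-side mitigation is to retry kernel startup. A hedged sketch: the retry policy and the start_kernel_with_retries helper are assumptions for illustration, not part of jupyter_client; the callable you pass would typically be a wrapper around jupyter_client.manager.start_new_kernel:

```python
# Sketch: retry a kernel-start callable when it loses the port race and
# raises "Kernel died before replying to kernel_info". The helper name,
# attempts, and backoff are illustrative assumptions, not upstream API.
import time

def start_kernel_with_retries(start, attempts=3, delay=1.0):
    """Call `start()` up to `attempts` times, backing off between tries.

    `start` is any zero-argument callable that launches a kernel and
    raises RuntimeError on startup failure (as wait_for_ready does).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return start()
        except RuntimeError as exc:
            last_error = exc
            time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    raise last_error
```

This doesn't remove the race; it just makes each retry re-roll the port selection, which is usually enough for intermittent CI failures like the ones reported above.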

Hi @mlucool - you're right, this is a race condition. Ideally, the kernel would create/bind to the ports and communicate back to the launching application (jupyter_client), which is what the kernel handshaking pattern proposes. Another possibility would be to write a kernel provisioner that does what's necessary (kernel handshake?) since the connection info now comes from there. Moving port management kernel-side seems like a good goal.

kevin-bates avatar Nov 11 '21 04:11 kevin-bates
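
The handshake idea can be illustrated with plain sockets: the launcher owns a single registration socket, the "kernel" binds its own ephemeral ports and reports them back, so no port is ever chosen before it is bound. A stdlib-only sketch of the concept (real kernels use ZMQ sockets and the Jupyter connection-file format, neither of which appears here):

```python
# Sketch of the handshake pattern: kernel binds port 0 itself, then phones
# home with the ports it actually got. Stdlib-only conceptual illustration.
import json
import socket
import threading

# Launcher side: one listening socket whose port would go in the
# "connection file" instead of pre-picked shell/iopub ports.
reg = socket.socket()
reg.bind(("127.0.0.1", 0))
reg.listen(1)
reg_port = reg.getsockname()[1]

def fake_kernel(registration_port):
    """Kernel side: bind shell/iopub to OS-assigned ports, report them."""
    shell = socket.socket()
    shell.bind(("127.0.0.1", 0))
    iopub = socket.socket()
    iopub.bind(("127.0.0.1", 0))
    bound = {"shell": shell.getsockname()[1], "iopub": iopub.getsockname()[1]}
    with socket.create_connection(("127.0.0.1", registration_port)) as c:
        c.sendall(json.dumps(bound).encode())
    shell.close()
    iopub.close()

t = threading.Thread(target=fake_kernel, args=(reg_port,))
t.start()
conn, _ = reg.accept()
ports = json.loads(conn.recv(4096).decode())
conn.close()
reg.close()
t.join()
print("kernel reported ports:", ports)
```

Because the kernel binds before reporting, the launcher never holds a port it can lose, which is exactly the window the earlier comments identified.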