Kernel didn't respond (randomly?)
I am running into a strange bug related to the sequential execution of Jupyter notebooks (Python 3 kernels). The main loop executes a set of notebooks one after another through nbconvert, as follows:
[...]
from nbconvert.preprocessors import ExecutePreprocessor
[...]
class Report:
    [..]
    def execute_notebook(self, timeout=3600):
        [...]
        notebook = nbformat.read(str(self.notebook_path), as_version=4)
        kernel_name = notebook["metadata"]["kernelspec"]["name"]
        ep = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
        ep.preprocess(notebook, dict(metadata=dict(path=self.notebook_folder)))
The executions are run on a daily basis. Some days the loop over the notebooks works smoothly, but on other days it fails when the program tries to execute the second notebook, with:
Traceback (most recent call last):
  File "/home/data/ds-metrics/scripts/recurring_reports.py", line 52, in main
    report.execute_notebook()
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/report/__init__.py", line 49, in execute_notebook
    ep.preprocess(notebook, dict(metadata=dict(path=self.notebook_folder)))
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/nbconvert/preprocessors/execute.py", line 359, in preprocess
    with self.setup_preprocessor(nb, resources, km=km):
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/nbconvert/preprocessors/execute.py", line 304, in setup_preprocessor
    self.km, self.kc = self.start_new_kernel(cwd=path)
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/nbconvert/preprocessors/execute.py", line 258, in start_new_kernel
    kc.wait_for_ready(timeout=self.startup_timeout)
  File "/home/data/miniconda3/envs/analytics/lib/python3.6/site-packages/jupyter_client/blocking/client.py", line 124, in wait_for_ready
    raise RuntimeError("Kernel didn't respond in %d seconds" % timeout)
RuntimeError: Kernel didn't respond in 60 seconds
After a failure, if I run the loop again, or if I run the notebooks one by one, it works. It “looks” really random, and I do not know how to reproduce the problem.
The executions are run on a Debian server from a conda virtual environment with Python 3.6.6 and the following list of relevant packages:
ipykernel 5.1.0 py36h24bf2e0_1001 conda-forge
ipython 7.1.1 py36h24bf2e0_1000 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jupyter 1.0.0 py_1 conda-forge
jupyter_client 5.2.3 py_1 conda-forge
jupyter_console 6.0.0 py_0 conda-forge
jupyter_core 4.4.0 py_0 conda-forge
nbconvert 5.4.0 1 conda-forge
nbformat 4.4.0 py_1 conda-forge
notebook 5.7.2 py36_1000 conda-forge
pexpect 4.6.0 py36_1000 conda-forge
python 3.6.6 h5001a0f_3 conda-forge
pyzmq 17.1.2 py36hae99301_1 conda-forge
traitlets 4.3.2 py36_1000 conda-forge
zeromq 4.2.5 hfc679d8_6 conda-forge
Is there a way to log more information during the execution so that, the next time it fails, I have some clues about what happened?
Thank you very much for your help!
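Since re-running after a failure works, one low-effort way to collect clues is to wrap each execution in a small helper that logs the full traceback and the elapsed time, then retries. This is only a sketch under assumptions: `run_with_retry` and the logger name are hypothetical, nothing here is part of nbconvert, and a retry hides the symptom rather than explaining it.

```python
import logging
import time

# Hypothetical logger name; any name works.
logger = logging.getLogger("recurring_reports")


def run_with_retry(func, retries=2, delay=5.0):
    """Call func(); on RuntimeError, log the traceback and try again.

    Makes at most retries + 1 attempts and re-raises the last error.
    """
    for attempt in range(1, retries + 2):
        start = time.monotonic()
        try:
            result = func()
        except RuntimeError:
            elapsed = time.monotonic() - start
            # logger.exception records the message plus the traceback.
            logger.exception("attempt %d failed after %.1f s", attempt, elapsed)
            if attempt > retries:
                raise
            time.sleep(delay)
        else:
            elapsed = time.monotonic() - start
            logger.info("attempt %d succeeded in %.1f s", attempt, elapsed)
            return result


# Assumed usage in the main loop:
#   logging.basicConfig(level=logging.INFO)
#   run_with_retry(report.execute_notebook)
```

The elapsed time in the log shows whether a failure really hit the 60-second kernel startup limit seen in the traceback or happened earlier for another reason.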
Hi,
it turns out that the problem seems to come from a call to /dev/random on the server I am using.
The following example, test.py,
from nbconvert.preprocessors import ExecutePreprocessor
ep = ExecutePreprocessor(kernel_name="python3")
km, kc = ep.start_new_kernel()
km.shutdown_kernel()
can get stuck on the following (traced with strace python test.py):
open("/dev/random", O_RDONLY) = 6
poll([{fd=6, events=POLLIN}], 1, -1
whereas a run that completes continues as
open("/dev/random", O_RDONLY) = 6
poll([{fd=6, events=POLLIN}], 1, -1) = 1 ([{fd=6, revents=POLLIN}])
close(6) = 0
open("/dev/urandom", O_RDONLY) = 6
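The blocked poll on /dev/random suggests that the kernel's entropy pool was low at that moment. A small diagnostic can be run just before starting a kernel to record this. It is a minimal sketch, assuming a Linux server where /proc/sys/kernel/random/entropy_avail exists; the function names are hypothetical and not part of nbconvert or jupyter_client.

```python
import os
import select

# Linux-only path exposing the kernel's current entropy estimate.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def entropy_available(path=ENTROPY_AVAIL):
    """Return the entropy estimate as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def readable_within(fd, timeout_ms):
    """Return True if fd becomes readable within timeout_ms milliseconds."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return bool(poller.poll(timeout_ms))


def random_device_ready(path="/dev/random", timeout_ms=1000):
    """Repeat the poll seen in the strace output, but with a finite timeout."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return readable_within(fd, timeout_ms)
    finally:
        os.close(fd)
```

Logging `entropy_available()` and `random_device_ready()` before each notebook execution would show whether the failures line up with moments when /dev/random is not readable.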
This is a similar issue: https://github.com/ipython/ipykernel/issues/342