voila
Preheated kernels cause the server to crash if pool_size is too large
Description
When launching voila with preheated kernels enabled and pool_size=2 in a directory with 4 notebooks, the server crashes with an inscrutable error. The voila server runs successfully when pool_size=1.
Reproduce
Edit voila.json as follows:
{
  "VoilaConfiguration": {
    "preheat_kernel": true
  },
  "VoilaKernelManager": {
    "preheat_blacklist": [],
    "kernel_pools_config": {
      "default": {
        "pool_size": 2
      }
    },
    "fill_delay": 0
  }
}
Create a directory dash/ with 4 notebooks.
Run voila as follows:
voila --port=8080 --no-browser --Voila.ip=0.0.0.0 --show_tracebacks=True dash/
The server will crash after a few seconds.
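For anyone trying to reproduce this, the setup above can be scripted. The following is a minimal sketch using only the standard library. The notebook names are taken from the log output in this report, but the notebook contents, the helper names (build_config, minimal_notebook, write_repro), and the location of voila.json are assumptions; move voila.json to wherever your Voilà installation reads its config from.

```python
import json
import tempfile
from pathlib import Path

# Notebook names as they appear in the log output; the contents are placeholders.
NOTEBOOK_NAMES = ["abc_dash.ipynb", "def_dash.ipynb", "ghi_dash.ipynb", "jkl_dash.ipynb"]


def build_config(pool_size):
    """Return the voila.json settings from the report as a dict."""
    return {
        "VoilaConfiguration": {"preheat_kernel": True},
        "VoilaKernelManager": {
            "preheat_blacklist": [],
            "kernel_pools_config": {"default": {"pool_size": pool_size}},
            "fill_delay": 0,
        },
    }


def minimal_notebook():
    """Return a tiny nbformat 4.4 notebook with a single code cell (assumed content)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": ["print('hello')"],
            }
        ],
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def write_repro(root, pool_size=2):
    """Write voila.json and dash/*.ipynb under root; return the expected kernel count."""
    root = Path(root)
    dash = root / "dash"
    dash.mkdir(parents=True, exist_ok=True)
    (root / "voila.json").write_text(json.dumps(build_config(pool_size), indent=2))
    for name in NOTEBOOK_NAMES:
        (dash / name).write_text(json.dumps(minimal_notebook(), indent=1))
    # One pool per notebook, pool_size kernels per pool.
    return pool_size * len(NOTEBOOK_NAMES)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_repro(tmp), "preheated kernels expected")
```

With pool_size=2 and 4 notebooks the helper returns 8, which matches the eight "Kernel started" lines in the command line output.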
Expected behavior
Either:
- The server runs successfully and creates 8 kernels.
- The server crashes with a clear error message on the issue (e.g. maybe we can only start 1 kernel per core?)
Context
- voila version: 0.3.0
- Operating System and version: Arch Linux 5.18.14-arch1-1
- Browser and version: N/A
Troubleshoot Output
$PATH: /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin /home/user1/.pyenv/versions/3.9.11/bin /home/user1/.pyenv/libexec /home/user1/.pyenv/plugins/python-build/bin /home/user1/.pyenv/plugins/pyenv-virtualenv/bin /home/user1/.pyenv/plugins/pyenv-update/bin /home/user1/.pyenv/plugins/pyenv-installer/bin /home/user1/.pyenv/plugins/pyenv-doctor/bin /home/user1/.pyenv/shims /home/user1/.pyenv/bin /home/user1/.poetry/bin /home/user1/google-cloud-sdk/bin /usr/local/bin /usr/bin
sys.path: /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin /home/user1/.pyenv/versions/3.9.11/lib/python39.zip /home/user1/.pyenv/versions/3.9.11/lib/python3.9 /home/user1/.pyenv/versions/3.9.11/lib/python3.9/lib-dynload /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages /home/user1/code/my-voila-project
sys.executable: /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin/python
sys.version: 3.9.11 (main, Apr 12 2022, 18:23:35) [GCC 11.2.0]
platform.platform(): Linux-5.18.14-arch1-1-x86_64-with-glibc2.35
which -a jupyter: /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin/jupyter
pip list: Package Version ----------------------------- --------- aiohttp 3.8.1 aiosignal 1.2.0 ansiwrap 0.8.4 anyio 3.4.0 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 async-timeout 4.0.2 attrs 21.2.0 Babel 2.9.1 backcall 0.2.0 black 22.1.0 bleach 4.1.0 cachetools 4.2.4 certifi 2021.10.8 cffi 1.15.0 charset-normalizer 2.0.9 click 8.0.3 cycler 0.11.0 debugpy 1.5.1 decorator 5.1.0 defusedxml 0.7.1 entrypoints 0.3 flake8 4.0.1 fonttools 4.28.3 frozenlist 1.2.0 google-api-core 2.3.0 google-auth 2.3.3 google-auth-oauthlib 0.4.6 google-cloud-bigquery 2.31.0 google-cloud-bigquery-storage 2.10.1 google-cloud-core 2.2.1 google-cloud-storage 1.43.0 google-crc32c 1.3.0 google-resumable-media 2.1.0 googleapis-common-protos 1.54.0 grpcio 1.42.0 grpcio-status 1.42.0 idna 3.3 ipykernel 6.6.0 ipython 7.30.1 ipython-genutils 0.2.0 ipywidgets 7.6.5 jedi 0.18.1 Jinja2 3.0.3 joblib 1.1.0 json5 0.9.6 jsonschema 4.2.1 jupyter-client 7.1.0 jupyter-core 4.9.1 jupyter-server 1.13.1 jupyterlab 3.2.5 jupyterlab-pygments 0.1.2 jupyterlab-server 2.9.0 jupyterlab-widgets 1.0.2 jupytext 1.13.4 kiwisolver 1.3.2 libcst 0.3.23 markdown-it-py 1.1.0 MarkupSafe 2.0.1 matplotlib 3.5.1 matplotlib-inline 0.1.3 mccabe 0.6.1 mdit-py-plugins 0.3.0 mistune 0.8.4 multidict 5.2.0 mypy-extensions 0.4.3 nbclassic 0.3.4 nbclient 0.5.9 nbconvert 6.3.0 nbformat 5.1.3 nest-asyncio 1.5.4 notebook 6.4.6 numpy 1.21.4 oauthlib 3.1.1 packaging 21.3 pandas 1.3.5 pandas-gbq 0.15.0 pandocfilters 1.5.0 papermill 2.3.3 parso 0.8.3 pathspec 0.9.0 pexpect 4.8.0 pickleshare 0.7.5 Pillow 8.4.0 pip 22.0.3 platformdirs 2.4.0 prometheus-client 0.12.0 prompt-toolkit 3.0.24 proto-plus 1.19.8 protobuf 3.19.1 ptyprocess 0.7.0 pyarrow 5.0.0 pyasn1 0.4.8 pyasn1-modules 0.2.8 pycodestyle 2.8.0 pycparser 2.21 pydata-google-auth 1.3.0 pyflakes 2.4.0 Pygments 2.10.0 pyparsing 3.0.6 pyrsistent 0.18.0 python-dateutil 2.8.2 pytz 2021.3 PyYAML 6.0 pyzmq 22.3.0 requests 2.26.0 requests-oauthlib 1.3.0 rsa 4.8 scikit-learn 1.0.1 scipy 1.7.3 
seaborn 0.11.2 Send2Trash 1.8.0 setuptools 60.6.0 setuptools-scm 6.3.2 six 1.16.0 sniffio 1.2.0 tenacity 8.0.1 terminado 0.12.1 testpath 0.5.0 textwrap3 0.9.2 threadpoolctl 3.0.0 toml 0.10.2 tomli 1.2.2 tornado 6.1 tqdm 4.62.3 traitlets 5.1.1 typing_extensions 4.0.1 typing-inspect 0.7.1 urllib3 1.26.7 my-voila-project 0.1.0 voila 0.3.0 wcwidth 0.2.5 webencodings 0.5.1 websocket-client 1.2.3 websockets 10.1 wheel 0.37.1 widgetsnbextension 3.5.2 yarl 1.7.2
Command Line Output
[Voila] Using /tmp to store connection files
[Voila] Storing connection files in /tmp/voila_7ryoha1v.
[Voila] Serving static files from /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/voila/static.
[Voila] Voilà is running at: http://user1-laptop:8080/
[Voila] Kernel started: 8c01c683-766f-4f1a-a15f-23944a1bb72f
[Voila] Kernel started: 73cda5ff-1c6b-4dd5-a7a5-c8bcefb40fba
[Voila] Kernel started: a6a404c2-13de-4201-a58b-11214fe06f01
[Voila] Kernel started: 348cdf75-df23-4f4b-8828-17dd8b51a2bc
[Voila] Kernel started: e958e8e4-5938-4ca9-b6c7-41373f660eec
[Voila] Kernel started: b4a0bfd3-6325-44b0-ba94-453e67e0a89e
[Voila] Kernel started: 9ef47d30-370f-4401-9eb4-55ba20f81624
[Voila] Kernel started: e0c3d4b2-beff-4ab4-99ef-e6282b0d46f6
[Voila] Kernel pool of abc_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel pool of def_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel pool of ghi_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel pool of jkl_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel shutdown: a6a404c2-13de-4201-a58b-11214fe06f01
[Voila] Kernel shutdown: 73cda5ff-1c6b-4dd5-a7a5-c8bcefb40fba
[Voila] Kernel shutdown: b4a0bfd3-6325-44b0-ba94-453e67e0a89e
[Voila] Kernel shutdown: 9ef47d30-370f-4401-9eb4-55ba20f81624
[Voila] Kernel shutdown: e0c3d4b2-beff-4ab4-99ef-e6282b0d46f6
[Voila] Kernel shutdown: 8c01c683-766f-4f1a-a15f-23944a1bb72f
[Voila] Kernel shutdown: 348cdf75-df23-4f4b-8828-17dd8b51a2bc
[Voila] Kernel shutdown: e958e8e4-5938-4ca9-b6c7-41373f660eec
Traceback (most recent call last):
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin/voila", line 8, in <module>
    sys.exit(main())
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/traitlets/config/application.py", line 846, in launch_instance
    app.start()
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/voila/app.py", line 548, in start
    self.listen()
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/voila/app.py", line 596, in listen
    self.ioloop.start()
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/tornado/platform/asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "/home/user1/.pyenv/versions/3.9.11/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/home/user1/.pyenv/versions/3.9.11/lib/python3.9/asyncio/base_events.py", line 1890, in _run_once
    handle = self._ready.popleft()
IndexError: pop from an empty deque
Hi, I cannot reproduce it on my machine. Can you track the memory usage when you start Voila? Could it be related to the specs of the machine?
Hi @trungleduc, thank you for the quick response!
I am happy to provide any debugging info that you think would be useful. I just did a rudimentary test by watching htop while I started the voila server, and I did not see any spike in memory usage that could explain the issue. My laptop also has 8 cores, so it doesn't seem like an n_kernels > n_cores issue either. For reference, the CLI crashes within about 5 seconds of starting, so it's not obviously a resource utilization issue.
Can you think of any other info I can provide that would be useful?
Your error is IndexError: pop from an empty deque, which is likely related to this issue: https://github.com/jupyterlab/jupyterlab/issues/11934
The proper fix is likely somewhere else, but the underlying issue is that nest_asyncio has a race condition if it gets patched in while there are events queued for execution on the asyncio loop.
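To illustrate the kind of race described here, the following is a simplified model built on a plain deque. It is not the actual asyncio or nest_asyncio code, and all function names are hypothetical. It assumes that the unpatched loop iteration snapshots the queue length before popping, while a nested, re-entrant run started from one of the callbacks drains the same queue.

```python
from collections import deque


def run_once_unpatched(ready):
    """Model of a loop iteration that snapshots the queue length up front."""
    ntodo = len(ready)
    for _ in range(ntodo):
        callback = ready.popleft()  # IndexError if something else drained the queue
        callback()


def run_once_reentrant(ready):
    """Model of a re-entrant iteration that re-checks the queue before every pop."""
    while ready:
        callback = ready.popleft()
        callback()


def simulate(n_tasks):
    """Queue one callback that starts a nested run, followed by n_tasks ordinary tasks."""
    ready = deque()
    log = []

    def nested():
        log.append("nested-start")
        # The nested run consumes tasks the outer iteration still expects to pop.
        run_once_reentrant(ready)

    ready.append(nested)
    for i in range(n_tasks):
        ready.append(lambda i=i: log.append(f"task-{i}"))

    try:
        run_once_unpatched(ready)
    except IndexError as exc:
        return str(exc), log
    return None, log


if __name__ == "__main__":
    print(simulate(3))
    print(simulate(0))
```

In this model the outer iteration only survives when nothing else was queued behind the nested call, which is consistent with the crash depending on what is already on the loop when the patch is applied.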
Thank you so much for commenting, @vidartf!
I was able to apply the patch you proposed here to the voila source code here, and I am no longer seeing this crash. I greatly appreciate the work-around.
Is there a downside to merging this fix into voila?
@steve-marmalade I want to try the fix but cannot reproduce the issue. Can you provide a minimal notebook to reproduce it?
@trungleduc Since this is a race condition, it can be pretty tricky to reproduce. I haven't been able to see any direct correlation between notebook content and this behavior, but based on my struggles in the lab issue, here are some things I suspect might make it easier to reproduce:
- Use tornado 6.2. While the issue still appears on 6.1, it is easier to reproduce and identify issues on 6.2.
- Modify the code in and around the IndexError: pop from an empty deque in the stdlib's asyncio code. Ideally the code there should never be called if the nest_asyncio patch is applied as early as possible, but this issue occurs when there are tasks queued on an unpatched loop at the moment patch gets called. In that case there is a race while the original unpatched loop processes its queue, so the more tasks are on it when patch gets called, the more likely it is to trigger.
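A minimal sketch of the "patch as early as possible" ordering, for illustration only: it is not the patch discussed earlier in this thread, and the helper name run_with_early_patch is hypothetical. It assumes nest_asyncio is installed and falls back to an unpatched loop if the import fails. The point is only the ordering: nest_asyncio.apply() runs while the loop's queue is still empty, before any callbacks are scheduled.

```python
import asyncio

try:
    import nest_asyncio  # third-party; optional in this sketch
except ImportError:
    nest_asyncio = None


def run_with_early_patch(n_callbacks):
    """Create a loop, patch it before anything is queued, then run some callbacks."""
    loop = asyncio.new_event_loop()
    try:
        # Patch first, while the loop's ready queue is still empty.
        if nest_asyncio is not None:
            nest_asyncio.apply(loop)

        seen = []
        # Only now schedule work on the (already patched) loop.
        for i in range(n_callbacks):
            loop.call_soon(seen.append, i)

        # Drive the loop briefly so the queued callbacks are processed.
        loop.run_until_complete(asyncio.sleep(0))
        return seen
    finally:
        loop.close()


if __name__ == "__main__":
    print(run_with_early_patch(3))
```

Whether Voilà can apply the patch this early depends on where its loop is created and when the first tasks are queued, which this sketch does not model.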