Preheated kernels cause the server to crash if pool_size is too large

Open steve-marmalade opened this issue 1 year ago • 6 comments

Description

When launching voila with preheated kernels enabled and pool_size=2 in a directory with 4 notebooks, the server crashes with an inscrutable error (IndexError: pop from an empty deque). The voila server runs successfully when pool_size=1.

Reproduce

Edit voila.json as follows

{
   "VoilaConfiguration": {
      "preheat_kernel": true
   },
   "VoilaKernelManager": {
      "preheat_blacklist": [
      ],
      "kernel_pools_config": {
         "default": {
            "pool_size": 2
         }
      },
      "fill_delay": 0
   }
}

Create a directory dash/ with 4 notebooks.

Run voila as follows:

voila --port=8080 --no-browser --Voila.ip=0.0.0.0 --show_tracebacks=True dash/

The server will crash after a few seconds.

Expected behavior

Either:

  • The server runs successfully and creates 8 kernels.
  • The server crashes with a clear error message that explains the problem (e.g. maybe we can only start 1 kernel per core?)
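
The figure of 8 kernels is pool_size multiplied by the number of notebooks, which matches the "Kernel started" lines in the log output. A minimal Python sketch of that arithmetic, assuming an empty preheat_blacklist and no per-notebook pool overrides; expected_preheated_kernels is a hypothetical helper for illustration, not part of voila:

```python
import json
import tempfile
from pathlib import Path

# Same settings as the voila.json in the reproduction steps.
VOILA_JSON = """
{
   "VoilaConfiguration": {"preheat_kernel": true},
   "VoilaKernelManager": {
      "preheat_blacklist": [],
      "kernel_pools_config": {"default": {"pool_size": 2}},
      "fill_delay": 0
   }
}
"""


def expected_preheated_kernels(config_text: str, notebook_dir: Path) -> int:
    """Hypothetical helper: default pool_size times the notebooks found.

    Assumes every notebook uses the "default" pool and none is blacklisted.
    """
    config = json.loads(config_text)
    pools = config["VoilaKernelManager"]["kernel_pools_config"]
    pool_size = pools["default"]["pool_size"]
    notebooks = list(notebook_dir.glob("*.ipynb"))
    return pool_size * len(notebooks)


# Recreate a dash/ directory with 4 (empty placeholder) notebooks.
with tempfile.TemporaryDirectory() as tmp:
    dash = Path(tmp) / "dash"
    dash.mkdir()
    for name in ("abc_dash", "def_dash", "ghi_dash", "jkl_dash"):
        (dash / f"{name}.ipynb").write_text("{}")
    total = expected_preheated_kernels(VOILA_JSON, dash)

print(total)
```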

Context

  • voila version: 0.3.0
  • Operating System and version: Arch Linux 5.18.14-arch1-1
  • Browser and version: N/A
Troubleshoot Output
$PATH:
	/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin
	/home/user1/.pyenv/versions/3.9.11/bin
	/home/user1/.pyenv/libexec
	/home/user1/.pyenv/plugins/python-build/bin
	/home/user1/.pyenv/plugins/pyenv-virtualenv/bin
	/home/user1/.pyenv/plugins/pyenv-update/bin
	/home/user1/.pyenv/plugins/pyenv-installer/bin
	/home/user1/.pyenv/plugins/pyenv-doctor/bin
	/home/user1/.pyenv/shims
	/home/user1/.pyenv/bin
	/home/user1/.poetry/bin
	/home/user1/google-cloud-sdk/bin
	/usr/local/bin
	/usr/bin

sys.path:
	/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin
	/home/user1/.pyenv/versions/3.9.11/lib/python39.zip
	/home/user1/.pyenv/versions/3.9.11/lib/python3.9
	/home/user1/.pyenv/versions/3.9.11/lib/python3.9/lib-dynload
	/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages
	/home/user1/code/my-voila-project

sys.executable: /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin/python

sys.version: 3.9.11 (main, Apr 12 2022, 18:23:35) [GCC 11.2.0]

platform.platform(): Linux-5.18.14-arch1-1-x86_64-with-glibc2.35

which -a jupyter: /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin/jupyter

pip list:
Package Version
----------------------------- ---------
aiohttp 3.8.1
aiosignal 1.2.0
ansiwrap 0.8.4
anyio 3.4.0
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
async-timeout 4.0.2
attrs 21.2.0
Babel 2.9.1
backcall 0.2.0
black 22.1.0
bleach 4.1.0
cachetools 4.2.4
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.9
click 8.0.3
cycler 0.11.0
debugpy 1.5.1
decorator 5.1.0
defusedxml 0.7.1
entrypoints 0.3
flake8 4.0.1
fonttools 4.28.3
frozenlist 1.2.0
google-api-core 2.3.0
google-auth 2.3.3
google-auth-oauthlib 0.4.6
google-cloud-bigquery 2.31.0
google-cloud-bigquery-storage 2.10.1
google-cloud-core 2.2.1
google-cloud-storage 1.43.0
google-crc32c 1.3.0
google-resumable-media 2.1.0
googleapis-common-protos 1.54.0
grpcio 1.42.0
grpcio-status 1.42.0
idna 3.3
ipykernel 6.6.0
ipython 7.30.1
ipython-genutils 0.2.0
ipywidgets 7.6.5
jedi 0.18.1
Jinja2 3.0.3
joblib 1.1.0
json5 0.9.6
jsonschema 4.2.1
jupyter-client 7.1.0
jupyter-core 4.9.1
jupyter-server 1.13.1
jupyterlab 3.2.5
jupyterlab-pygments 0.1.2
jupyterlab-server 2.9.0
jupyterlab-widgets 1.0.2
jupytext 1.13.4
kiwisolver 1.3.2
libcst 0.3.23
markdown-it-py 1.1.0
MarkupSafe 2.0.1
matplotlib 3.5.1
matplotlib-inline 0.1.3
mccabe 0.6.1
mdit-py-plugins 0.3.0
mistune 0.8.4
multidict 5.2.0
mypy-extensions 0.4.3
nbclassic 0.3.4
nbclient 0.5.9
nbconvert 6.3.0
nbformat 5.1.3
nest-asyncio 1.5.4
notebook 6.4.6
numpy 1.21.4
oauthlib 3.1.1
packaging 21.3
pandas 1.3.5
pandas-gbq 0.15.0
pandocfilters 1.5.0
papermill 2.3.3
parso 0.8.3
pathspec 0.9.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.4.0
pip 22.0.3
platformdirs 2.4.0
prometheus-client 0.12.0
prompt-toolkit 3.0.24
proto-plus 1.19.8
protobuf 3.19.1
ptyprocess 0.7.0
pyarrow 5.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.8.0
pycparser 2.21
pydata-google-auth 1.3.0
pyflakes 2.4.0
Pygments 2.10.0
pyparsing 3.0.6
pyrsistent 0.18.0
python-dateutil 2.8.2
pytz 2021.3
PyYAML 6.0
pyzmq 22.3.0
requests 2.26.0
requests-oauthlib 1.3.0
rsa 4.8
scikit-learn 1.0.1
scipy 1.7.3
seaborn 0.11.2
Send2Trash 1.8.0
setuptools 60.6.0
setuptools-scm 6.3.2
six 1.16.0
sniffio 1.2.0
tenacity 8.0.1
terminado 0.12.1
testpath 0.5.0
textwrap3 0.9.2
threadpoolctl 3.0.0
toml 0.10.2
tomli 1.2.2
tornado 6.1
tqdm 4.62.3
traitlets 5.1.1
typing_extensions 4.0.1
typing-inspect 0.7.1
urllib3 1.26.7
my-voila-project 0.1.0
voila 0.3.0
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 1.2.3
websockets 10.1
wheel 0.37.1
widgetsnbextension 3.5.2
yarl 1.7.2

Command Line Output
[Voila] Using /tmp to store connection files
[Voila] Storing connection files in /tmp/voila_7ryoha1v.
[Voila] Serving static files from /home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/voila/static.
[Voila] Voilà is running at:
http://user1-laptop:8080/ 
[Voila] Kernel started: 8c01c683-766f-4f1a-a15f-23944a1bb72f
[Voila] Kernel started: 73cda5ff-1c6b-4dd5-a7a5-c8bcefb40fba
[Voila] Kernel started: a6a404c2-13de-4201-a58b-11214fe06f01
[Voila] Kernel started: 348cdf75-df23-4f4b-8828-17dd8b51a2bc
[Voila] Kernel started: e958e8e4-5938-4ca9-b6c7-41373f660eec
[Voila] Kernel started: b4a0bfd3-6325-44b0-ba94-453e67e0a89e
[Voila] Kernel started: 9ef47d30-370f-4401-9eb4-55ba20f81624
[Voila] Kernel started: e0c3d4b2-beff-4ab4-99ef-e6282b0d46f6
[Voila] Kernel pool of abc_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel pool of def_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel pool of ghi_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel pool of jkl_dash.ipynb is filled with 2 kernel(s)
[Voila] Kernel shutdown: a6a404c2-13de-4201-a58b-11214fe06f01
[Voila] Kernel shutdown: 73cda5ff-1c6b-4dd5-a7a5-c8bcefb40fba
[Voila] Kernel shutdown: b4a0bfd3-6325-44b0-ba94-453e67e0a89e
[Voila] Kernel shutdown: 9ef47d30-370f-4401-9eb4-55ba20f81624
[Voila] Kernel shutdown: e0c3d4b2-beff-4ab4-99ef-e6282b0d46f6
[Voila] Kernel shutdown: 8c01c683-766f-4f1a-a15f-23944a1bb72f
[Voila] Kernel shutdown: 348cdf75-df23-4f4b-8828-17dd8b51a2bc
[Voila] Kernel shutdown: e958e8e4-5938-4ca9-b6c7-41373f660eec
Traceback (most recent call last):
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/bin/voila", line 8, in <module>
    sys.exit(main())
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/traitlets/config/application.py", line 846, in launch_instance
    app.start()
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/voila/app.py", line 548, in start
    self.listen()
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/voila/app.py", line 596, in listen
    self.ioloop.start()
  File "/home/user1/.cache/pypoetry/virtualenvs/my-voila-project-W9Aa18Hz-py3.9/lib/python3.9/site-packages/tornado/platform/asyncio.py", line 199, in start
    self.asyncio_loop.run_forever()
  File "/home/user1/.pyenv/versions/3.9.11/lib/python3.9/asyncio/base_events.py", line 601, in run_forever 
    self._run_once()
  File "/home/user1/.pyenv/versions/3.9.11/lib/python3.9/asyncio/base_events.py", line 1890, in _run_once 
    handle = self._ready.popleft()
IndexError: pop from an empty deque

steve-marmalade avatar Jul 28 '22 20:07 steve-marmalade

Hi, I cannot reproduce it on my machine. Can you track the memory usage when you start Voila? Could it be related to the specs of the machine?

trungleduc avatar Aug 01 '22 12:08 trungleduc

Hi @trungleduc, thank you for the quick response!

I am happy to provide any debugging info that you think would be useful. I just did a rudimentary test by watching htop while I started the voila server, and I did not see any spike in memory usage that could explain the issue. My laptop also has 8 cores, so it doesn't seem like an n_kernels > n_cores issue either. For reference, the CLI crashes within about 5 seconds of starting, so it's not obviously a resource utilization issue.

Can you think of any other info I can provide that would be useful?

steve-marmalade avatar Aug 01 '22 16:08 steve-marmalade

Your error is IndexError: pop from an empty deque, which is likely related to this issue: https://github.com/jupyterlab/jupyterlab/issues/11934. The proper fix is likely somewhere else, but the underlying issue is that nest_asyncio has a race condition if it gets patched in while there are events queued for execution on the asyncio loop.

vidartf avatar Aug 02 '22 16:08 vidartf

Thank you so much for commenting @vidartf!

I was able to apply the patch you proposed here to the voila source code here, and I am no longer seeing this crash. I greatly appreciate the work-around.

Is there a downside to merging this fix into voila?

steve-marmalade avatar Aug 02 '22 16:08 steve-marmalade

@steve-marmalade I want to try the fix but cannot reproduce the issue. Can you provide a minimal notebook to reproduce it?

trungleduc avatar Aug 04 '22 07:08 trungleduc

@trungleduc Since this is a race condition, it can be pretty tricky to reproduce. I haven't been able to see any direct correlation between notebook content and this behavior, but here are some things I suspect make it easier to reproduce, based on my struggles in the lab issue:

  • use tornado 6.2. While the issue still appears on 6.1, it is easier to reproduce and identify issues on 6.2.
  • Modify the code in and around the IndexError: pop from an empty deque in the stdlib's asyncio code. Ideally the code there should never be called if the nest_asyncio patch is applied as early as possible, but this issue occurs when there are tasks queued on an unpatched loop at the moment the patch gets called. In that case there is a race while the original unpatched loop processes its queue, so the more tasks are on it when the patch gets called, the more likely it is to trigger.

vidartf avatar Aug 16 '22 11:08 vidartf
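
For readers trying to picture the race described above, here is a minimal stdlib-only sketch of one way the error can arise. It is a toy model, not asyncio or nest_asyncio code: ToyLoop, run_once and drain_nested are hypothetical names. It assumes the unpatched loop counts its ready queue once and then pops that many entries, which is what the traceback location in _run_once suggests, and that a re-entrant run started from inside a callback consumes entries the outer iteration still expects to pop.

```python
from collections import deque


class ToyLoop:
    """Toy stand-in for an event loop's ready queue (not asyncio itself)."""

    def __init__(self):
        self.ready = deque()

    def call_soon(self, callback):
        self.ready.append(callback)

    def run_once(self):
        # Snapshot the queue length first, then pop exactly that many entries.
        ntodo = len(self.ready)
        for _ in range(ntodo):
            callback = self.ready.popleft()
            callback()

    def drain_nested(self):
        # Stand-in for a re-entrant run started from inside a callback:
        # it consumes entries that the outer run_once() already counted.
        while self.ready:
            self.ready.popleft()()


loop = ToyLoop()
log = []
loop.call_soon(loop.drain_nested)        # re-entrant call runs first
loop.call_soon(lambda: log.append("a"))  # consumed by the nested drain
loop.call_soon(lambda: log.append("b"))  # consumed by the nested drain

error_message = None
try:
    loop.run_once()
except IndexError as exc:
    error_message = str(exc)

print(log, error_message)
```

In this model the queued callbacks still run, but the outer iteration then pops from a queue that has already been emptied. The more entries are queued when the re-entrant run happens, the more pops the outer loop still expects, which is consistent with the observation that more queued tasks make the crash more likely.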