the-littlest-jupyterhub

JupyterLab Server of user crashes

Open Agrigor opened this issue 2 years ago • 5 comments

Bug description

Several times per day, the single-user servers of some users crash. The journalctl log of the jupyter-username unit shows the following error:

Mai 04 08:39:25 jupyterhubvm systemd[1]: Started /bin/bash -c cd /home/jupyter-cyril && exec jupyterhub-singleuser --port=38773 --SingleUserNotebookApp.default_url=/lab.
Mai 04 08:39:26 jupyterhubvm bash[14907]: [I 2022-05-04 08:39:26.512 SingleUserNotebookApp notebookapp:1593] Authentication of /metrics is OFF, since other authentication is disabled.
Mai 04 08:39:27 jupyterhubvm bash[14907]: [I 2022-05-04 08:39:27.378 LabApp] JupyterLab extension loaded from /opt/tljh/user/lib/python3.9/site-packages/jupyterlab
Mai 04 08:39:27 jupyterhubvm bash[14907]: [I 2022-05-04 08:39:27.378 LabApp] JupyterLab application directory is /opt/tljh/user/share/jupyter/lab
Mai 04 08:39:27 jupyterhubvm bash[14907]: /opt/tljh/user/lib/python3.9/site-packages/jupyter_server_mathjax/app.py:40: FutureWarning: The alias `_()` will be deprecated. Use `_i18n()` instead.
Mai 04 08:39:27 jupyterhubvm bash[14907]:   help=_("""The MathJax.js configuration file that is to be used."""),
Mai 04 08:39:27 jupyterhubvm bash[14907]: [W 2022-05-04 08:39:27.510 SingleUserNotebookApp notebookapp:2034] Error loading server extension nbresuse
Mai 04 08:39:27 jupyterhubvm bash[14907]:     Traceback (most recent call last):
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/notebook/notebookapp.py", line 2030, in init_server_extensions
Mai 04 08:39:27 jupyterhubvm bash[14907]:         func(self)
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/nbresuse/__init__.py", line 49, in load_jupyter_server_extension
Mai 04 08:39:27 jupyterhubvm bash[14907]:         PrometheusHandler(PSUtilMetricsLoader(nbapp)), 1000
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/nbresuse/prometheus.py", line 25, in __init__
Mai 04 08:39:27 jupyterhubvm bash[14907]:         gauge = Gauge(phrase, "counter for " + phrase.replace("_", " "), [])
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/prometheus_client/metrics.py", line 355, in __init__
Mai 04 08:39:27 jupyterhubvm bash[14907]:         super(Gauge, self).__init__(
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/prometheus_client/metrics.py", line 136, in __init__
Mai 04 08:39:27 jupyterhubvm bash[14907]:         registry.register(self)
Mai 04 08:39:27 jupyterhubvm bash[14907]:       File "/opt/tljh/user/lib/python3.9/site-packages/prometheus_client/registry.py", line 29, in register
Mai 04 08:39:27 jupyterhubvm bash[14907]:         raise ValueError(
Mai 04 08:39:27 jupyterhubvm bash[14907]:     ValueError: Duplicated timeseries in CollectorRegistry: {'total_memory_usage'}

Expected behaviour

The server should not crash.

Actual behaviour

The server crashes, and both the server and the kernel have to be restarted. Important: the crashes are independent of RAM usage and occur even after a fresh reboot.

How to reproduce

Hard to reproduce; the crash happens on its own after some waiting.

Your personal set up

  • OS: Ubuntu 18.04
  • Version(s): latest TLJH version; Proxmox VM on an Intel Xeon E5-2620 v4 with 256 GB RAM
Full environment
asn1crypto==0.24.0
attrs==17.4.0
Automat==0.6.0
bcrypt==3.2.0
blinker==1.4
cached-property==1.5.2
certifi==2018.1.18
cffi==1.15.0
chardet==3.0.4
charset-normalizer==2.0.9
click==6.7
cloud-init==22.1
colorama==0.3.7
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==36.0.0
distro==1.6.0
distro-info===0.18ubuntu0.18.04.1
docker==5.0.3
docker-compose==1.29.2
dockerpty==0.4.1
docopt==0.6.2
httplib2==0.9.2
hyperlink==17.3.1
idna==2.6
incremental==16.10.1
iotop==0.6
Jinja2==2.10
jsonpatch==1.16
jsonpointer==1.10
jsonschema==2.6.0
keyring==10.6.0
keyrings.alt==3.0
language-selector==0.1
MarkupSafe==1.0
netifaces==0.10.4
numpy==1.19.5
oauthlib==2.0.6
PAM==0.4.2
paramiko==2.8.1
pexpect==4.2.1
pyasn1==0.4.2
pyasn1-modules==0.2.1
pycparser==2.21
pycrypto==2.6.1
PyGObject==3.26.1
PyJWT==1.5.3
PyNaCl==1.4.0
pyOpenSSL==17.5.0
pyserial==3.4
python-apt==1.6.5+ubuntu0.7
python-debian==0.1.32
python-dotenv==0.19.2
pyxdg==0.25
PyYAML==3.12
requests==2.26.0
requests-unixsocket==0.1.5
SecretStorage==2.3.1
semantic-version==2.8.5
service-identity==16.0.0
six==1.11.0
sos==4.3
ssh-import-id==5.7
systemd-python==234
texttable==1.6.4
Twisted==17.9.0
typing_extensions==4.0.1
ubuntu-advantage-tools==27.7
ufw==0.36
unattended-upgrades==0.1
urllib3==1.22
websocket-client==0.59.0
zope.interface==4.3.2
Configuration
users:
  admin:
  - agrigor
user_environment:
  default_app: jupyterlab
services:
  configurator:
    enabled: false
auth:
  type: nativeauthenticator.NativeAuthenticator
  NativeAuthenticator:
    open_signup: true
Logs

Agrigor, May 04 '22 09:05

ping :)

Agrigor, Nov 09 '22 14:11

@Agrigor sorry, it's been a busy couple of months! It looks like you have two packages publishing the same metrics: the older nbresuse, and the newer jupyter-resource-usage. I think if you remove nbresuse, you should get what you want.

minrk, Nov 24 '22 08:11
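
A minimal sketch for checking whether both packages mentioned above are installed in the same Python environment. It assumes it is run with the interpreter of the TLJH user environment (the traceback points at /opt/tljh/user/lib/python3.9), and it only reads installed-package metadata through the standard library's importlib.metadata. The helper name installed_versions is hypothetical, not part of TLJH or either package.

```python
"""Report which resource-usage packages are installed (hypothetical helper)."""
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable

# PyPI distribution names of the older and the newer package; per the comment
# above, both can publish the same metric names (e.g. total_memory_usage).
CANDIDATES = ("nbresuse", "jupyter-resource-usage")


def installed_versions(names: Iterable[str] = CANDIDATES) -> Dict[str, str]:
    """Return {distribution name: version} for each name that is installed."""
    found: Dict[str, str] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass  # not installed in this interpreter's environment
    return found


if __name__ == "__main__":
    found = installed_versions()
    for name, ver in found.items():
        print(f"{name}=={ver}")
    if len(found) > 1:
        print("Both packages are installed; consider removing nbresuse.")
```

If both names are printed, removing nbresuse from that environment with pip is the fix suggested above.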

Hi @minrk, thanks for your answer! I uninstalled nbresuse, but unfortunately the crashes still happen all the time. Do you have any other ideas on how I can debug or fix this crash? KR

Agrigor, Dec 02 '22 09:12
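
One way to narrow down the remaining crashes is to separate warnings from errors in the unit's journal. The sketch below assumes the log prefix format visible in the excerpt above (e.g. `[W 2022-05-04 08:39:27.510 SingleUserNotebookApp notebookapp:2034]`), where the first letter is the log level. It reads journal text on standard input, for example piped from `journalctl -u jupyter-<username>`, and prints error (E) and critical (C) entries plus systemd's own lines about the unit. The script, its function names and the made-up level "S" for systemd lines are hypothetical, not part of TLJH.

```python
"""Keep only error/critical entries from a Jupyter server journal (sketch)."""
import re
import sys
from typing import Iterable, List, Tuple

# Jupyter log prefix, e.g. "[W 2022-05-04 08:39:27.510 SingleUserNotebookApp ...]".
# Group 1 is the level letter: D(ebug), I(nfo), W(arning), E(rror), C(ritical).
PREFIX = re.compile(r"\[([DIWEC]) \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} ")
# Messages written by systemd itself about the unit (start, exit, restart).
SYSTEMD = re.compile(r" systemd\[\d+\]: ")


def group_entries(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Group lines into (level, lines); unprefixed lines continue the previous entry."""
    entries: List[Tuple[str, List[str]]] = []
    for line in lines:
        text = line.rstrip("\n")
        match = PREFIX.search(text)
        if match:
            entries.append((match.group(1), [text]))
        elif SYSTEMD.search(text):
            entries.append(("S", [text]))  # hypothetical level for systemd lines
        elif entries:
            entries[-1][1].append(text)  # e.g. traceback continuation lines
        # Unprefixed lines before the first entry are skipped.
    return entries


def severe_entries(lines: Iterable[str], levels: str = "ECS") -> List[str]:
    """Return the text of every entry whose level letter is in `levels`."""
    return ["\n".join(body) for level, body in group_entries(lines) if level in levels]


if __name__ == "__main__":
    for entry in severe_entries(sys.stdin):
        print(entry)
        print("-" * 40)
```

Run against the excerpt in the issue, it would print only the systemd `Started` line, since the nbresuse traceback there is logged at level W.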

Thanks! After uninstalling nbresuse, JupyterLab starts without the error ValueError: Duplicated timeseries in CollectorRegistry: {'total_memory_usage'}.

sunway910, Dec 09 '22 08:12