
Performance, large machines and few users: connection issues

michelbl opened this issue · 6 comments

I planned to use TLJH for a classroom of about 20 concurrent users. Following http://tljh.jupyter.org/en/latest/howto/admin/resource-estimation.html I estimated that 4 GB of RAM would be enough with a good margin, so I installed it on a shared OVH server of type s1-4 (4 GB RAM, 1 vCore, 100 Mbit/s).

Whatever the number of concurrent users, launching a new server sometimes fails (maybe once out of 10 times). Retrying usually works. top does not report high CPU usage, and the available swap is not used.

When the 20 users begin to use the JupyterHub, some of them (maybe 5-7), after creating a notebook, have connection issues with the server. As a result, cell execution hangs. Restarting the server and logging out and back in does not solve the issue.

Because I anticipated there could be performance issues with that configuration, I was ready to deploy a new TLJH on a slightly more powerful OVH server (b2-7: 7 GB RAM, 2 guaranteed vCores, 250 Mbit/s). But it did not solve the issues described above. Eventually I had to have the students develop locally.

I don't know if this issue is a bug report or a feature request, but several things could help users in such situations:

  • Even with https://tljh.jupyter.org/en/latest/troubleshooting/index.html I am not able to find any hint about the bottleneck (is it a RAM issue? a CPU issue?). sudo journalctl -u jupyterhub does not report anything.
  • There is no tool to simulate the load of a whole class beforehand, so I had to test live. A way to stress-test a hub would be great.
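Lacking an off-the-shelf stress tool, one low-tech option is to drive the JupyterHub REST API directly and start many user servers in a loop. This is only a sketch under assumptions not in the thread: the hub URL, the API token, and the user names (user0 … user19) are placeholders, and those users must already exist on the hub.

```python
# Hedged sketch: simulate classroom load by starting N user servers
# through the JupyterHub REST API (POST /hub/api/users/{name}/server).
# HUB_URL, API_TOKEN and the usernames are assumptions -- adjust them
# for your own deployment.
import urllib.request

HUB_URL = "http://localhost/hub/api"      # assumed TLJH default
API_TOKEN = "REPLACE_WITH_ADMIN_TOKEN"    # token of an admin user

def build_request(username: str) -> urllib.request.Request:
    """Build the POST request that asks the hub to start a user's server."""
    return urllib.request.Request(
        f"{HUB_URL}/users/{username}/server",
        method="POST",
        data=b"",
        headers={"Authorization": f"token {API_TOKEN}"},
    )

def start_server(username: str) -> int:
    """Returns 201 (started) or 202 (starting) on success."""
    with urllib.request.urlopen(build_request(username)) as resp:
        return resp.status

if __name__ == "__main__":
    for i in range(20):  # roughly one classroom of users
        try:
            print(f"user{i}:", start_server(f"user{i}"))
        except Exception as exc:  # spawn failures show up here
            print(f"user{i}: FAILED ({exc})")
```

Watching how many spawns fail, and how memory and CPU behave while they run, would reproduce the "launching a new server sometimes fails" symptom without a live class.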

michelbl avatar Jan 21 '20 17:01 michelbl

If I understand correctly, you have a total of 4 GB of RAM on the whole machine? That's probably too little for 20 concurrent users. (It's 200 MB each, which is likely below the memory footprint of the Jupyter notebook server each user gets to themselves.)

willirath avatar Jan 21 '20 18:01 willirath

http://tljh.jupyter.org/en/latest/howto/admin/resource-estimation.html suggests the memory for each user is around 150-180 MB (127 MB reported by nbresuse, with a margin of 20-40%). Why do you use the figure of 200 MB?

I added a swap file and restarted all of my users' servers. The swap was not used, yet the same issues appeared.

Going to a server with 7GB of RAM did not change anything.

Do you know of a way to tell for sure whether the issue is caused by not enough RAM?
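One way to check, assuming a Linux host: compare MemAvailable in /proc/meminfo (which, unlike MemFree, accounts for reclaimable caches) against what the class needs. A minimal standard-library sketch:

```python
# Sketch: read /proc/meminfo to see how much memory is genuinely
# available. Linux only; values in the file are reported in kB.
def meminfo() -> dict:
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # kB
    return info

m = meminfo()
print(f"MemTotal:     {m['MemTotal'] / 1024:.0f} MB")
print(f"MemAvailable: {m['MemAvailable'] / 1024:.0f} MB")
```

If the kernel's OOM killer is terminating notebook servers, the kernel log will say so: lines containing "Killed process" in sudo journalctl -k (or dmesg) are a strong sign the problem really is RAM.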

michelbl avatar Jan 22 '20 09:01 michelbl

The figure of 200 MB comes from your total memory (4000 MB) divided by your number of users (20). This leaves very little memory for the JupyterHub process and for the operating system.
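As a back-of-the-envelope check of the arithmetic above (the 1 GB reserve for the OS, hub, and proxy is an assumed figure, not one from this thread):

```python
# Per-user memory budget: total RAM divided by users, after setting
# aside a reserve for the OS, JupyterHub, and the proxy.
total_mb = 4000
n_users = 20
system_reserve_mb = 1024  # assumed reserve, not a measured figure

per_user_mb = (total_mb - system_reserve_mb) / n_users
print(f"{per_user_mb:.0f} MB per user")  # → 149 MB per user
```

Against the 150-180 MB per-user estimate quoted earlier, that leaves essentially no margin, which is consistent with intermittent spawn failures.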

If you've got dstat installed, you can get a clearer picture of CPU and memory usage than top shows, e.g. run dstat --vmstat 5 (this updates continuously, but averages over 5-second intervals, which is better than the instantaneous readings given by top).
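If dstat is not available, a similarly averaged reading can be taken from /proc/stat directly. This is a Linux-only sketch that assumes the standard aggregate "cpu" line layout (user, nice, system, idle, iowait, …); it mirrors the kind of interval sample dstat takes:

```python
# Average CPU utilisation over an interval, computed from /proc/stat
# deltas rather than a single instantaneous reading.
import time

def cpu_times() -> list:
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]  # aggregate "cpu" line
    return [int(x) for x in fields]

def cpu_percent(interval: float = 5.0) -> float:
    a = cpu_times()
    time.sleep(interval)
    b = cpu_times()
    deltas = [y - x for x, y in zip(a, b)]
    idle = deltas[3] + deltas[4]  # idle + iowait ticks
    total = sum(deltas)
    return 100.0 * (total - idle) / total if total else 0.0

print(f"CPU busy over sample: {cpu_percent(1.0):.1f}%")
```

Running this while the class is active would show whether the single vCore is the bottleneck even when top looks calm.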

manics avatar Jan 22 '20 09:01 manics

I'm having related problems, also with about 20 concurrent users, but on a much stronger machine (16 GB RAM, 4 cores). People keep getting "Your server isn't running" warnings, and sometimes they have trouble even reaching the proxy server. It's definitely not a RAM issue: half of my RAM is free and swap is unused. I was thinking it might be a problem with the number of open files or concurrent network connections, but both my per-user file limit and the system-wide file limit seem large enough. Unfortunately I don't know how to debug a "too many concurrent network connections" problem, and I'm also not sure how heavy JupyterHub is on network connections. I'd love to debug this if I knew how...
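For the open-files hypothesis there is at least a cheap way to see the numbers. A hedged sketch (Linux only): it reports the current process's own descriptor usage; to inspect the hub itself, substitute the jupyterhub PID (from systemctl status jupyterhub) for os.getpid(), which may require root.

```python
# Compare a process's open file descriptors against its RLIMIT_NOFILE.
# Inspects this process by default; point it at the hub's PID to check
# whether JupyterHub is approaching its descriptor limit.
import os
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
n_open = len(os.listdir(f"/proc/{os.getpid()}/fd"))
print(f"open fds: {n_open}  soft limit: {soft}  hard limit: {hard}")
```

If the suspicion is network connections rather than files, ss -s gives a quick system-wide summary of socket counts to compare against any connection-tracking limits.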

jesuisse avatar Dec 09 '20 18:12 jesuisse

Wondering if I have the same issue, but my server has 64 cores and 256 GB of memory, with only ~10 users. I've installed dstat and will see if I can trace this.

aolney avatar Jun 10 '21 21:06 aolney

My connection to the server kept dropping out when I was the only person using the hub (digital blue, 4 GB RAM). I was doing arithmetic in the cells.

Feels like the timeout setting is super short. (I'm not a technical person, so if that's a ridiculous thing to say... apologies.)

sawula avatar Dec 03 '21 17:12 sawula