Culler not working when there are too many users?
Bug description
The culler is not culling servers even though the time since their last activity clearly exceeds the configured timeout.
Current setup from our k8s logs:
Starting service 'cull-idle': ['python3', '-m', 'jupyterhub_idle_culler', '--url=https://<url>:5081/hub/api', '--timeout=14400', '--cull-every=600', '--concurrency=10']
We observe two things:
- user is using the default server
"last_activity": "2025-01-23T16:01:47.564284Z",
"started": "2025-01-13T09:57:08.123732Z",
- user is not using the analytics named server
"last_activity": "2025-01-20T12:06:54.100549Z",
"started": "2025-01-13T13:51:04.802281Z",
In the second case the last_activity was over 3 days ago, which is far beyond the timeout of 14400 seconds (4 hours).
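Spelling the arithmetic out (a quick sketch, assuming the culler's check is simply "now minus last_activity greater than --timeout", and using the user-level last_activity from the model below as "now"):

from datetime import datetime

# timestamps copied from the API output below; Python 3.10's fromisoformat()
# does not accept the trailing "Z", hence the replace()
now = datetime.fromisoformat("2025-01-23T16:06:51.635146Z".replace("Z", "+00:00"))
last = datetime.fromisoformat("2025-01-20T12:06:54.100549Z".replace("Z", "+00:00"))

idle = (now - last).total_seconds()
print(idle, idle > 14400)  # ~273597 s idle, far over the 14400 s timeout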
Example obtained from /api/users/<user>
{
"server": "/user/<user>/",
"created": "2023-02-06T09:37:18.564165Z",
"admin": false,
"name": "<user>",
"kind": "user",
"last_activity": "2025-01-23T16:06:51.635146Z",
"roles": [
"user"
],
"auth_state": {
"email": "<email>"
},
"pending": null,
"groups": [
"<groups>"
],
"servers": {
"": {
"name": "",
"full_name": "<user>/",
"last_activity": "2025-01-23T16:01:47.564284Z",
"started": "2025-01-13T09:57:08.123732Z",
"pending": null,
"ready": true,
"stopped": false,
"url": "/user/<user>/",
"user_options": {
"profile": "scipy-spark",
"server_size": "model_training",
"service_account": "data-science"
},
"progress_url": "/hub/api/users/<user>/server/progress",
"full_url": null,
"full_progress_url": null,
"state": {
"pod_name": "jupyter-<first_name>-2d<last_name>",
"namespace": "jupyter-hub",
"dns_name": "jupyter-<first_name>-2d<last_name>.jupyter-hub.svc.cluster.local"
}
},
"analytics": {
"name": "analytics",
"full_name": "<user>/analytics",
"last_activity": "2025-01-20T12:06:54.100549Z",
"started": "2025-01-13T13:51:04.802281Z",
"pending": null,
"ready": true,
"stopped": false,
"url": "/user/<user>/analytics/",
"user_options": {
"profile": "python-3-analytics",
"server_size": "default",
"service_account": "data-science"
},
"progress_url": "/hub/api/users/<user>/servers/analytics/progress",
"full_url": null,
"full_progress_url": null,
"state": {
"pod_name": "jupyter-<first_name>-2d<last_name>--analytics",
"namespace": "jupyter-hub",
"dns_name": "jupyter-<first_name>-2d<last_name>--analytics.jupyter-hub.svc.cluster.local"
}
}
}
}
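For reference, a quick way to double-check which servers should already have been culled is to walk the servers mapping of that user model (a rough sketch; user_model is the JSON from /api/users/<user> shown above):

from datetime import datetime, timezone

TIMEOUT = 14400  # seconds, matching --timeout

def check_servers(user_model: dict) -> None:
    now = datetime.now(timezone.utc)
    for name, server in user_model.get("servers", {}).items():
        # last_activity is a Z-suffixed ISO timestamp in the hub's API output
        last = datetime.fromisoformat(server["last_activity"].replace("Z", "+00:00"))
        idle = (now - last).total_seconds()
        print(f"server {name or '<default>'}: ready={server['ready']}, "
              f"idle {idle:.0f}s, over timeout: {idle > TIMEOUT}")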
How to reproduce
I don't know how to reproduce this. We have JupyterHub deployed in several datacenters, but it only fails in our busiest one, which runs over 150 servers simultaneously (not sure if that's relevant).
I'm afraid that the service is blocked / not running as expected because there are too many users
Expected behaviour
Servers are culled after they exceed the idle timeout.
Actual behaviour
The culler is not culling the servers after the timeout.
Your personal set up
Latest version of jupyterhub and idle culler
Logs
I can see the following logs from the culler. ⚠ Not sure if it's important, but the "Fetching page 2" line doesn't show the port, just the URL:
Jan 23 16:41:12.885 nbt-hub [I 250123 15:41:11 __init__:156] Fetching page 2 https://<url without port>/hub/api/users?state=ready&offset=50&limit=50
Jan 23 16:31:12.600 nbt-hub File "/usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py", line 124, in fetch
Jan 23 16:31:12.600 nbt-hub File "/usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py", line 142, in fetch_paginated
Jan 23 16:31:12.599 nbt-hub File "/usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py", line 436, in cull_idle
Sometimes I also get
/usr/lib/python3.10/collections/__init__.py:431: RuntimeWarning: coroutine 'cull_idle.<locals>.handle_user' was never awaited
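For what it's worth, that warning is consistent with the pagination request failing mid-loop: coroutine objects get created for each user but are then dropped without ever being awaited, and Python reports them when they are garbage-collected. A minimal illustration of the mechanism (not the culler's actual code):

import asyncio

async def handle_user(user):
    return user

async def fake_pages():
    yield "user-1"
    # simulate the second page failing, like the culler's "Fetching page 2"
    raise ConnectionRefusedError(111, "Connection refused")

async def main():
    pending = []
    try:
        async for user in fake_pages():
            pending.append(handle_user(user))  # coroutine created, not awaited yet
    except ConnectionRefusedError:
        pass  # the coroutines in `pending` are dropped without being awaited

asyncio.run(main())
# -> RuntimeWarning: coroutine 'handle_user' was never awaited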
After looking into the issue I found the problem.
During pagination, next_info["url"] doesn't contain the port:
req.url = next_info["url"]
https://github.com/jupyterhub/jupyterhub-idle-culler/blob/main/jupyterhub_idle_culler/__init__.py#L158
which makes the fetch fail (presumably because the client then connects to the default HTTPS port instead of 5081):
[I 250123 20:02:44 __init__:156] Fetching page 2 https://<my_domain>/hub/api/users?state=ready&offset=50&limit=50
[E 250123 20:02:44 ioloop:770] Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0xf73e1c04f580>>, <Task finished name='Task-1' coro=<cull_idle() done, defined at /usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py:78> exception=ConnectionRefusedError(111, 'Connection refused')>)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py", line 750, in _run_callback
ret = callback()
File "/usr/local/lib/python3.10/dist-packages/tornado/ioloop.py", line 774, in _discard_future_result
future.result()
File "/usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py", line 436, in cull_idle
async for user in fetch_paginated(req):
File "/usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py", line 142, in fetch_paginated
response = await resp_future
File "/usr/local/lib/python3.10/dist-packages/jupyterhub_idle_culler/__init__.py", line 124, in fetch
return await client.fetch(req)
ConnectionRefusedError: [Errno 111] Connection refused
A dummy fix for my use case would be:
req.url = next_info["url"].replace(
"<my_url>",
"<my_url>:5081"
)
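A slightly less brittle variant of the same workaround (just a sketch: it keeps the scheme, host and port from the --url the culler was started with and only takes the path and query from next_info["url"]):

from urllib.parse import urlparse, urlunparse

def next_page_url(api_url: str, next_url: str) -> str:
    base = urlparse(api_url)   # e.g. https://<my_url>:5081/hub/api (keeps the port)
    nxt = urlparse(next_url)   # e.g. https://<my_url>/hub/api/users?state=ready&offset=50&limit=50
    # scheme + host:port from the configured URL, path + query from the "next" link
    return urlunparse((base.scheme, base.netloc, nxt.path, nxt.params, nxt.query, nxt.fragment))

# req.url = next_page_url(api_url, next_info["url"])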
Neither is ideal, but I was wondering: is this something JupyterHub should return properly, or is the problem in the culler?
I believe it could be JupyterHub, since if we have the hub on
localhost:8000
the first call would look like
localhost:8000/hub/api/users?state=ready&offset=0&limit=50
and the pagination would return as next
localhost/hub/api/users?state=ready&offset=50&limit=50, without the port, which is incorrect.
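This is easy to check against the hub directly, without the culler in the loop (a sketch; it assumes a token with permission to list users in JUPYTERHUB_API_TOKEN and requests the same paginated media type the culler uses):

import json, os, urllib.request

api_url = "https://<my_url>:5081/hub/api"  # same value as the culler's --url
req = urllib.request.Request(
    f"{api_url}/users?state=ready&offset=0&limit=50",
    headers={
        "Authorization": f"token {os.environ['JUPYTERHUB_API_TOKEN']}",
        "Accept": "application/jupyterhub-pagination+json",
    },
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# does the "next" link the hub returns include the port?
print(body["_pagination"]["next"])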
Please can you tell us your Z2JH version, and share your full configuration?
We are using Z2JH version 2.0.0. As for our configuration:
For our culler config:
# cull relates to the jupyterhub-idle-culler service, responsible for evicting
# inactive singleuser pods.
#
# The configuration below, except for enabled, corresponds to command-line flags
# for jupyterhub-idle-culler as documented here:
# https://github.com/jupyterhub/jupyterhub-idle-culler#as-a-standalone-script
#
cull:
  enabled: true
  users: false # --cull-users
  removeNamedServers: false # --remove-named-servers
  timeout: 14400 # --timeout (in seconds) -> servers shut down after 4 hours of inactivity
  every: 600 # --cull-every (in seconds) -> checks for expired servers every 10 minutes
  concurrency: 10 # --concurrency
  maxAge: 0 # --max-age
Installed versions that I believe matter:
"jupyterhub==5.2.0",
"jupyterhub-traefik-proxy==2.0.0",
"jupyterhub-idle-culler==1.4.0",
"jupyterhub-announcement",
"kubernetes==28.1.0",
Can confirm we are seeing this too. Activity is not updating, and the idle culler appears not to be culling.
[D 2025-07-04 21:40:08.766 JupyterHub scopes:1010] Checking access to /hub/api/users/johndoe/activity via scope users:activity!user=johndoe
[D 2025-07-04 21:40:08.771 JupyterHub users:964] Not updating activity for <User(johndoe1/1 running)>: 2025-07-04T21:30:06.405342Z < 2025-07-04T21:30:06.405342Z
[D 2025-07-04 21:40:08.772 JupyterHub users:985] Not updating server activity on johndoe/: 2025-07-04T21:30:06.405342Z < 2025-07-04T21:30:06.405342Z
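Those two "Not updating" lines compare identical timestamps (the reported activity is not newer than what the hub already has stored), so on their own they look benign; the comparison boils down to something like this sketch, using the timestamps from the logs above:

from datetime import datetime

stored = datetime.fromisoformat("2025-07-04T21:30:06.405342+00:00")
reported = datetime.fromisoformat("2025-07-04T21:30:06.405342+00:00")
print(reported > stored)  # False -> "Not updating activity" for equal timestamps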