jupyterhub-idle-culler
JupyterHub service to cull idle servers and users
JupyterHub Idle Culler Service
jupyterhub-idle-culler provides a JupyterHub service to identify and stop idle
or long-running Jupyter servers via JupyterHub. It works solely by interacting
with JupyterHub's REST API, and is often configured to run as a JupyterHub
managed service started up by JupyterHub itself.
Setup
Setup involves three parts:
- Install the Python package.
- Configure JupyterHub permissions to work against JupyterHub's REST API.
- Configure how it's started up, either as a JupyterHub managed service or as a standalone script.
Installation
pip install jupyterhub-idle-culler
Permissions
Prior to JupyterHub 2.0, the jupyterhub-idle-culler required full administrative privileges,
in order to have sufficient permissions to stop servers on behalf of users.
JupyterHub 2.0 introduces scopes to allow for more fine-grained permission control. This means that the configured culler service does not need full administrative privileges anymore. It can be assigned only the permissions it needs.
jupyterhub-idle-culler requires the following scopes to function:
- list:users - to access the user list API, our source of information about who to cull
- read:users:activity - to read the users' last_activity field
- read:servers - to read the users' servers field
- delete:servers - to stop users' servers, and delete named servers if --remove-named-servers is passed
- admin:users (optional) - to delete users if --cull-users is passed
To assign the service the appropriate permissions, declare a role in your jupyterhub_config.py:
c.JupyterHub.load_roles = [
{
"name": "jupyterhub-idle-culler-role",
"scopes": [
"list:users",
"read:users:activity",
"read:servers",
"delete:servers",
# "admin:users", # if using --cull-users
],
# assignment of role's permissions to:
"services": ["jupyterhub-idle-culler-service"],
}
]
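If the culler is sometimes run with --cull-users, a small helper can keep the role's scopes in step with that flag. This is a hypothetical sketch, not part of jupyterhub or jupyterhub-idle-culler; the helper name idle_culler_role and the "<service>-role" naming pattern are assumptions.

```python
def idle_culler_role(service_name, cull_users=False):
    """Build a load_roles entry with the scopes listed above.

    Hypothetical helper: not part of jupyterhub or jupyterhub-idle-culler.
    """
    scopes = [
        "list:users",           # read the user list
        "read:users:activity",  # read users' last_activity field
        "read:servers",         # read users' servers field
        "delete:servers",       # stop (and optionally remove named) servers
    ]
    if cull_users:
        scopes.append("admin:users")  # only needed with --cull-users
    return {
        "name": f"{service_name}-role",
        "scopes": scopes,
        "services": [service_name],  # assign the role to the culler service
    }


# In jupyterhub_config.py:
# c.JupyterHub.load_roles = [idle_culler_role("jupyterhub-idle-culler-service")]
```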
As a hub managed service
In jupyterhub_config.py, add the following dictionary for the idle-culler
service to the c.JupyterHub.services list:
import sys

c.JupyterHub.services = [
{
"name": "jupyterhub-idle-culler-service",
"command": [
sys.executable,
"-m", "jupyterhub_idle_culler",
"--timeout=3600",
],
# "admin": True,
}
]
where:
- "command" indicates that the Service will be managed by the Hub, and
- "admin": True grants admin permissions to this Service and is only meant for use with jupyterhub < 2.0; see Permissions above.
As a standalone script
jupyterhub-idle-culler can also be run as a standalone script. It accesses
the Hub's API with a service token.
Register the service token with JupyterHub in jupyterhub_config.py:
c.JupyterHub.services = [
{
"name": "jupyterhub-idle-culler-service",
"api_token": "...",
# "admin": True,
}
]
where:
- "api_token" contains a secret token, e.g. generated by openssl rand -hex 32, and
- "admin": True grants admin permissions to this Service and is only meant for use with jupyterhub < 2.0; see Permissions above.

Store the same token in a JUPYTERHUB_API_TOKEN environment variable.
Then start jupyterhub-idle-culler manually.
export JUPYTERHUB_API_TOKEN=api_token_above...
python3 -m jupyterhub_idle_culler [--timeout=900] [--url=http://localhost:8081/hub/api]
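If openssl is not at hand, Python's standard library can produce an equivalent token. A minimal sketch: secrets.token_hex(32) returns 32 random bytes encoded as 64 hex characters, the same shape as the output of openssl rand -hex 32.

```python
import secrets

# 32 random bytes -> 64 lowercase hex characters, like `openssl rand -hex 32`.
token = secrets.token_hex(32)
print(token)
```

Paste the printed value into "api_token" and export it as JUPYTERHUB_API_TOKEN before starting the culler.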
Command line flags
--api-page-size Number of users to request per page, when
using JupyterHub 2.0's paginated user list
API. Default: use the server-side default
configured page size. (default 0)
--concurrency Limit the number of concurrent requests made
to the Hub. Deleting a lot of users at the
same time can slow down the Hub, so limit
the number of API requests we have
outstanding at any given time. (default 10)
--cull-admin-users Whether admin users should be culled (only
if --cull-users=true). (default True)
--cull-every The interval (in seconds) for checking for
idle servers to cull. (default 0)
--cull-default-servers Whether default servers should be culled.
(default True)
--cull-named-servers Whether named servers should be culled.
(default True)
--cull-users Cull users in addition to servers. This is
for use in temporary-user cases such as
tmpnb. (default False)
--internal-certs-location The location of generated internal-ssl
certificates (only needed with
--ssl-enabled=true). (default internal-ssl)
--max-age The maximum age (in seconds) of servers that
should be culled even if they are active.
(default 0)
--remove-named-servers Remove named servers in addition to stopping
them. This is useful for a BinderHub that
uses authentication and named servers.
(default False)
--ssl-enabled Whether the Jupyter API endpoint has TLS
enabled. (default False)
--timeout The idle timeout (in seconds). (default 600)
--url The JupyterHub API URL.
Caveats
- JupyterHub's last_activity data about user servers is not updated with high frequency, so the cull timeout should be greater than the sum of:
  - the single-user websocket ping interval (default: 30 seconds)
  - JupyterHub.last_activity_interval (default: 5 minutes)
- If you want to use --cull-users with a different culling interval for user servers and users, you must start two idle culler services. This is because both are configured via --timeout and --max-age. To do so, configure this service to start twice with different configurations, where one has the --cull-users option.
- By default, jupyterhub-idle-culler's HTTP requests to JupyterHub's REST API time out after 60 seconds. This can be changed by setting the JUPYTERHUB_REQUEST_TIMEOUT environment variable.
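The first two caveats can be combined in jupyterhub_config.py. This is a sketch under assumptions: the helper idle_culler_services and the service names idle-culler-servers and idle-culler-users are hypothetical, and 330 seconds is simply the sum of the two default intervals listed above. Each service name must also be given a role as described under Permissions, with admin:users only for the instance that passes --cull-users.

```python
import sys

# Lower bound from the first caveat: default websocket ping interval (30 s)
# plus the default JupyterHub.last_activity_interval (5 minutes).
WS_PING_INTERVAL = 30
LAST_ACTIVITY_INTERVAL = 5 * 60
MIN_TIMEOUT = WS_PING_INTERVAL + LAST_ACTIVITY_INTERVAL  # 330 seconds


def idle_culler_services(server_timeout, user_timeout):
    """Return two hypothetical service entries: one culls servers only,
    the other also culls users (--cull-users) on its own timeout."""
    for t in (server_timeout, user_timeout):
        if t <= MIN_TIMEOUT:
            raise ValueError(f"timeout {t}s should exceed {MIN_TIMEOUT}s")
    base = [sys.executable, "-m", "jupyterhub_idle_culler"]
    return [
        {
            "name": "idle-culler-servers",
            "command": base + [f"--timeout={server_timeout}"],
        },
        {
            "name": "idle-culler-users",
            "command": base + [f"--timeout={user_timeout}", "--cull-users"],
        },
    ]


# In jupyterhub_config.py:
# c.JupyterHub.services = idle_culler_services(3600, 24 * 3600)
```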
How it works
JupyterHub's REST API is used to acquire information about activity. If the idle culler service, based on its configuration, decides that a server should be stopped or deleted, it does so via JupyterHub's REST API as well.
In depth
jupyterhub-idle-culler relies on permission to work against JupyterHub's REST
API, which is provided via the JUPYTERHUB_API_TOKEN environment variable. That
variable is set automatically for managed services started by JupyterHub.
jupyterhub-idle-culler lists available users and their servers' reported
last_activity via JupyterHub's /users REST API and makes decisions based on
that. Users' default servers can be stopped via /users/{name}/server, named
servers can be stopped and optionally removed via
/users/{name}/servers/{server_name}, and users can optionally be deleted via
/users/{name}.
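To make the decision step concrete, here is a minimal sketch of turning one user model from GET /users into the endpoints listed above. It assumes the user model's servers field (readable with the read:servers scope) maps server names to server models carrying a last_activity timestamp, with the default server under the empty name "". The helpers parse_ts and servers_to_stop are hypothetical, not the package's implementation, and options such as --max-age and --cull-users are left out.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def parse_ts(ts):
    """Parse an ISO 8601 timestamp such as '2024-01-01T10:00:00.000000Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def servers_to_stop(user, timeout, now=None):
    """Return API paths to DELETE for servers idle longer than `timeout` seconds.

    Hypothetical sketch: `user` is one user model from GET /users.
    """
    now = now or datetime.now(timezone.utc)
    name = quote(user["name"], safe="")
    paths = []
    for server_name, server in user.get("servers", {}).items():
        last = server.get("last_activity")
        if last is None:
            continue  # no activity recorded yet; leave the server alone
        if now - parse_ts(last) <= timedelta(seconds=timeout):
            continue  # still within the idle timeout
        if server_name == "":
            paths.append(f"/users/{name}/server")  # default server
        else:
            paths.append(f"/users/{name}/servers/{quote(server_name, safe='')}")
    return paths
```

Each returned path is relative to the Hub API URL (e.g. http://localhost:8081/hub/api) and would be sent as a DELETE request with an "Authorization: token <token>" header.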
JupyterHub's reported last_activity for user servers is updated by JupyterHub
at a regular interval in the update_last_activity function that relies on
two sources of information.
- The proxy's routes data

  The configurable proxy class for JupyterHub is an interface for JupyterHub to request routing of network traffic to user servers. Through this interface, JupyterHub can be informed about network activity if the proxy class provides it, specifically via the get_all_routes function.

  The configurable-http-proxy used in https://z2jh.jupyter.org provides information about network route activity, but traefik-proxy used in https://tljh.jupyter.org currently does not.
- The user server's activity reports

  The update_last_activity function also reads JupyterHub's database, which keeps state about servers' last_activity. These database records are updated whenever a server notifies JupyterHub about activity, as servers are required to do.

  Before JupyterHub 4, servers notified JupyterHub about activity by being started via the jupyterhub-singleuser script, made available by installing jupyterhub (or jupyterhub-singleuser on conda-forge). With JupyterHub 4+ and jupyter_server 2+, a jupyter_server server extension can be used instead.

  The jupyterhub-singleuser script launches a modified server application that keeps JupyterHub updated with the server activity via the notify_activity function.

  The notify_activity function in turn makes use of the server application's last_activity function (see the implementation in NotebookApp and ServerApp respectively), which combines information from API activity, kernel activity, kernel shutdown, and terminal activity. This activity also covers applications like RStudio running via jupyter-server-proxy.
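As an illustration of the reporting side, the following sketch builds the kind of JSON body a server can send to JupyterHub's POST /users/{name}/activity endpoint, taking the most recent of several activity timestamps. The payload shape (a top-level last_activity plus a per-server servers mapping) is assumed from JupyterHub's REST API documentation and should be checked against it; activity_payload and isoformat_z are hypothetical helpers, not the notify_activity implementation.

```python
from datetime import datetime, timezone


def isoformat_z(dt):
    """Render an aware datetime as a UTC ISO 8601 string ending in 'Z'."""
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


def activity_payload(server_name, *timestamps):
    """Build a hypothetical body for POST /users/{name}/activity.

    `timestamps` are aware datetimes from individual activity sources
    (API requests, kernels, terminals, ...); None entries are ignored.
    The most recent one becomes the reported last_activity.
    """
    known = [t for t in timestamps if t is not None]
    if not known:
        raise ValueError("no activity to report")
    last = isoformat_z(max(known))
    return {
        "last_activity": last,
        "servers": {server_name: {"last_activity": last}},
    }
```

Taking the maximum is what lets, for example, a long-running terminal session keep a server "active" even when no notebook API requests arrive.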
Here is a summary of what's described so far:
- jupyterhub-idle-culler collects information and acts entirely through JupyterHub's REST API.
- jupyterhub-idle-culler makes decisions based on information provided by JupyterHub, which collects activity reports from the user servers and polls the proxy class for information about user servers' network activity.
Now, as the server's kernel activity influences the activity that servers
notify JupyterHub about, kernel activity in turn influences
jupyterhub-idle-culler. Because of this, it can be relevant to learn a little
about a mechanism to cull idle kernels as well, even though
jupyterhub-idle-culler isn't involved in that.
The default kernel manager, the MappingKernelManager, can be configured to
cull idle kernels. Its configuration is documented in
ServerApp's
and
NotebookApp's
respective documentation, and here are some relevant kernel culling
configuration options:
- MappingKernelManager.cull_busy
- MappingKernelManager.cull_idle_timeout
- MappingKernelManager.cull_interval
- MappingKernelManager.cull_connected

Note that cull_connected can be tricky to understand for JupyterLab, as whether a browser has a websocket connection to a kernel isn't as obvious as it was in the classic Jupyter Notebook UI. See this issue for more details.

Also note that configuration of MappingKernelManager should be made on the user server itself, for example via a jupyter_server_config.py file in /etc/jupyter or /usr/local/etc/jupyter, rather than where JupyterHub is running.
Finally, note that a Jupyter server can shut itself down without intervention by
jupyterhub-idle-culler if ServerApp.shutdown_no_activity_timeout is
configured.
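A sketch of what such a configuration file could look like on the user server image, assuming jupyter_server (ServerApp) rather than the classic NotebookApp. The values are illustrative only; shutdown_no_activity_timeout is measured in seconds and, per its documentation, applies once no kernels are running and there has been no activity, so kernel culling and server self-shutdown complement each other.

```python
# jupyter_server_config.py, e.g. in /etc/jupyter on the user server (not the Hub).
# Values below are illustrative, not recommendations.
c.MappingKernelManager.cull_idle_timeout = 3600  # cull kernels idle for 1 hour
c.MappingKernelManager.cull_interval = 300       # check for idle kernels every 5 minutes
c.MappingKernelManager.cull_connected = False    # don't cull kernels with open connections
c.MappingKernelManager.cull_busy = False         # don't cull kernels that are busy

# Let the server stop itself after 2 hours with no kernels and no activity.
c.ServerApp.shutdown_no_activity_timeout = 7200
```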
Caveats
Pagination
JupyterHub 2.0 introduces pagination to the /users API endpoint. This
pagination does not guarantee a consistent snapshot for consecutive requests
spread over time, so it is possible for a highly active hub to occasionally miss
culling users crossing page boundaries between requests. This is expected to be
infrequent and, in realistic scenarios, only to delay culling a server by one
cull interval, so it is of minor consequence.
The issue can be mitigated by requesting a larger page size, via e.g.
--api-page-size=200, but feel free to open an issue if this is causing a
problem for you.
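A minimal sketch of walking the paginated /users endpoint, assuming the JupyterHub 2.0+ response format returned when a request carries the header Accept: application/jupyterhub-pagination+json: a JSON object with an items list and a _pagination object whose next entry holds the following offset, or is null on the last page. iter_users and the injected fetch callable are hypothetical; fetch stands in for the actual HTTP request to /users?offset=...&limit=....

```python
def iter_users(fetch, page_size=200):
    """Yield user models from a paginated GET /users.

    `fetch(offset, limit)` is a hypothetical callable that performs the
    request and returns the decoded JSON response as a dict.
    """
    offset = 0
    while True:
        page = fetch(offset, page_size)
        yield from page["items"]
        nxt = page["_pagination"].get("next")
        if not nxt:
            break  # last page reached
        offset = nxt["offset"]
```

Because each page is a separate request, a user whose position shifts between two fetches can be skipped for one round, which is the behavior described above; a larger page_size means fewer page boundaries to cross.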