
question: granular cluster resource limits based on user / Jupyter RBAC

Open lukasheinrich opened this issue 4 years ago • 3 comments

Hi,

dask-gateway allows setting resource limits on the clusters that users can create (https://gateway.dask.org/resource-limits.html),

but these limits are global for all users. In a multi-tenant deployment it is often the case that there are various user groups to which different limits should apply.

JupyterHub 2.0 introduces some notion of RBAC-based authentication,

and I was wondering whether that could be used to set more granular settings in dask-gateway?

perhaps @consideRatio has thoughts here as well

lukasheinrich avatar Oct 15 '21 09:10 lukasheinrich

Thanks for opening this issue.

The resource-limit settings in dask-gateway are actually per-cluster (e.g. max cores per cluster). Setting them as described in that doc sets global defaults, but those can be overridden by per-user configuration in an options handler. The handler takes any options specified by the user when creating the cluster, along with the user object itself (see here), and returns a new set of options to apply to that cluster. This flexibility lets you define whatever per-user/group rules you want, without having to bake them into dask-gateway itself. For example:

from dask_gateway_server.options import Options

def options_handler(options, user):
    # Users in the `power-users` group get bigger clusters with higher limits
    if "power-users" in user.groups:
        return {
            "worker_cores": 8,
            "worker_memory": "16 G",
            "cluster_max_workers": 100,
        }
    else:
        return {
            "worker_cores": 4,
            "worker_memory": "8 G",
            "cluster_max_workers": 10,
        }

c.Backend.cluster_options = Options(handler=options_handler)

Note that when authenticating with JupyterHub, the .groups field mirrors that of JupyterHub.
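Rather than returning fixed values, a handler can also clamp whatever the user requested against a per-group ceiling. Here is a minimal, self-contained sketch of that logic using stand-in objects (the group name `power-users` and the requested attributes are assumptions; in a real deployment the handler receives dask-gateway's own options and user objects, which expose the same attributes):

```python
from types import SimpleNamespace


def options_handler(options, user):
    # Clamp the user-requested worker size by group membership.
    # "power-users" is a hypothetical group name for illustration.
    max_cores = 8 if "power-users" in user.groups else 2
    return {"worker_cores": min(options.worker_cores, max_cores)}


# Simulate two users each requesting 8-core workers:
admin = SimpleNamespace(groups=["power-users"])
student = SimpleNamespace(groups=["students"])
request = SimpleNamespace(worker_cores=8)

print(options_handler(request, admin))    # {'worker_cores': 8}
print(options_handler(request, student))  # {'worker_cores': 2}
```

The clamping approach lets users pick smaller clusters than their ceiling allows, instead of forcing everyone in a group onto one fixed size.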

jcrist avatar Oct 15 '21 13:10 jcrist

Excellent, this is exactly what we need. Is there an option to set a max cluster lifetime, after which the cluster will be culled?

lukasheinrich avatar Oct 15 '21 14:10 lukasheinrich

There's idle_timeout (https://gateway.dask.org/api-server.html#c.ClusterConfig.idle_timeout), a maximum time for a cluster to sit idle (unused) before it's culled, but there isn't a total max runtime for the cluster itself. That wouldn't be too tricky to add, though, if it'd be useful for you. File an issue if so.
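As a sketch, the idle timeout goes in the gateway's config file alongside the other traitlets settings (the one-hour value here is only an example; the setting is documented in seconds):

```python
# dask_gateway_config.py
# Cull a cluster after it has been idle (no active clients) for one hour.
c.ClusterConfig.idle_timeout = 3600
```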

jcrist avatar Oct 15 '21 14:10 jcrist