dask-gateway
Document generic install, configuration points
There is mention of the dask_gateway_config.py file in various parts of the documentation, but I don't think there is any actual mention of how to create this file, or where it should live. Grepping the codebase, I found that this file could be generated from the CLI and was able to make progress, but it might be useful to others to add a note to the Configuration doc page.
Because the admin practices for each backend are different (e.g. on kubernetes admins would never write that file, they'd use helm), there hasn't been a need for documenting the dask_gateway_config.py file directly - we've been relying on the admin walkthrough guides. I could see the use for documenting more internal specifics for a general install though. Is something like https://gateway.dask.org/install-hadoop.html#configure-dask-gateway-server (and the following steps) sufficient?
Also note - most of the admin-side documentation is out of date since the rewrite. The process for everything non-kubernetes should remain the same, but the actual parameter field names to configure have changed.
We also document every configurable field here (although these are also out of date since the rewrite): https://gateway.dask.org/api-server.html
Is something like https://gateway.dask.org/install-hadoop.html#configure-dask-gateway-server (and the following steps) sufficient?
Yes, I think so. In the first few lines there it says where that file should live (/etc/...), so it's clear to me as a novice user that I should make an empty file there and copy-paste things in if I want to change them.
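A dask_gateway_config.py is ordinary Python that the server executes with a configuration object bound to the name `c`; each `c.ClassName.trait = value` line sets one option, so an empty file plus a few pasted lines is a valid config. The sketch below imitates that loading step with a small stand-in object so it runs without dask-gateway-server installed. `ConfigStub` and `load_config` are hypothetical, and the class and trait names in `EXAMPLE_CONFIG` (`DaskGateway.authenticator_class`, `SimpleAuthenticator.password`) are assumptions to check against https://gateway.dask.org/api-server.html, which may lag behind the rewrite.

```python
# Hypothetical stand-in for the (traitlets-style) config object that
# dask-gateway-server binds to `c` when it executes dask_gateway_config.py.
class ConfigStub(dict):
    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        # Accessing c.SomeClass creates an empty section on first use.
        return self.setdefault(name, ConfigStub())

    def __setattr__(self, name, value):
        self[name] = value


# Example file contents; the trait names are assumptions, check the API docs.
EXAMPLE_CONFIG = '''
c.DaskGateway.authenticator_class = "dask_gateway_server.auth.SimpleAuthenticator"
c.SimpleAuthenticator.password = "change-me"
'''


def load_config(source):
    """Execute trusted config source with `c` in scope and return `c`."""
    c = ConfigStub()
    exec(source, {"c": c})
    return c


config = load_config(EXAMPLE_CONFIG)
print(config["SimpleAuthenticator"]["password"])  # change-me
```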
I ran into this when I was looking at https://gateway.dask.org/authentication.html#simple-authentication-for-testing and didn't know where the file was supposed to be that it was referring to. I then searched for and found the Configuration page.
(Also, unrelatedly, it's not clear how to use dummy auth from the user's side, although this may also be low priority given that it's unlikely to be useful in production. Happy to raise a separate issue if you think this is worth reporting.)
Your case is a bit odd because you're looking at extending dask-gateway, and we have no docs about that. Our docs are mostly focused on users and admins, with walkthroughs for common user profiles. We should add a new page meeting your needs.
Also, unrelatedly, it's not clear how to use dummy auth from the user's side
In master it's now called SimpleAuthenticator (dummy had bad connotations). From the user's side, they need to pass a BasicAuth object to dask_gateway.Gateway or dask_gateway.GatewayCluster. This can be set once in the gateway.yaml (https://gateway.dask.org/configuration-user.html#default-configuration) or programmatically by passing a BasicAuth object through the auth kwarg.
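import dask_gateway
# The gateway address and other defaults are read from the user's gateway.yaml.
# BasicAuth() with no arguments is assumed to use the local username and no password.
gateway = dask_gateway.Gateway(auth=dask_gateway.BasicAuth())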
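HTTP Basic authentication (RFC 7617) sends `username:password`, base64-encoded, in an `Authorization` header, which is presumably what `BasicAuth` attaches to each request made to the gateway. The standalone sketch below builds that header with the standard library only. `basic_auth_header` is a hypothetical helper, not part of dask_gateway, and falling back to the local login name mirrors what we assume `BasicAuth()` does when called without arguments.

```python
import base64
import getpass


def basic_auth_header(username=None, password=""):
    """Build an HTTP Basic Authorization header (RFC 7617).

    Hypothetical helper for illustration; falls back to the local login
    name when no username is given.
    """
    if username is None:
        username = getpass.getuser()
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("alice", "secret"))
# {'Authorization': 'Basic YWxpY2U6c2VjcmV0'}
```

With SimpleAuthenticator configured without a password, any username should be accepted, which is why the empty-password default is convenient for local testing.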
Your case is a bit odd because you're looking at extending dask-gateway
I haven't gotten there yet. I'm still just poking around.
gateway = dask_gateway.Gateway(auth=dask_gateway.BasicAuth())
Yeah, I got there eventually. Just reporting up doc issues. Feel free to disregard (or perhaps now that we've recorded the solution that will suffice for others).
I'm going to keep this open to remind me to better document non-standard installs and configuration points.
I also have some questions related to the initial comment on this thread, around dask_gateway_config.py. I'm a novice user, so my questions are probably pretty naive.
I'm coming from the dask helm templates, but having many different types of workers made my configs explode. In addition, I had a lot of trouble auto-scaling workers up/down using Kubernetes. Trying out dask-gateway was super simple: it simplified my configs, and my initial local tests with auto-scaling were great.
My first question is around images. There is the gateway.backend.image in the helm repos, which seems to indicate the same image is used for worker and scheduler. Why does the scheduler use the same image as the worker? Is this the same image overridden by c.KubeClusterConfig.image?
What is the relationship between dask-worker and dask-gateway-worker? I've rolled my own images with dask-worker, but haven't come across dask-gateway-worker.
For thread-hostile work, I often have more cores than threads I expose to dask. How can I pass in --nthreads separately from worker_cores? Similarly, what if I'd like to pass --no-nanny and --resources? Should this be done by updating worker_cmd in cluster options?
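To make the question concrete, the sketch below assembles a `dask-worker` argument list that sets `--nthreads` independently of the cores allocated to the worker, optionally adds `--no-nanny`, and encodes `--resources` in the space-separated `NAME=value` form that dask-worker accepts. `build_worker_args` is hypothetical; whether such a list can be passed through `worker_cmd` at all is an assumption to verify against the server docs, not something this thread confirms.

```python
def build_worker_args(nthreads, no_nanny=False, resources=None, cmd="dask-worker"):
    """Assemble a dask-worker command line (hypothetical helper).

    nthreads is passed explicitly so it can be lower than the number of
    cores reserved for the worker, e.g. for thread-hostile workloads.
    """
    if nthreads < 1:
        raise ValueError("nthreads must be at least 1")
    args = [cmd, "--nthreads", str(nthreads)]
    if no_nanny:
        args.append("--no-nanny")
    if resources:
        # dask-worker expects resources as space-separated NAME=value pairs.
        spec = " ".join(f"{name}={value}" for name, value in resources.items())
        args += ["--resources", spec]
    return args


# Expose a single thread to dask, however many cores the worker was given.
print(build_worker_args(1, no_nanny=True, resources={"GPU": 1}))
# ['dask-worker', '--nthreads', '1', '--no-nanny', '--resources', 'GPU=1']
```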
Is it possible to create a heterogeneous cluster (workers of different types: bigmem, gpu, singlethreaded)?
I'd also like to ask about this. I'm trying to expose cluster options for user configuration, and as I understand it, that should be done in the dask_gateway_config.py file. We are prototyping locally to avoid burdening our Kubernetes devops team until we have a working solution to hand over to them. Following the documentation, I've tried putting dask_gateway_config.py both in ~/.config/dask and in /etc/dask-gateway, but I have not succeeded in getting the local gateway server to pick it up. A little documentation here would be helpful.
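In dask-gateway, user-facing cluster options are declared in the server config and translated into per-cluster settings by a handler function. The dependency-free sketch below imitates only that translation step: a hypothetical `profile` choice (bigmem or singlethreaded, echoing the earlier question) is mapped to a dict of overrides. The keys `worker_cores` and `worker_memory` are assumed to match ClusterConfig fields; `PROFILES`, `options_handler`, and the values are made up for illustration. Each cluster built this way is still homogeneous, so this does not answer whether one cluster can mix worker types, and a gpu profile would need more than cores and memory (e.g. a different image).

```python
from types import SimpleNamespace

# Hypothetical worker profiles a user could pick from.
PROFILES = {
    "singlethreaded": {"worker_cores": 1, "worker_memory": "4 G"},
    "bigmem": {"worker_cores": 4, "worker_memory": "64 G"},
}


def options_handler(options):
    """Translate user-chosen options into per-cluster config overrides."""
    try:
        # Copy so callers cannot mutate the shared profile table.
        return dict(PROFILES[options.profile])
    except KeyError:
        raise ValueError(f"unknown profile: {options.profile!r}") from None


# The server would pass its own options object; SimpleNamespace stands in here.
print(options_handler(SimpleNamespace(profile="bigmem")))
# {'worker_cores': 4, 'worker_memory': '64 G'}
```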
@jontis -- dask-gateway-server looks in the current directory (.) for a dask_gateway_config.py file by default. You can pass it a different config file using -f, e.g.
dask-gateway-server -f my_config.py
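The lookup rule described above (an explicit `-f` path wins, otherwise `dask_gateway_config.py` in the current directory) can be sketched as follows. `resolve_config_path` is a hypothetical helper that only mirrors that rule and does not parse a real command line; treating a missing explicit file as an error is our assumption. Under this rule, files placed in ~/.config/dask or /etc/dask-gateway would not be picked up unless passed with `-f` or the server is started from that directory.

```python
import os

DEFAULT_CONFIG = "dask_gateway_config.py"


def resolve_config_path(config_file=None, cwd=None):
    """Return the config file a server started in `cwd` would read (sketch).

    An explicit path (as given with -f) takes precedence; otherwise the
    default filename is looked up in the working directory only.
    """
    cwd = os.getcwd() if cwd is None else cwd
    if config_file is not None:
        path = os.path.join(cwd, config_file)  # absolute paths are kept as-is
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
        return path
    path = os.path.join(cwd, DEFAULT_CONFIG)
    return path if os.path.isfile(path) else None
```

For example, `resolve_config_path("my_config.py")` corresponds to `dask-gateway-server -f my_config.py`.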
@gforsyth How does one decide between -f my_config.py and using extraConfig in the yaml file?
@chrisroat -- I would only use -f my_config.py directly if I'm making changes to dask-gateway-server and need to test and run locally. For deployments, extraConfig is much easier to edit (and in the end, the contents of the extraConfig block are appended to the dask_gateway_config.py file).
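Since the extraConfig block ends up appended to the generated dask_gateway_config.py, the sketch below models that step. It assumes, as we understand the chart's values.yaml to allow, that extraConfig is either a single code string or a mapping of names to code snippets appended in sorted key order. `render_config` is hypothetical and is not the chart's actual template logic.

```python
def render_config(base_config, extra_config=None):
    """Append helm-style extraConfig snippets to a base config (sketch)."""
    if not extra_config:
        return base_config
    if isinstance(extra_config, str):
        snippets = [extra_config]
    else:
        # Assumed behaviour: named snippets are applied in sorted key order.
        snippets = [extra_config[key] for key in sorted(extra_config)]
    parts = [base_config.rstrip("\n")] + [s.rstrip("\n") for s in snippets]
    return "\n".join(parts) + "\n"


base = "# generated base config\n"
extra = {
    "10-options": "c.ClusterConfig.worker_cores = 2",
    "00-auth": 'c.SimpleAuthenticator.password = "change-me"',
}
print(render_config(base, extra))
```

Using a mapping rather than a single string lets several values files each contribute a named snippet without overwriting one another.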