dask-gateway

Templatize the kubernetes resources that dask-gateway generates

Open • costrouc opened this issue 4 years ago • 4 comments

For QHub, we have moved away from using the dask-gateway Helm chart so that we can integrate dask-gateway more tightly with our Traefik HTTP/HTTPS/TCP proxy: https://github.com/Quansight/qhub-terraform-modules/tree/main/modules/kubernetes/services/dask-gateway. This was mainly motivated by finding out that two Traefik services in the same Kubernetes namespace do not play well with each other.

That said, the current issue we are facing is decorating the IngressRoute: we need to add tls: {"certManager": "default"}. I would like to propose making the resource objects that dask-gateway creates templatable via traitlets.

For example

# Values marked "PLACEHOLDER" would be filled in by dask-gateway for each cluster.
INGRESS_ROUTE_TEMPLATE = {
    "apiVersion": "traefik.containo.us/v1alpha1",
    "kind": "IngressRoute",
    "metadata": {
        "labels": "PLACEHOLDER",
        "annotations": "PLACEHOLDER",
        "name": "PLACEHOLDER",
    },
    "spec": {
        "entryPoints": "PLACEHOLDER",
        "routes": [
            {
                "kind": "Rule",
                "match": "PLACEHOLDER",
                "services": [
                    {
                        "name": "PLACEHOLDER",
                        "namespace": "PLACEHOLDER",
                        "port": 8787,
                    }
                ],
                "middlewares": "PLACEHOLDER",
            }
        ],
    },
}

Alternatively, we could make make_ingressroute and similar functions overridable via traitlets callables. We need this functionality to expose the Dask scheduler dashboard over HTTPS.
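A minimal sketch of how a template like the one above might be filled in, assuming each placeholder is addressed by a dotted key path (list items by index). The helper render_template, the dotted-path convention, the "websecure" entry point, and the cluster name "abc123" are all hypothetical and not part of dask-gateway; in dask-gateway itself the template would presumably be exposed as a traitlets option rather than a module-level dict.

```python
import copy

PLACEHOLDER = "PLACEHOLDER"


def render_template(template, values):
    """Return a new resource with every PLACEHOLDER replaced (hypothetical helper).

    `values` maps dotted key paths such as "metadata.name" or
    "spec.routes.0.match" to the value that should be substituted.
    A placeholder with no matching path raises KeyError.
    """
    def fill(node, path):
        if isinstance(node, dict):
            return {k: fill(v, f"{path}.{k}" if path else k) for k, v in node.items()}
        if isinstance(node, list):
            # List items are addressed by their index in the path.
            return [fill(v, f"{path}.{i}") for i, v in enumerate(node)]
        if node == PLACEHOLDER:
            return copy.deepcopy(values[path])
        return node

    return fill(template, "")


# A trimmed, deployment-specific template that adds the tls block this issue asks for.
template = {
    "apiVersion": "traefik.containo.us/v1alpha1",
    "kind": "IngressRoute",
    "metadata": {"name": PLACEHOLDER},
    "spec": {
        "entryPoints": PLACEHOLDER,
        "routes": [
            {
                "kind": "Rule",
                "match": PLACEHOLDER,
                "services": [{"name": PLACEHOLDER, "port": 8787}],
            }
        ],
        "tls": {"certManager": "default"},
    },
}

route = render_template(template, {
    "metadata.name": "dask-abc123",
    "spec.entryPoints": ["websecure"],
    "spec.routes.0.match": "PathPrefix(`/clusters/abc123/`)",
    "spec.routes.0.services.0.name": "dask-abc123",
})
print(route["spec"]["tls"])  # {'certManager': 'default'}
```

Because the template itself is data, a deployment like QHub could add fields such as tls without dask-gateway having to anticipate them.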

cc: @aktech

costrouc • Apr 15 '21 19:04

Thanks! I think it'd be great to remove the need for this workaround in QHub.

@droctothorpe or @consideRatio does this proposal sound sensible to you?

TomAugspurger • Apr 17 '21 20:04

My understanding, summarized:

  • The dask-gateway Helm chart creates traefik.containo.us/v1alpha1 IngressRoute resources
  • These custom resources can be managed by Traefik, and dask-gateway deploys Traefik
  • Some IngressRoute resources are created directly by the dask-gateway Helm chart templates, and some are dynamically created by the DaskCluster controller for each DaskCluster resource
  • The dynamically created IngressRoute resources are generated by the function make_ingressroute.
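To make the last point concrete, and to illustrate the "overridable callable" alternative raised in the issue, here is a hypothetical sketch in which each dynamically generated IngressRoute passes through an optional user hook. default_make_ingressroute, IngressRouteFactory, and add_tls are invented for illustration; they do not reproduce the real make_ingressroute signature or output, and a real implementation would likely expose the hook as a traitlets trait.

```python
from typing import Callable, Optional

Resource = dict


def default_make_ingressroute(cluster_name: str, namespace: str) -> Resource:
    """Hypothetical stand-in for the resource generated per DaskCluster."""
    return {
        "apiVersion": "traefik.containo.us/v1alpha1",
        "kind": "IngressRoute",
        "metadata": {"name": f"dask-{cluster_name}", "namespace": namespace},
        "spec": {
            "routes": [
                {
                    "kind": "Rule",
                    "match": f"PathPrefix(`/clusters/{cluster_name}/`)",
                    "services": [
                        {"name": f"dask-{cluster_name}", "namespace": namespace, "port": 8787}
                    ],
                }
            ],
        },
    }


class IngressRouteFactory:
    """Builds IngressRoutes, optionally passing each one through a user hook."""

    def __init__(self, modify: Optional[Callable[[Resource], Resource]] = None):
        self.modify = modify

    def build(self, cluster_name: str, namespace: str) -> Resource:
        route = default_make_ingressroute(cluster_name, namespace)
        if self.modify is not None:
            # The hook sees the fully generated resource and returns the final one.
            route = self.modify(route)
        return route


def add_tls(route: Resource) -> Resource:
    """Example hook: add the tls block requested in this issue."""
    route["spec"]["tls"] = {"certManager": "default"}
    return route


factory = IngressRouteFactory(modify=add_tls)
route = factory.build("abc123", "dask")
```

A post-processing hook like this is a middle ground between a full template and replacing the generator wholesale: the deployment only expresses its delta (here, the tls block).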

I'll number some of my thoughts as I consider this further.

Questions in my mind

Better understanding of the problem

  1. Is the issue you experience @costrouc caused by two Traefik controllers working against the same IngressRoute resource?
  2. Are you having issues with both the IngressRoutes created by the dask-gateway Helm chart and the dynamically created IngressRoute resources, or only with one of them?
  3. What kind of changes would you make to the template, if you had one, in order to avoid the issue you experience? Answer: having tls.certManager=default, for example.

Solution exploration

  4. I've seen the pattern of adding an annotation to native k8s Ingress resources to declare which controller should handle them. I don't think this is sufficient here, though, as you also want to change fields such as tls.certManager.
  5. I've seen k8s mutating webhooks used to modify resources before they are accepted by the k8s api-server, but I think it's overkill to suggest someone do that, and I think it's in scope for us to support some customization.
  6. A merge strategy, like extraPodConfig, can be reasonable. A downside is the complexity of changing a single item in a list, which is why KubeSpawner, for example, has extra_pod_config and extra_container_config separate from each other.
  7. A configurable template can be reasonable as well.
  8. Overriding the functions that generate the resources doesn't feel so robust to me at this point.
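The downside noted for a merge strategy can be seen with a small, hypothetical deep-merge helper. deep_merge is not dask-gateway's or KubeSpawner's actual merge code; it simply follows a common convention where nested dicts merge key by key while lists are replaced wholesale.

```python
import copy


def deep_merge(base: dict, extra: dict) -> dict:
    """Recursively merge `extra` into a copy of `base` (hypothetical helper).

    Nested dicts are merged key by key; any other value in `extra`,
    including a list, replaces the corresponding value in `base`.
    """
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


generated = {
    "metadata": {"name": "dask-abc123", "labels": {"app": "dask"}},
    "spec": {
        "routes": [{"match": "PathPrefix(`/clusters/abc123/`)", "middlewares": []}],
    },
}

# Adding a new key such as spec.tls merges cleanly.
with_tls = deep_merge(generated, {"spec": {"tls": {"certManager": "default"}}})

# Changing one field inside spec.routes[0] replaces the whole list,
# so the generated "match" rule is lost.
with_middleware = deep_merge(
    generated, {"spec": {"routes": [{"middlewares": [{"name": "auth"}]}]}}
)
```

Adding tls this way is easy; editing one route inside the list is not, which is the same pressure that leads KubeSpawner to offer separate pod-level and container-level extra config.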

What do I think at the moment?

Hmmm... I think using a configurable template would be reasonable (7). I'm not very confident this is the right way to go, but it feels like the most reasonable option to explore.

When it comes to customizing the k8s resource templates declared by the Helm chart, I'd like to see an overview of:

  • what we know we may want to configure
  • what is already configurable, and in what way

With such insight, it would be reasonable to make a decision on how and if to support further configuration.

consideRatio • Apr 17 '21 20:04

@consideRatio's input covers most of the bases.

> This was mainly motivated when we found of that two traefik services in the same namespace in kubernetes do not play well with each other.

Can you elaborate on the errors that you saw?

> We need this functionality to expose the dask scheduler dashboard with https.

FWIW, we addressed this problem by terminating HTTPS at the ELB. That was as simple as adding the appropriate annotations to the Traefik service and ingress in the values YAML and letting the cloud provider and external DNS work their magic.

droctothorpe • Apr 20 '21 05:04

> Can you elaborate on the errors that you saw?

For reference, this was the tracking issue for the errors we saw.

https://github.com/Quansight/qhub/issues/358

dharhas • Apr 29 '21 21:04