
[k8s] Move to headless services and make pods listen on 443

Open daniel-goldstein opened this issue 1 year ago • 2 comments

Moves multi-pod deployments over to using Headless Services, which enables client-side load-balancing to the underlying pods. See #12095 for more context.

I put this in its own PR because Kubernetes won't let me apply the clusterIP: None change to existing Services; I must delete the Service resources first. I can delete the existing Services and apply the new headless ones in a way that is compatible with what is currently on main, with just a few seconds of downtime, but I should do this by hand just before this PR merges.
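For reference, here is a rough sketch of what a headless Service manifest looks like, built as a plain Python dict. The `headless_service` helper and the `app: <name>` selector label are assumptions made for illustration and are not taken from this repository; the only essential part is `clusterIP` being set to the string `"None"`.

```python
import json


def headless_service(name: str, port: int) -> dict:
    """Build a headless Service manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            # The string "None" (not a null value) is what makes the Service headless.
            "clusterIP": "None",
            # Assumed label; the real deployments may select pods differently.
            "selector": {"app": name},
            "ports": [{"port": port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(headless_service("batch", 443), indent=2))
```

Because `clusterIP` cannot be changed on an existing Service, a manifest like this has to be applied to a freshly created Service, which is why the old one needs deleting first.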

daniel-goldstein avatar Aug 17 '22 20:08 daniel-goldstein

Can you add the links you found where they suggested using the headless services approach for upstream connections?

jigold avatar Sep 22 '22 15:09 jigold

Here's the link to the headless service documentation, and here's an example blog post where someone encountered the same issues we were facing with normal services. I think the documentation is motivation enough, though: Envoy's STRICT_DNS setting is a form of "service discovery" done through DNS. For Envoy to make correct load-balancing decisions, that DNS request should return all the viable IPs for an upstream instead of a single IP that points to kube-proxy. Headless services do just that.
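To make the DNS point concrete, here is a minimal sketch of "resolve a name and collect every address it maps to", using only the Python standard library. `resolve_all` is a hypothetical helper, not something Envoy or this repository provides, and the injectable `getaddrinfo` parameter exists only so the function can be exercised without a real cluster.

```python
import socket
from typing import Callable, List


def resolve_all(
    host: str, port: int, getaddrinfo: Callable = socket.getaddrinfo
) -> List[str]:
    """Return every distinct IP address that DNS reports for host, sorted."""
    infos = getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Resolves whatever name is reachable locally; inside a cluster this
    # could be a Service name such as batch.default.
    print(resolve_all("localhost", 443))
```

Against a normal Service this would be expected to return a single address (the cluster IP), whereas against a headless Service it should return one address per pod backing the Service.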

daniel-goldstein avatar Sep 22 '22 15:09 daniel-goldstein

The * means that the route will be triggered for any request matching the specified URL, whatever the method (GET, POST, etc.). I needed to make that change because when Envoy makes an authentication request to that endpoint, it uses the HTTP method of the original request. For example, if I make a POST to https://internal.hail.is/dgoldste/batch/batches/create, Envoy will authenticate me with a POST request to auth:443/api/v1alpha/verify_dev_credentials. So I can't restrict that endpoint to any one method.
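As a toy illustration of what a `*` method means, here is a tiny route table written from scratch. It is not the web framework's real implementation; `ToyRouter` and its methods are hypothetical names, and only the endpoint path comes from the comment above.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[], str]


class ToyRouter:
    """Maps (method, path) pairs to handlers; "*" matches any method."""

    def __init__(self) -> None:
        self._routes: Dict[Tuple[str, str], Handler] = {}

    def add_route(self, method: str, path: str, handler: Handler) -> None:
        self._routes[(method.upper(), path)] = handler

    def match(self, method: str, path: str) -> Optional[Handler]:
        # Prefer a handler registered for the exact method, then fall back
        # to a wildcard registration for the same path.
        exact = self._routes.get((method.upper(), path))
        if exact is not None:
            return exact
        return self._routes.get(("*", path))


if __name__ == "__main__":
    router = ToyRouter()
    router.add_route("*", "/api/v1alpha/verify_dev_credentials", lambda: "verified")
    for method in ("GET", "POST", "DELETE"):
        handler = router.match(method, "/api/v1alpha/verify_dev_credentials")
        print(method, handler() if handler else "no route")
```

With a single-method registration, a verification request that reuses the original request's POST or DELETE would find no route, which is the failure the wildcard avoids.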

daniel-goldstein avatar Oct 18 '22 21:10 daniel-goldstein

I think port 443 is so we don't need root privileges in Envoy?

This is related to the way headless services expose the pod itself, but as I'm writing this I feel like I want more clarity on exactly why, so I will do a bit of digging and come back with a better response.

daniel-goldstein avatar Oct 18 '22 21:10 daniel-goldstein

Ah, I remember why this is. Here's a diagram of the current and proposed scenarios that I hope helps:

Normal services (current main)

  1. gateway receives a request destined for batch.hail.is
  2. gateway intends to forward this request to batch.default:443
  3. gateway makes a DNS request to resolve batch.default. gateway receives IP A.A.A.A which is the cluster IP of the batch Kubernetes Service
  4. gateway forwards the request to A.A.A.A:443
  5. The Kubernetes Service (really kube-proxy) receives the request, selects a pod with IP X.X.X.X and forwards the request to X.X.X.X:5000

Proposed headless service approach

  1. gateway receives a request destined for batch.hail.is
  2. gateway intends to forward this request to batch.default:443
  3. gateway makes a DNS request to resolve batch.default. gateway receives multiple DNS records back saying that batch.default corresponds to the IP addresses X.X.X.X, Y.Y.Y.Y, and Z.Z.Z.Z (assuming there are 3 pods in the deployment).
  4. gateway gets its pick out of the pods (this is really important and is why envoy needs all the IPs to properly load balance!) and decides to forward the request directly to pod X.X.X.X:443
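The difference between the two scenarios can be sketched from the client's point of view. Round robin is used here purely as a stand-in policy (the thread does not say which policy gateway is configured with), and `pick_n` is a hypothetical helper; the placeholder addresses are the ones from the lists above.

```python
import itertools
from typing import List


def pick_n(endpoints: List[str], n: int) -> List[str]:
    """Choose an endpoint for each of n requests, cycling through the list."""
    if not endpoints:
        raise ValueError("no endpoints to choose from")
    return list(itertools.islice(itertools.cycle(endpoints), n))


if __name__ == "__main__":
    # Normal Service: DNS yields only the cluster IP, so the client has no choice.
    print(pick_n(["A.A.A.A:443"], 3))
    # Headless Service: DNS yields every pod IP, so the client spreads requests itself.
    print(pick_n(["X.X.X.X:443", "Y.Y.Y.Y:443", "Z.Z.Z.Z:443"], 4))
```

In the first call every request goes to the same address and the pod is chosen later by kube-proxy; in the second, the client's own policy decides which pod receives each request.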

So in the second scenario, the pod itself must be listening on 443, because that is where gateway is going to send the request. It is not exactly a permissions issue, but in writing this I am realizing that it requires service pods like auth and batch to run as root in order to bind to port 443. I think the port specified in the Service yaml is actually useless now. So two actionable options are:

  1. Remove the useless port field on the Service yaml for auth, batch, etc.
  2. Keep all of our services on unprivileged ports (5000) and have gateway forward traffic to batch.default:5000 instead of batch.default:443. Keeping our services on port 5000 could allow us to run those services as non-root users. I guess k8s has them running as root by default…
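The port distinction behind option 2 can be written down as a one-line check. This assumes the conventional Linux default, where ports below 1024 are privileged; that threshold is configurable on some systems, so it is exposed as a parameter. `is_privileged_port` is a hypothetical helper for illustration only.

```python
def is_privileged_port(port: int, unprivileged_start: int = 1024) -> bool:
    """Return True if binding to port would normally need elevated privileges.

    Assumes the conventional Linux default threshold of 1024.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    return port < unprivileged_start


if __name__ == "__main__":
    for port in (443, 5000):
        kind = "privileged" if is_privileged_port(port) else "unprivileged"
        print(port, kind)
```

Under that assumption 443 falls on the privileged side and 5000 does not, which is what would let the services run as non-root users if gateway targeted port 5000.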

daniel-goldstein avatar Oct 18 '22 21:10 daniel-goldstein

So after thinking about it a bit, I think option 2 (listening on unprivileged ports instead of 443 for our in-cluster communication) would be a good thing to do in general, but it does complicate this PR a bit. There are a few more places where we assume services are listening on 443, e.g. deploy_config.py, grafana / prometheus, etc. I think it would be best to make this change and then consider separately the task of moving from 443 -> 5000 for in-cluster communication.

daniel-goldstein avatar Oct 19 '22 12:10 daniel-goldstein