
SE may break api-server connectivity in ambient

Status: Open · howardjohn opened this issue 1 year ago · 4 comments

In discussions with @danielloader, it sounds like a ServiceEntry covering the Kubernetes API server may break api-server connectivity for ambient workloads in some scenarios.

My repro attempt on kind:


apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: k8s-api-ext
spec:
  hosts: [kubernetes.default.svc.cluster.local]
  addresses: [10.96.0.1]
  endpoints:
  - address: 10.96.0.1
  location: MESH_EXTERNAL
  resolution: STATIC
  ports:
  - number: 443
    name: https-k8s
    protocol: HTTPS
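The ServiceEntry above reuses both the hostname (`kubernetes.default.svc.cluster.local`) and the VIP (`10.96.0.1`) of the built-in `kubernetes` Service. That is consistent with ztunnel listing both sets of endpoints under a single service in the dump that follows. Below is a minimal Python sketch of spotting that kind of overlap before applying an SE. `overlaps_service` and the dict form of the manifests are hypothetical, not an Istio or Kubernetes API, and the `10.96.0.1` VIP assumes the default kind Service CIDR used in this repro.

```python
from typing import Any

# Assumption: kind's default Service CIDR, which puts the built-in
# kubernetes Service VIP at 10.96.0.1 (as in the repro above).
KUBERNETES_SERVICE = {
    "hostname": "kubernetes.default.svc.cluster.local",
    "vips": ["10.96.0.1"],
}

# The ServiceEntry from the repro, expressed as a plain dict.
SERVICE_ENTRY = {
    "apiVersion": "networking.istio.io/v1",
    "kind": "ServiceEntry",
    "metadata": {"name": "k8s-api-ext"},
    "spec": {
        "hosts": ["kubernetes.default.svc.cluster.local"],
        "addresses": ["10.96.0.1"],
        "endpoints": [{"address": "10.96.0.1"}],
        "location": "MESH_EXTERNAL",
        "resolution": "STATIC",
        "ports": [{"number": 443, "name": "https-k8s", "protocol": "HTTPS"}],
    },
}


def overlaps_service(service_entry: dict[str, Any], service: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return the hostnames and VIPs a ServiceEntry shares with a Service."""
    spec = service_entry["spec"]
    shared = []
    # Same hostname as the existing Service?
    if service["hostname"] in spec.get("hosts", []):
        shared.append(service["hostname"])
    # Any SE address that is also one of the Service's VIPs?
    shared.extend(addr for addr in spec.get("addresses", []) if addr in service["vips"])
    return shared


print(overlaps_service(SERVICE_ENTRY, KUBERNETES_SERVICE))
# ['kubernetes.default.svc.cluster.local', '10.96.0.1']
```

A non-empty result means the SE is not describing a separate external destination but shadowing an in-cluster Service.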

Internal representation in ztunnel:

  "/10.96.0.1": {
    "endpoints": {
      "Kubernetes/discovery.k8s.io/EndpointSlice/default/kubernetes/172.18.0.5:/172.18.0.5": {
        "address": "/172.18.0.5",
        "port": {
          "443": 6443
        },
        "service": "default/kubernetes.default.svc.cluster.local",
        "workloadUid": "Kubernetes/discovery.k8s.io/EndpointSlice/default/kubernetes/172.18.0.5"
      },
      "Kubernetes/networking.istio.io/ServiceEntry/default/k8s-api-ext/10.96.0.1:/10.96.0.1": {
        "address": "/10.96.0.1",
        "port": {
          "443": 443
        },
        "service": "default/kubernetes.default.svc.cluster.local",
        "workloadUid": "Kubernetes/networking.istio.io/ServiceEntry/default/k8s-api-ext/10.96.0.1"
      }
    },
    "hostname": "kubernetes.default.svc.cluster.local",
    "name": "kubernetes",
    "namespace": "default",
    "ports": {
      "443": 6443
    },
    "subjectAltNames": [],
    "vips": [
      "/10.96.0.1"
    ]
  },

Note that this actually works: ztunnel load balances between hitting the API server endpoint directly (172.18.0.5, target port 6443) and the service VIP (10.96.0.1:443, which kube-proxy translates for us). Definitely wonky, though.
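To make the load-balancing point concrete, here is a minimal Python sketch that takes a trimmed copy of the ztunnel dump above (endpoint keys shortened) and resolves service port 443 through each endpoint's own port map. `candidate_destinations` is a hypothetical helper, and the round-robin loop is only an illustration; it is not a claim about ztunnel's actual endpoint selection policy.

```python
import itertools

# Trimmed copy of the ztunnel dump above (keys shortened): one VIP, two endpoints,
# each with its own service-port -> target-port mapping.
SERVICE = {
    "hostname": "kubernetes.default.svc.cluster.local",
    "vips": ["10.96.0.1"],
    "endpoints": {
        "EndpointSlice/default/kubernetes/172.18.0.5": {
            "address": "172.18.0.5",
            "port": {"443": 6443},
        },
        "ServiceEntry/default/k8s-api-ext/10.96.0.1": {
            "address": "10.96.0.1",
            "port": {"443": 443},
        },
    },
}


def candidate_destinations(service: dict, service_port: int) -> list[tuple[str, int]]:
    """Hypothetical helper: map a service port to (address, target_port) per endpoint.

    Each endpoint carries its own port map, so the same service port can land
    on a different target port depending on which endpoint is picked.
    """
    key = str(service_port)
    return [
        (ep["address"], ep["port"][key])
        for ep in service["endpoints"].values()
        if key in ep["port"]
    ]


destinations = candidate_destinations(SERVICE, 443)
print(destinations)  # [('172.18.0.5', 6443), ('10.96.0.1', 443)]

# Illustration only: alternate between the candidates round-robin style.
picker = itertools.cycle(destinations)
for _ in range(4):
    address, port = next(picker)
    print(f"10.96.0.1:443 -> {address}:{port}")
```

The second destination is the Service VIP itself, so that path still depends on kube-proxy translating 10.96.0.1:443 to the API server.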

Opening this issue to track it, since I cannot reproduce it yet. Trying on AWS next.

howardjohn · Jun 28 '24 21:06

Works for me on EKS as well.

howardjohn · Jun 28 '24 22:06

For additional context:

When I was running 1.22.1 with ambient, I went looking for why my pods couldn't reach the Kubernetes API. I found an issue here saying a ServiceEntry helped, applied it, and interestingly, at that point it did restore connectivity (or at least gave the impression it helped; it could have been completely coincidental).

Fast forward to the 1.22.2 alpha testing: I was testing on kind and EKS without a ServiceEntry for the Kubernetes API.

When 1.22.2 was released and my staging and production clusters were promoted to it, suddenly no pods could talk to the Kubernetes API. I spent a few hours debugging and comparing them to my EKS 1.22.2 alpha test cluster, and the only config drift was these ServiceEntries.

I removed them and the pods finally went healthy. I don't know why; I'm happy to put them back and see whether it breaks again or otherwise affects pod health.

danielloader · Jun 29 '24 10:06

Thanks for the details! I only tested with 1.22.2 from the start and didn't have issues. I could try the upgrade path specifically.

I do know that running istiod 1.22.1 with ztunnel 1.22.2 would cause API server connectivity issues.

howardjohn · Jun 29 '24 13:06

I'll reapply them on Monday and report back.

danielloader · Jun 29 '24 13:06

I think this is a non-issue now that newer versions have come out. Happy to leave it open if there's something wonky that needs addressing, but as far as I can see it's no longer affecting me.

danielloader · Aug 08 '24 20:08

Let's close this off. If someone sees it again, it's a different issue (or you are running an old version, so upgrade!). Thanks for the help here.

howardjohn · Aug 08 '24 20:08