ServiceEntry may break API server connectivity in ambient
In discussions with @danielloader, it sounds like there may be issues with ambient api-server connectivity in some scenarios.
My repro attempt on kind:
```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: k8s-api-ext
spec:
  hosts: [kubernetes.default.svc.cluster.local]
  addresses: [10.96.0.1]
  endpoints:
  - address: 10.96.0.1
  location: MESH_EXTERNAL
  resolution: STATIC
  ports:
  - number: 443
    name: https-k8s
    protocol: HTTPS
```
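For anyone else trying to reproduce: this is roughly how I check API server reachability from a pod in an ambient-enrolled namespace. A rough sketch only; the pod name is a placeholder, and the HTTP code you get back depends on your cluster's anonymous-auth/RBAC settings.

```sh
# Enroll the namespace in ambient (skip if it is already labeled).
kubectl label namespace default istio.io/dataplane-mode=ambient

# Hit the API server VIP from inside the mesh; any HTTP status (200/401/403)
# means we reached the API server, while a hang or timeout suggests broken connectivity.
kubectl run api-check --image=curlimages/curl --restart=Never -- \
  curl -sk -o /dev/null -w '%{http_code}\n' https://kubernetes.default.svc.cluster.local/version

# Check the result once the pod has completed.
kubectl logs api-check
```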
Internal representation in ztunnel:
"/10.96.0.1": {
"endpoints": {
"Kubernetes/discovery.k8s.io/EndpointSlice/default/kubernetes/172.18.0.5:/172.18.0.5": {
"address": "/172.18.0.5",
"port": {
"443": 6443
},
"service": "default/kubernetes.default.svc.cluster.local",
"workloadUid": "Kubernetes/discovery.k8s.io/EndpointSlice/default/kubernetes/172.18.0.5"
},
"Kubernetes/networking.istio.io/ServiceEntry/default/k8s-api-ext/10.96.0.1:/10.96.0.1": {
"address": "/10.96.0.1",
"port": {
"443": 443
},
"service": "default/kubernetes.default.svc.cluster.local",
"workloadUid": "Kubernetes/networking.istio.io/ServiceEntry/default/k8s-api-ext/10.96.0.1"
}
},
"hostname": "kubernetes.default.svc.cluster.local",
"name": "kubernetes",
"namespace": "default",
"ports": {
"443": 6443
},
"subjectAltNames": [],
"vips": [
"/10.96.0.1"
]
},
Note this actually works -- ztunnel will load balance between hitting the API server endpoint directly (172.18.0.5) and the service VIP (10.96.0.1, which kube-proxy translates for us). Definitely wonky though.
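For reference, the JSON above is a fragment of ztunnel's config dump. If anyone wants to compare against their own cluster, something like this should pull it (assuming the default istio-system install namespace and ztunnel's admin interface on port 15000; the exact JSON layout may differ between versions):

```sh
# Forward the admin port from one of the ztunnel pods and grab its view of services.
kubectl -n istio-system port-forward ds/ztunnel 15000:15000 &
curl -s http://localhost:15000/config_dump | jq '.services'
```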
Opening this issue to track, as I cannot yet reproduce it. Trying on AWS next.
Works for me on EKS as well
For additional context:
I went looking for why my Kubernetes API wouldn't work when I was running 1.22.1 and ambient, found an entry in the issues here about how a ServiceEntry helped, and used it -- and interestingly, at that point it enabled connectivity. (Or at least it gave the impression it helped; it could have been completely coincidental.)
Fast forward to the 1.22.2 alpha testing, and I was testing on Kind and EKS without a ServiceEntry for the Kubernetes API.
When 1.22.2 dropped and my staging and production clusters got promoted, suddenly no pods could talk to the Kubernetes API. I spent a few hours trying to debug it and compare it to my EKS 1.22.2 alpha test cluster, and the only config drift was these ServiceEntries.
Removed them and pods finally went healthy. As to why, I don't know; I'm happy to put them back in and see if it breaks anything or adversely affects pod health.
Thanks for the details! I only tested with a pure 1.22.2 install and didn't have issues. I could try the upgrade path specifically.
I do know that Istiod 1.22.1 with ztunnel 1.22.2 would have API server connectivity issues.
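If it is skew, something like this should surface it (assumes the default istio-system install namespace):

```sh
# istioctl reports client, control plane, and data plane versions side by side.
istioctl version

# Double-check which ztunnel image tag actually rolled out.
kubectl -n istio-system get ds ztunnel -o jsonpath='{.spec.template.spec.containers[0].image}'
```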
I'll re-apply them on Monday and give some feedback.
I think this is a non-issue now that successive versions have come out. Happy to leave it open if it's something wonky that needs addressing, but it's no longer impacting me as far as I can see.
Let's close this off; if someone sees it again, it's a different issue (or you are running an old version... upgrade!). Thanks for the help here.