amazon-vpc-resource-controller-k8s
Unable to create static mirror pods due to `mpod.vpc.k8s.aws` Admission Webhook
Describe the Bug:
I am trying to create a static mirror pod on a node that runs AL2 and connects to an EKS control plane. When I point the kubelet at the `staticPodPath`, I get the following error message in the kubelet log on startup:
Dec 30 02:44:48 ip-192-168-81-58.us-west-2.compute.internal kubelet[1495]: E1230 02:44:48.535524 1495 kubelet.go:1899] "Failed creating a mirror pod for" err="admission webhook \"mpod.vpc.k8s.aws\" denied the request: Failed to get Matching SGP for Pods, rejecting event" pod="default/static-web-ip-192-168-81-58.us-west-2.compute.internal"
Digging deeper into why this happened, I see that this error log gets fired here: https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/webhooks/core/pod_webhook.go#L188. Looking at the `GetMatchingSecurityGroupForPods()` function, I can see that it errors out, and the webhook denies the request, when it is unable to find the service account for the pod. Since static pods don't have a service account, I suspect that the failed lookup of the unspecified service account is what causes pod creation to fail.
From reading through this issue, static pods implicitly don't rely on any API objects since they can't assume that the apiserver even exists when they come up. It seems like the webhook here makes an assumption that these service account names always exist in pods, which seems to be true almost all of the time, except in the case of static pods.
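To make the suspected failure mode concrete, here is a minimal Go sketch. It is not the controller's actual code: `Pod`, `getServiceAccountLabels`, `strictLabels`, and `lenientLabels` are hypothetical names, and the in-memory map stands in for the real API lookup. It assumes a static pod reaches the webhook with an empty service account name, so a strict lookup fails (and the request would be denied), while a lenient lookup treats a missing service account as "no labels".

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical, stripped-down stand-in for the Kubernetes Pod type.
type Pod struct {
	Name               string
	ServiceAccountName string // assumed to be empty for static (mirror) pods
}

var errNotFound = errors.New("service account not found")

// serviceAccounts is a hypothetical in-memory stand-in for the API lookup:
// service account name -> labels.
var serviceAccounts = map[string]map[string]string{
	"default": {"team": "web"},
}

// getServiceAccountLabels returns the labels of the named service account,
// or an error wrapping errNotFound if it does not exist.
func getServiceAccountLabels(name string) (map[string]string, error) {
	labels, ok := serviceAccounts[name]
	if !ok {
		return nil, fmt.Errorf("%q: %w", name, errNotFound)
	}
	return labels, nil
}

// strictLabels models the behavior described in this issue: any lookup
// failure is returned to the caller, which would then deny the request.
func strictLabels(p Pod) (map[string]string, error) {
	return getServiceAccountLabels(p.ServiceAccountName)
}

// lenientLabels treats a missing service account as "no labels" and only
// propagates other kinds of errors.
func lenientLabels(p Pod) (map[string]string, error) {
	labels, err := getServiceAccountLabels(p.ServiceAccountName)
	if errors.Is(err, errNotFound) {
		return nil, nil
	}
	return labels, err
}

func main() {
	static := Pod{Name: "static-web", ServiceAccountName: ""}

	_, err := strictLabels(static)
	fmt.Println("strict lookup fails:", err != nil)

	labels, err := lenientLabels(static)
	fmt.Println("lenient lookup fails:", err != nil, "labels found:", len(labels))
}
```

The lenient variant only swallows the "not found" case, so a genuine lookup failure (for example, an API error) would still surface.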
Expected Behavior:
Static pods should be able to create an apiserver representation of themselves without any failure.
How to reproduce it (as minimally and precisely as possible):
- Create an EC2 instance running the EKS-optimized AMI on AL2
- Use the following userData (or similar) when creating the instance
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"

mkdir -p /etc/kubernetes/manifests/
echo "$(jq '.staticPodPath="/etc/kubernetes/manifests/"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
cat <<EOF >/etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
  namespace: default
spec:
  containers:
    - name: web
      image: nginx
EOF

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash -xe
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
/etc/eks/bootstrap.sh <cluster-name> --apiserver-endpoint <apiserver-endpoint> --b64-cluster-ca <cluster-ca> \
  --dns-cluster-ip '10.100.0.10' \
  --use-max-pods false
--//--
- Wait for the instance to start and the node to join. Then run `journalctl -u kubelet` after SSM-ing into the node to see the failures creating the static pods.
Additional Context:
As a workaround for now, I'm disabling the mutating webhook with `kubectl delete mutatingwebhookconfiguration vpc-resource-mutating-webhook` so that I can create static pods.
Environment:
- Kubernetes version (use `kubectl version`): v1.28.4-eks-8cb36c9
- CNI Version: v1.12.5-eksbuild.2
- OS (Linux/Windows): Linux
@jonathan-innis, yes, the webhook assumes that no pod is a static pod and that every pod has a Service Account assigned. As discussed offline, so far we have not seen a use case that needs to create static pods in production. We will investigate whether the SA check can be ignored from the point of view of the supported feature (Security Groups for Pods), and/or from a more general point of view. I will update later.
> so far we have not seen a use case that needs to create static pods in production
We've seen this ask for Karpenter with Airflow: https://github.com/kubernetes-sigs/karpenter/issues/863. Granted, this is one data point, but it seems like some asks do exist for creating static pods that aren't control plane pods.
> will investigate whether the SA check can be ignored from the point of view of the supported feature
Definitely seems like you could just ignore the get of the SA if you don't find one attached to the pod. I would imagine that you should be able to enforce Security Groups for Pods like you would with any other pod since I would expect that the network traffic would be routed to the static pod like any other pod on the cluster.
> to enforce Security Groups for Pods

If this is about static pods using Security Groups for Pods, that is not a case we have supported or tested. Who sets up the networking for static pods?
> you could just ignore the get of the SA if you don't find one attached to the pod

At this moment I am not certain whether the webhook can safely assume that pods with no SA assigned are guaranteed to be static pods. Since the feature supports both pod labels and SA labels, we have to be certain that ignoring the SA is OK in all cases.
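On the question of telling static pods apart from other pods: the kubelet marks the mirror pods it creates with the `kubernetes.io/config.mirror` annotation, so a check does not have to infer "static" from a missing service account. A minimal Go sketch of such a check follows. The function name `isMirrorPod` and the annotation value used in `main` are illustrative assumptions, and this is not taken from the controller's code.

```go
package main

import "fmt"

// mirrorPodAnnotation is the annotation key the kubelet sets on mirror pods.
const mirrorPodAnnotation = "kubernetes.io/config.mirror"

// isMirrorPod reports whether a pod's annotations mark it as a mirror pod.
// Only the presence of the key is checked; the value is ignored.
func isMirrorPod(annotations map[string]string) bool {
	_, ok := annotations[mirrorPodAnnotation]
	return ok
}

func main() {
	// The annotation value below is a placeholder, not a real hash.
	mirror := map[string]string{mirrorPodAnnotation: "example-hash"}
	regular := map[string]string{"example.com/owner": "web-team"}

	fmt.Println("mirror pod:", isMirrorPod(mirror))
	fmt.Println("regular pod:", isMirrorPod(regular))
	fmt.Println("no annotations:", isMirrorPod(nil))
}
```

Whether the webhook should special-case mirror pods at all, or simply tolerate a missing service account for every pod, is a separate design question that this sketch does not settle.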
> Who sets up the networking for static pods
This is something I'm not 100% sure on. I'm assuming the CNI, as with every other pod component, but I'll double-check that in the community Slack. I'm working off that assumption only because there's no callout in the static pod docs that mentions otherwise.
> At this moment I am not certain whether the webhook can safely assume that pods with no SA assigned are guaranteed to be static pods
I don't even think that you have to guarantee that they are static pods. From what I can understand, you can build a SecurityGroupPolicy off of selectors on either the pods or the service account. Naturally, I would assume that if a pod doesn't reference a service account (for whatever reason) a service account selector just wouldn't apply to it.
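The matching rule proposed here can be sketched in Go as follows. This is a simplified model under stated assumptions, not the controller's implementation: `Policy`, `appliesTo`, and `securityGroupsFor` are hypothetical names, selectors are reduced to exact-match label maps, the security group IDs are placeholders, and a policy is assumed to apply only when every selector it defines matches. A pod with no service account can still match a pod selector, but never a service account selector.

```go
package main

import "fmt"

// Policy is a hypothetical, simplified model of a SecurityGroupPolicy.
// A nil selector means the policy does not define that selector.
type Policy struct {
	Name                   string
	PodSelector            map[string]string
	ServiceAccountSelector map[string]string
	GroupIDs               []string
}

// matchLabels reports whether every key/value pair in selector is present in labels.
func matchLabels(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// appliesTo reports whether the policy applies to a pod. hasSA is false for
// pods that reference no service account, in which case a service account
// selector can never match and no lookup error is raised.
func (p Policy) appliesTo(podLabels, saLabels map[string]string, hasSA bool) bool {
	if p.PodSelector == nil && p.ServiceAccountSelector == nil {
		return false
	}
	if p.PodSelector != nil && !matchLabels(p.PodSelector, podLabels) {
		return false
	}
	if p.ServiceAccountSelector != nil && (!hasSA || !matchLabels(p.ServiceAccountSelector, saLabels)) {
		return false
	}
	return true
}

// securityGroupsFor collects the group IDs of all policies that apply to the pod.
func securityGroupsFor(policies []Policy, podLabels, saLabels map[string]string, hasSA bool) []string {
	var groups []string
	for _, p := range policies {
		if p.appliesTo(podLabels, saLabels, hasSA) {
			groups = append(groups, p.GroupIDs...)
		}
	}
	return groups
}

func main() {
	policies := []Policy{
		{Name: "by-pod-label", PodSelector: map[string]string{"role": "myrole"}, GroupIDs: []string{"sg-pod-example"}},
		{Name: "by-sa-label", ServiceAccountSelector: map[string]string{"team": "web"}, GroupIDs: []string{"sg-sa-example"}},
	}

	// A static pod: it has labels but no service account.
	staticPodLabels := map[string]string{"role": "myrole"}
	fmt.Println(securityGroupsFor(policies, staticPodLabels, nil, false))
}
```

In this model the static pod picks up only the security groups selected by its pod labels, and the service account policy is skipped rather than causing a rejection.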
yep, nothing special (just CNI)
Confirmed that it's CNI like any other pod on the cluster: https://kubernetes.slack.com/archives/C09NXKJKA/p1704164272715389
Thanks for checking. It makes sense to me that static pods' networking is set up through the same path. I have no problem removing the forced SA check on pods. I just want to call out that this can be a behavior change, although I think it is unlikely that customers rely on this check to avoid applying SGP to some of their pods.
Jumping into the thread, as I ran into the same issue recently.
Testing Environment:
- Amazon EKS 1.29 (fresh new clean cluster)
- Managed Add-ons:
  - kube-proxy (all defaults, `v1.29.0-eksbuild.2`)
  - Amazon VPC CNI (all defaults, `v1.16.2-eksbuild.1`)
  - CoreDNS (all defaults, `v1.11.1-eksbuild.6`)
I found that if a user follows the guidance for Static Pod creation, it fails unexpectedly.
Steps to reproduce the issue (run on the EKS node as root):
mkdir -p /etc/kubernetes/manifests/
cat <<EOF >/etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
  labels:
    role: myrole
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
EOF
echo "$(jq '.staticPodPath="/etc/kubernetes/manifests/"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
systemctl restart kubelet
# journalctl -u kubelet | grep 'static-web'
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:27.838978 118710 kubelet.go:2424] "SyncLoop ADD" source="file" pods=["default/static-web-ip-192-168-101-59.ec2.internal"]
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:27.839118 118710 topology_manager.go:215] "Topology Admit Handler" podUID="85f6f142d15130b28f70dbf3308765a8" podNamespace="default" podName="static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:27.839303 118710 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="default/static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: E0208 05:18:27.872088 118710 kubelet.go:1930] "Failed creating a mirror pod for" err="admission webhook \"mpod.vpc.k8s.aws\" denied the request: Failed to get Matching SGP for Pods, rejecting event" pod="default/static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:27 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:27.872265 118710 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="default/static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:28 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:28.604175 118710 kubelet.go:2456] "SyncLoop (PLEG): event for pod" pod="default/static-web-ip-192-168-101-59.ec2.internal" event={"ID":"85f6f142d15130b28f70dbf3308765a8","Type":"ContainerStarted","Data":"18c924ab2785f8eb19ba785a780a311fea3fb32653d51bb8310d40285b9d4b92"}
Feb 08 05:18:32 ip-192-168-101-59.ec2.internal kubelet[118710]: I0208 05:18:32.616077 118710 kubelet.go:2456] "SyncLoop (PLEG): event for pod" pod="default/static-web-ip-192-168-101-59.ec2.internal" event={"ID":"85f6f142d15130b28f70dbf3308765a8","Type":"ContainerStarted","Data":"e982dd5d3c7faa4a34046dfba3411c2c69e8bea5c0f04e5b2cb1d22237172a7c"}
Feb 08 05:18:32 ip-192-168-101-59.ec2.internal kubelet[118710]: E0208 05:18:32.621184 118710 kubelet.go:1930] "Failed creating a mirror pod for" err="admission webhook \"mpod.vpc.k8s.aws\" denied the request: Failed to get Matching SGP for Pods, rejecting event" pod="default/static-web-ip-192-168-101-59.ec2.internal"
Feb 08 05:18:33 ip-192-168-101-59.ec2.internal kubelet[118710]: E0208 05:18:33.623274 118710 kubelet.go:1930] "Failed creating a mirror pod for" err="admission webhook \"mpod.vpc.k8s.aws\" denied the request: Failed to get Matching SGP for Pods, rejecting event" pod="default/static-web-ip-192-168-101-59.ec2.internal"