aws-app-mesh-controller-for-k8s
Network access from initContainer of appmesh-enabled EKS Fargate Pod
Summary
I schedule a Pod on EKS Fargate that uses an initContainer to wait for some external condition. Since the injected Envoy has not yet started during PodInitializing, the initContainer cannot check the external condition. All traffic appears to be blocked because Envoy is not there to forward it. This includes connections to the k8s api-server as well.
Steps to Reproduce
You can reproduce it with a call to the Kubernetes API from an initContainer.
initContainers:
  - name: init
    image: bitnami/kubectl:latest
    args: ["wait", "--for=condition=complete", "--timeout=60s", "job/something"]
Are you currently working around this issue?
I tried to work around this by adding an appmesh.k8s.aws/egressIgnoredIPs annotation to my Pod spec in the Deployment, but it is overwritten with injected values from the controller. So, I'm out of ideas.
Looks like this issue had fallen under the radar. I've moved it to the controller repo where it probably fits better.
Hi, I have a few questions.
All traffic is blocked because Envoy is not there to forward it.
How did you verify this? Did you try checking those external conditions after you exec into the init container? What errors do you see, if any?
The kubectl container output is:
The connection to the server 10.100.0.1:443 was refused - did you specify the right host or port?
If I exec into the init container, and try to curl it:
$ curl -v https://$KUBERNETES_SERVICE_HOST
...
curl: (7) Failed to connect to 10.100.0.1 port 443: Connection refused
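To make a check like this repeatable, the curl exit status can be turned into a short diagnosis. The following is only a sketch: describe_curl_exit and probe_api_server are hypothetical helper names, and the probe assumes it runs inside a pod where KUBERNETES_SERVICE_HOST is set. The exit codes themselves (6, 7, 28, 60) are curl's documented values.

```shell
#!/bin/sh
# Map a curl exit status to a short diagnosis (hypothetical helper).
describe_curl_exit() {
    case "$1" in
        0)  echo "connected" ;;
        6)  echo "DNS resolution failed" ;;
        7)  echo "TCP connect failed (e.g. connection refused)" ;;
        28) echo "timed out" ;;
        60) echo "connected, but TLS certificate verification failed" ;;
        *)  echo "other curl error: $1" ;;
    esac
}

# Probe the in-cluster API server endpoint (hypothetical helper).
# Assumes the standard KUBERNETES_SERVICE_HOST/PORT variables are present.
probe_api_server() {
    host="${KUBERNETES_SERVICE_HOST:?not running inside a pod}"
    port="${KUBERNETES_SERVICE_PORT:-443}"
    # Exit 60 (untrusted cluster CA) still proves the TCP connection worked.
    curl -s -o /dev/null --max-time 5 "https://${host}:${port}/"
    describe_curl_exit "$?"
}
```

Calling probe_api_server from the init container should then print the "TCP connect failed" line in the situation described above, matching the curl: (7) error.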
I tried to repro the issue on a non-Fargate pod but couldn't. I don't think Envoy has any role to play here, but I will try reproducing it on Fargate.
I also tried to access the API server endpoint in my private subnet from the initContainer via curl -v https://XXXXXXXX.sk1.eu-central-1.eks.amazonaws.com which resolves to a non-cluster IP – Same result.
My guess is that it is related to how the iptables rules for the Pod's network namespace are applied on Fargate. But since I can't get NET_ADMIN capabilities on Fargate pods, I cannot inspect the rules during the PodInitializing state.
Yes, your guess is probably right, but I need to validate that theory with a Fargate pod. Let me do that and I will update this thread.
I was able to repro the issue, but I couldn't inspect the iptables rules because of the NET_ADMIN capability limitation on Fargate pods. The problem is limited to Fargate pods because the iptables rules are set up differently there. Thanks to @M00nF1sh for the illustration below.
On EC2
initContainer:
  customerInitContainer
    curl -v https://$KUBERNETES_SERVICE_HOST -> works, as no traffic is redirected to envoy at this point
  initContainerEnvoyRule
    setup iptables to redirect traffic to envoy
container:
  app
  envoy
For Fargate
CNI:
  setup iptables to redirect traffic to envoy
initContainer:
  customerInitContainer:
    curl -v https://$KUBERNETES_SERVICE_HOST -> Fails as traffic is redirected to envoy but envoy container is not running yet
container:
  app
  envoy
Unlike on EC2, on Fargate we cannot set up these iptables rules as part of the Envoy init container, because that requires the container to have the NET_ADMIN capability. So we do this using the CNI instead. I will work with the Fargate team to find a resolution to this problem and let you know. But this is the reason why you are facing the issue.
@cgchinmay I'm running into a similar issue with my initcontainer. Is there any update on this?
Hi @JKrehling, we are working with the Fargate team and will have an update soon.
running into this as well on ecs fargate
I do have the same issue. Any news on this item? This is pending now for quite some time.
For folks that are running into this issue there is a workaround available if it's ok for your particular case. The iptables rules that AppMesh configures on pod startup include an exception by user ID and group ID. That exception keeps traffic leaving the Envoy sidecar from being directed back to the Envoy sidecar again.
If you run your init container as user 1337, group 133, then the iptables rules will not be applied to that traffic. When configured like this, calls out of the init container should work.
Here's an example init container spec with this configuration:
initContainers:
  - name: al2
    image: public.ecr.aws/amazonlinux/amazonlinux:2
    command: ["/bin/sh"]
    args: ["-c", "curl http://amazon.com/"]
    securityContext:
      runAsUser: 1337
      runAsGroup: 133
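Applied to the reproduction from the top of this issue, the same workaround might look like the sketch below. It has not been verified on Fargate. The pod name, the app container, and the render_pod helper are hypothetical, and it assumes the bitnami/kubectl image can run as UID 1337 and that the pod's service account is allowed to read the job.

```shell
#!/bin/sh
# Print a Pod manifest that runs the original kubectl init container
# under the ignored UID/GID. Names marked "hypothetical" are placeholders.
render_pod() {
    cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: init-wait-demo            # hypothetical name
spec:
  initContainers:
    - name: init
      image: bitnami/kubectl:latest
      args: ["wait", "--for=condition=complete", "--timeout=60s", "job/something"]
      securityContext:
        runAsUser: 1337           # ignored user from the workaround above
        runAsGroup: 133           # ignored group from the workaround above
  containers:
    - name: app                   # hypothetical placeholder workload
      image: public.ecr.aws/amazonlinux/amazonlinux:2
      command: ["sleep", "3600"]
EOF
}
```

The manifest can be piped straight to the cluster with render_pod | kubectl apply -f -.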
The code that configures iptables rules for Fargate is available at https://github.com/aws/amazon-vpc-cni-plugins/blob/master/plugins/aws-appmesh/plugin/commands.go#L151 in case you'd like to have a look at what the rules are.
@joesbigidea, do you refer to this config? https://github.com/aws/amazon-vpc-cni-plugins/blob/master/plugins/aws-appmesh/aws-appmesh.conf#L4
@visit1985 Yes, that config is where the values for the ignored user and group come from.
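For readers unfamiliar with owner-based exemptions, the general shape of such rules can be sketched as follows. This is an illustration only, not the plugin's actual rule set: the chain name EXAMPLE_EGRESS, the print_egress_rules helper, and the proxy port are hypothetical, and whether the real plugin matches UID and GID in separate rules should be confirmed in the linked commands.go. The function only prints the commands and does not run iptables.

```shell
#!/bin/sh
# Print (do not execute) illustrative NAT rules that redirect outbound TCP
# to a local proxy port, except for traffic owned by an ignored UID or GID.
print_egress_rules() {
    ignored_uid="$1"
    ignored_gid="$2"
    proxy_port="$3"              # hypothetical proxy egress port
    chain="EXAMPLE_EGRESS"       # hypothetical chain name

    cat <<EOF
iptables -t nat -N ${chain}
iptables -t nat -A ${chain} -m owner --uid-owner ${ignored_uid} -j RETURN
iptables -t nat -A ${chain} -m owner --gid-owner ${ignored_gid} -j RETURN
iptables -t nat -A ${chain} -p tcp -j REDIRECT --to-port ${proxy_port}
iptables -t nat -A OUTPUT -p tcp -j ${chain}
EOF
}
```

For example, print_egress_rules 1337 133 15001 prints rules in which the two RETURN lines come before the REDIRECT, so packets from a process running as the ignored user or group leave the chain before they can be redirected to a proxy that is not yet listening.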