aws-app-mesh-controller-for-k8s

Network access from initContainer of appmesh-enabled EKS Fargate Pod

Open visit1985 opened this issue 4 years ago • 15 comments

Summary

I schedule a Pod on EKS Fargate that uses an initContainer to wait for some external condition. Since the injected Envoy has not yet started during PodInitializing, it seems that the initContainer cannot check the external condition. All traffic is blocked because Envoy is not there to forward it. This includes connections to the k8s api-server as well.

Steps to Reproduce

You can reproduce it with a call to the Kubernetes API from an initContainer.

  initContainers:
    - name: init
      image: bitnami/kubectl:latest
      args: ["wait", "--for=condition=complete", "--timeout=60s", "job/something"]

Are you currently working around this issue?

I tried to work around this by adding an appmesh.k8s.aws/egressIgnoredIPs annotation to my Pod spec in the Deployment, but it is overwritten with injected values from the controller. So, I'm out of ideas.

visit1985 avatar Feb 18 '21 18:02 visit1985

Looks like this issue had slipped under the radar. I've moved it to the controller repo where it probably fits better.

lavignes avatar Mar 11 '21 01:03 lavignes

Hi, I have a few questions.

All traffic is blocked because Envoy is not there to forward it.

How did you verify this? Did you try running those external conditions after you exec into the init container? What errors do you see, if any?

cgchinmay avatar Mar 12 '21 21:03 cgchinmay

The kubectl container output is:

The connection to the server 10.100.0.1:443 was refused - did you specify the right host or port?

If I exec into the init container, and try to curl it:

$ curl -v https://$KUBERNETES_SERVICE_HOST
...
curl: (7) Failed to connect to 10.100.0.1 port 443: Connection refused

visit1985 avatar Mar 17 '21 19:03 visit1985

I tried to repro the issue on a non-Fargate pod but couldn't. I don't think Envoy has any role to play here. I will try reproducing it using Fargate.

cgchinmay avatar Mar 17 '21 19:03 cgchinmay

I also tried to access the API server endpoint in my private subnet from the initContainer via curl -v https://XXXXXXXX.sk1.eu-central-1.eks.amazonaws.com which resolves to a non-cluster IP – Same result.

visit1985 avatar Mar 17 '21 22:03 visit1985

My guess would be that it is related to how iptables rules for the Pod's network namespace are applied on Fargate. But since I can't get NET_ADMIN capabilities on Fargate pods, I cannot inspect the rules during the PodInitializing state.

visit1985 avatar Mar 18 '21 15:03 visit1985

Yes, your guess is right. But I need to validate that theory with a Fargate pod. Let me do that and I will update this thread.

cgchinmay avatar Mar 18 '21 17:03 cgchinmay

I was able to repro the issue but couldn't inspect the iptables rules due to the NET_ADMIN capability limitation on Fargate pods. This problem is limited to Fargate pods because of the difference in the way the iptables rules are set up. Thanks to @M00nF1sh for the illustration below.

On EC2

  initContainer:
        customerInitContainer:
            curl -v https://$KUBERNETES_SERVICE_HOST -> works as there is no traffic redirection to envoy at this point
        initContainerEnvoyRule:
            setup iptables to redirect traffic to envoy
  container:
    app
    envoy

For Fargate

  CNI:
        setup iptables to redirect traffic to envoy
  initContainer:
        customerInitContainer:
            curl -v https://$KUBERNETES_SERVICE_HOST -> Fails as traffic is redirected to envoy but envoy container is not running yet
  container:
    app
    envoy

Unlike on EC2, we cannot set up these iptables rules as part of the Envoy init container, as that requires containers to have the NET_ADMIN capability. So we do this using the CNI. I will work with the Fargate team to find a resolution to this problem and let you know. But this is the reason why you are facing the issue.

cgchinmay avatar Mar 19 '21 19:03 cgchinmay

@cgchinmay I'm running into a similar issue with my initContainer. Is there any update on this?

JKrehling avatar Sep 16 '21 01:09 JKrehling

Hi @JKrehling, we are working with the Fargate team and will have an update soon.

cgchinmay avatar Sep 16 '21 01:09 cgchinmay

running into this as well on ecs fargate

nwsparks avatar Aug 28 '22 13:08 nwsparks

I have the same issue. Any news on this item? This has been pending for quite some time now.

boonelschenbroich avatar Jul 05 '23 11:07 boonelschenbroich

For folks that are running into this issue there is a workaround available if it's ok for your particular case. The iptables rules that AppMesh configures on pod startup include an exception by user ID and group ID. That exception keeps traffic leaving the Envoy sidecar from being directed back to the Envoy sidecar again.

If you run your init container as user 1337, group 133, then the iptables rules will not be applied to that traffic. When configured like this, calls out of the init container should work.

Here's an example init container spec with this configuration:

      initContainers:
        - name: al2
          image: public.ecr.aws/amazonlinux/amazonlinux:2
          command: ["/bin/sh"]
          args: ["-c", "curl http://amazon.com/"]
          securityContext:
            runAsUser: 1337
            runAsGroup: 133
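
As a sketch, the same workaround applied to the kubectl initContainer from the original report might look like the following. This is untested here; it assumes the image still works when run as a non-default user and that the Pod's service account is allowed to read the Job.

```yaml
# Sketch only: the kubectl initContainer from the original report, run as the
# UID/GID that the App Mesh iptables rules ignore (values taken from the
# workaround described above).
initContainers:
  - name: init
    image: bitnami/kubectl:latest
    args: ["wait", "--for=condition=complete", "--timeout=60s", "job/something"]
    securityContext:
      runAsUser: 1337   # ignored user ID, so traffic is not redirected to Envoy
      runAsGroup: 133   # ignored group ID
```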

The code that configures iptables rules for Fargate is available at https://github.com/aws/amazon-vpc-cni-plugins/blob/master/plugins/aws-appmesh/plugin/commands.go#L151 in case you'd like to have a look at what the rules are.

joesbigidea avatar Jul 11 '23 14:07 joesbigidea

@joesbigidea, do you refer to this config? https://github.com/aws/amazon-vpc-cni-plugins/blob/master/plugins/aws-appmesh/aws-appmesh.conf#L4

visit1985 avatar Jul 11 '23 23:07 visit1985

@visit1985 Yes, that config is where the values for the ignored user and group come from.

joesbigidea avatar Jul 12 '23 14:07 joesbigidea