
[Bug] vpc.securityGroup validation issue while creating nodegroup

Open hans72118 opened this issue 2 years ago • 10 comments

What were you trying to accomplish?

  1. Correctly validate the default AWS security group egress rules for both IPv4 and IPv6.
  2. Provide an option to use restricted cluster outbound rules, so that EKS clusters that must meet security policies/requirements are supported.

https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html#security-group-restricting-cluster-traffic

Rule type | Protocol | Port | Destination
-- | -- | -- | --
Outbound | TCP | 443 | Cluster security group
Outbound | TCP | 10250 | Cluster security group
Outbound (DNS) | TCP and UDP | 53 | Cluster security group
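
For illustration, the rows of the table above could be expressed as EC2 `IpPermissions` entries. This is only a sketch: the function name is hypothetical, and `sg-EXAMPLE` is a placeholder for the cluster security group ID, not a value from this issue.

```python
# Hypothetical sketch: the restricted outbound rules as EC2 IpPermissions entries.
# "sg-EXAMPLE" is a placeholder for the cluster security group ID.


def restricted_egress_permissions(cluster_sg_id):
    """Build one IpPermissions entry per (protocol, port) pair in the table."""
    # DNS is listed as "TCP and UDP", so port 53 appears once per protocol.
    rows = [("tcp", 443), ("tcp", 10250), ("tcp", 53), ("udp", 53)]
    return [
        {
            "IpProtocol": proto,
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": cluster_sg_id}],
        }
        for proto, port in rows
    ]


for perm in restricted_egress_permissions("sg-EXAMPLE"):
    print(perm["IpProtocol"], perm["FromPort"])
```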

What happened?

Related to: https://github.com/eksctl-io/eksctl/issues/6455 https://github.com/eksctl-io/eksctl/pull/7030

After eksctl version 0.157.0, the security group rules seem to be validated against a single default IPv4 egress rule (All traffic to 0.0.0.0/0). Because a security group created in AWS has, by default, egress rules for both IPv6 (::/0) and IPv4 (0.0.0.0/0), we hit the following error:

❯ eksctl create nodegroup -f Nodegroup.yaml --dry-run
Error: vpc.securityGroup (sg-009c6a55c3937abcd) has egress rules that were not attached by eksctl; vpc.securityGroup should not contain any non-default external egress rules on a cluster not created by eksctl (rule ID: sgr-02524e9e33210abcd)

where the egress rules are:

sg-009c6a55c3937abcd - Outbound rules (2)
---------------------------------------------------------
– sgr-02524e9e33210abcd	IPv6	All traffic	All	All	::/0	–
– sgr-043a6fe0e104aabcd	IPv4	All traffic	All	All	0.0.0.0/0	–
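
To make the expected behaviour concrete, here is a rough sketch of a check that accepts both allow-all rules. It is not eksctl's actual validation code (eksctl is written in Go); the function names are hypothetical, and the rule dictionaries are assumed to use the field names of the EC2 `DescribeSecurityGroupRules` response.

```python
# Hypothetical sketch, not eksctl's actual validation code.
# Rules are assumed to be dicts shaped like EC2 DescribeSecurityGroupRules output.

DEFAULT_EGRESS_CIDRS = {"CidrIpv4": "0.0.0.0/0", "CidrIpv6": "::/0"}


def is_default_egress_rule(rule):
    """Return True for an allow-all egress rule to 0.0.0.0/0 or ::/0."""
    if not rule.get("IsEgress") or rule.get("IpProtocol") != "-1":
        return False
    return any(rule.get(key) == cidr for key, cidr in DEFAULT_EGRESS_CIDRS.items())


def non_default_egress_rule_ids(rules):
    """Collect the IDs of egress rules that are not one of the two allow-all rules."""
    return [
        r["SecurityGroupRuleId"]
        for r in rules
        if r.get("IsEgress") and not is_default_egress_rule(r)
    ]


rules = [
    {"SecurityGroupRuleId": "sgr-02524e9e33210abcd", "IsEgress": True,
     "IpProtocol": "-1", "CidrIpv6": "::/0"},
    {"SecurityGroupRuleId": "sgr-043a6fe0e104aabcd", "IsEgress": True,
     "IpProtocol": "-1", "CidrIpv4": "0.0.0.0/0"},
]
print(non_default_egress_rule_ids(rules))
```

With a check along these lines, the two rules listed above would both count as defaults and neither would be reported.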

How to reproduce it?

Use a security group with the default AWS egress rules, as shown below, in vpc.securityGroup to create a nodegroup.

sg-009c6a55c3937abcd - Outbound rules (2)
---------------------------------------------------------
– sgr-02524e9e33210abcd	IPv6	All traffic	All	All	::/0	–
– sgr-043a6fe0e104aabcd	IPv4	All traffic	All	All	0.0.0.0/0	–

Nodegroup.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: LAB-EKS-28
  region: ap-northeast-1
  version: "1.28"

vpc:
  id: "vpc-eaabcdef"
  cidr: "172.31.0.0/16"
  securityGroup: "sg-009c6a55c3937abcd"  ## Additional SG
  subnets:
    public:
      public1:
          id: "subnet-12abcdef"
          az: ap-northeast-1c
      public2:
          id: "subnet-a7abcdef"
          az: ap-northeast-1a
    private:
      private1:
          id: "subnet-00b83dc8b30abcdef"
          az: ap-northeast-1c
      private2:
          id: "subnet-0dd2b34ddd1abcdef"
          az: ap-northeast-1a

managedNodeGroups:
  - name: TEST-28
    instanceType: c6a.large
    desiredCapacity: 2
    minSize: 0
    maxSize: 10
    securityGroups:
      withLocal: false
    ssh:
      allow: true
      publicKeyName: Testing

Logs

Anything else we need to know?

Versions

❯ eksctl version
0.161.0

hans72118 avatar Oct 12 '23 15:10 hans72118

I'm trying to troubleshoot a squid proxy that I'm using for my Terraform-created EKS cluster. I want to try and add a nodegroup using eksctl because that's worked for me in other VPCs to get a node group that uses squid, but this validation is blocking me.

We should be able to have whatever rules we want in the outbound rules. This validation is a bit of an over-reach imo.

matthenry87 avatar Nov 01 '23 20:11 matthenry87

@matthenry87, we are planning to relax the validation but need some time to give it more thought. The team is occupied with other major deliverables at the moment. If this is a blocker for you, I'd recommend downgrading to an older version in the meantime.

cPu1 avatar Nov 02 '23 13:11 cPu1

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Dec 03 '23 01:12 github-actions[bot]

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Jan 04 '24 01:01 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Jan 09 '24 01:01 github-actions[bot]

Hello Team,

Is this improvement on this year's roadmap?

yws-ss avatar Apr 30 '24 03:04 yws-ss

Please fix this; somehow it deleted the outbound rules from my SG! It took some time to figure out why my entire Dev cluster was dead. I had to manually re-add the outbound rules for All Traffic on IPv4 and IPv6.

sschamp avatar May 17 '24 15:05 sschamp

Looks like it's NOT fixed in 0.179.0. The workaround is:

  1. drop outbound IPv6 rule
  2. create node group
  3. add the rule back
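
The three steps above could be scripted roughly as follows. This is only a sketch: the helper name is hypothetical, the security group ID and config file name are the ones from the original report, and the script only prints the commands so they can be reviewed before being run by hand.

```python
import json
import shlex

# Hypothetical helper: builds the commands for the three workaround steps.
# It only prints them; nothing is executed against AWS.

# Allow-all IPv6 egress rule (::/0), in EC2 IpPermissions form.
IPV6_ALLOW_ALL = [{"IpProtocol": "-1", "Ipv6Ranges": [{"CidrIpv6": "::/0"}]}]


def workaround_commands(group_id, config_file):
    """Return argv lists: revoke the IPv6 rule, create the nodegroup, re-add the rule."""
    permissions = json.dumps(IPV6_ALLOW_ALL)
    return [
        ["aws", "ec2", "revoke-security-group-egress",
         "--group-id", group_id, "--ip-permissions", permissions],
        ["eksctl", "create", "nodegroup", "-f", config_file],
        ["aws", "ec2", "authorize-security-group-egress",
         "--group-id", group_id, "--ip-permissions", permissions],
    ]


for argv in workaround_commands("sg-009c6a55c3937abcd", "Nodegroup.yaml"):
    print(shlex.join(argv))
```

Keep in mind that outbound IPv6 traffic through this security group is blocked between the first and the last command.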

ok512 avatar May 29 '24 17:05 ok512

This issue has been scoped down and now only applies to self-managed nodegroups. The long-term plan might involve adding the SG rules directly via the API instead of using CFN. More context: https://github.com/eksctl-io/eksctl/issues/6455#issuecomment-1697275161

TiberiuGC avatar Jul 18 '24 09:07 TiberiuGC

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Aug 19 '24 01:08 github-actions[bot]