pulumi-eks icon indicating copy to clipboard operation
pulumi-eks copied to clipboard

eksClusterIngressRule stuck in delete-recreate cycle

Open jtmarmon opened this issue 1 year ago • 4 comments

What happened?

We pass a custom clusterSecurityGroup to our cluster.

const clusterSecurityGroup = new aws.ec2.SecurityGroup("cluster-sg", {
  vpcId: vpc.vpcId,
  tags: {
    Name: 'cluster-sg',
  },
  ingress: [....  ],
  egress: [... ],
});
export const cluster = new eks.Cluster("eks-cluster", {
   ...
   clusterSecurityGroup: clusterSecurityGroup,
   ....
});

The following cycle is happening on refresh-update:

  • up:The eksClusterIngressRule gets created and associated with our security group
  • refresh: The additional ingress rule is noted as an update to cluster-sg, since it's not part of our original ingress rules
aws:ec2:SecurityGroup                          cluster-sg                                     updated (0.18s)     [diff: ~ingress]
  • up: Pulumi deletes the additional ingress rule since it's not on our inputs to the cluster-sg SecurityGroup:
aws:ec2:SecurityGroup    cluster-sg          update     [diff: ~ingress]
...
      ~ ingress: [
          - [2]: {
                  - cidrBlocks    : []
                  - description   : "Allow pods to communicate with the cluster API Server"
                  - fromPort      : 443
                  - ipv6CidrBlocks: []
                  - prefixListIds : []
                  - protocol      : "tcp"
                  - securityGroups: [
                  -     [0]: <OUR SG ID>
                    ]
                  - self          : false
                  - toPort        : 443
                }
        ]
  • refresh: The state of eksClusterIngressRule is changed to deleted
aws:ec2:SecurityGroupRule                   eks-cluster-eksClusterIngressRule              delete

And so on...basically the ingress rules of our cluster security group are fighting with the additional ingress rules added by Pulumi. Is there a solution to this?

Example

see above

Output of pulumi about

Version      3.101.1
Go Version   go1.21.6
Go Compiler  gc

Plugins
NAME          VERSION
aws           6.13.2
awsx          2.3.0
docker        4.5.0
docker        3.6.1
eks           2.0.0
kubernetes    4.5.5
mongodbatlas  3.11.2
nodejs        unknown
tls           5.0.0
twingate      0.0.48

Host
OS       darwin
Version  14.1
Arch     arm64

Dependencies:
NAME                            VERSION
@pulumi/awsx                    2.3.0
@pulumi/kubernetes              4.5.5
@pulumi/mongodbatlas            3.11.2
@twingate-labs/pulumi-twingate  0.0.48
ip                              1.1.8
typescript                      4.9.5
zod                             3.22.4
@types/node                     16.18.61
@pulumi/aws                     6.13.2
@pulumi/docker                  4.5.0
@pulumi/eks                     2.0.0
@pulumi/pulumi                  3.92.0
@pulumi/tls                     5.0.0

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

jtmarmon avatar Jan 31 '24 17:01 jtmarmon

Hi @jtmarmon have you tried adding an ignoreChanges property on the cluster to exclude the rules added by the security group?

mjeffryes avatar Feb 01 '24 01:02 mjeffryes

@mjeffryes I think the way to do this would be to add ignoreChanges to the ingress input on the SecurityGroup, but that would mean future changes to the ingress rules would be ignored. Is there another way to do it?

jtmarmon avatar Feb 01 '24 02:02 jtmarmon

@jtmarmon you should be able specifically ignore just the ingress rule that's added by the refresh by specifying the index eg. in your example above, the rule was at index 2 so you'd add ignoreChanges: ["ingress[2]"] (See https://www.pulumi.com/docs/concepts/options/ignorechanges/ for full syntax for ignoring sub-fields, list elements, etc.)

mjeffryes avatar Feb 27 '24 00:02 mjeffryes

Realized I should elaborate here and say that clearly this is something that should just work and we'd like to fix it in the provider. Just trying to see if we can get you a passable workaround until then!

mjeffryes avatar Feb 27 '24 16:02 mjeffryes

The issue here is that SecurityGroup with inline rules is mutually exclusive with SecurityGroupRule. See the Terraform documentation which warns against using inline rules in conjunction with rule resources.

Please update your program accordingly, and sorry for the confusion. Specifically, don't use inline rules in the security group that you pass to eks.Cluster. Meanwhile we'll add a warning for this situation.

Repro:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create a new security group for allowing SSH access
const clusterSecurityGroup = new aws.ec2.SecurityGroup("clusterSecurityGroup", {
    description: "Allow SSH inbound traffic",
    ingress: [
        {
            protocol: "tcp",
            fromPort: 22,
            toPort: 22,
            cidrBlocks: ["0.0.0.0/0"],
        },
    ],
});

// Create a security group rule for allowing outgoing traffic on all ports
const allTrafficSecurityGroupRule = new aws.ec2.SecurityGroupRule("eksClusterIngressRule", {
    description: "Allow pods to communicate with the cluster API Server",
    type: "ingress",
    fromPort: 443,
    toPort: 443,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
},);

// Export the Security Group ID
export const securityGroupId = clusterSecurityGroup.id;

EronWright avatar Mar 25 '24 18:03 EronWright

The next version of the pulumi-aws provider will emit a warning when the program uses inline rules. The recommendation is to migrate the inline rules to aws.ec2.SecurityGroupRule resource(s).

Here's a step-by-step guide to migrate the ingress rule(s). Please follow a similar procedure for the egress rule(s).

  1. Remove the ingress block entirely from the aws.ec2.SecurityGroup. Note that removing the block entirely causes the rules to be orphaned, not removed.
  2. For each inline ingress rule, define an equivalent aws.ec2.SecurityGroupRule resource(s).
  3. Import each rule by setting the import option using the naming pattern that is described here.
  4. Run pulumi up.
  5. Remove the import option.

For example, here's the repro program after migration:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Create a new security group for allowing SSH access
const clusterSecurityGroup = new aws.ec2.SecurityGroup("clusterSecurityGroup", {
    description: "Allow SSH inbound traffic",
});

// Create a security group rule for allowing outgoing traffic on all ports
const sshSecurityGroupRule = new aws.ec2.SecurityGroupRule("sshIngressRule", {
    description: "",
    type: "ingress",
    fromPort: 22,
    toPort: 22,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
}, {import: "sg-0bc2d20bdd94bcc13_ingress_tcp_22_22_0.0.0.0/0"});

const allTrafficSecurityGroupRule = new aws.ec2.SecurityGroupRule("eksClusterIngressRule", {
    description: "Allow pods to communicate with the cluster API Server",
    type: "ingress",
    fromPort: 443,
    toPort: 443,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
},);

// Export the Security Group ID
export const securityGroupId = clusterSecurityGroup.id;

EronWright avatar Mar 25 '24 22:03 EronWright

Reopening the issue because the first solution - emit a warning - had some pushback in field testing. For example, one of the Pulumi templates triggers the warning: https://github.com/pulumi/templates/blob/98c8a657e2b292343f6dadbbc9fefdc1646fad5d/vm-aws-typescript/index.ts#L57-L74

EronWright avatar Apr 08 '24 23:04 EronWright

Related: https://github.com/pulumi/pulumi-aws/issues/3788

EronWright avatar Apr 09 '24 16:04 EronWright

The SecurityGroup resource with inline rules is mutually exclusive with the SecurityGroupRule resource. Since Cluster uses SecurityGroupRule, the fix for this issue is to not use inline rules for cluster-sg.

I suggest you adjust your program to use non-inline rules. Extract each one of the inline rules from cluster-sg into a standalone rule resource, and use the import option to transfer ownership without disruption, as shown above. Observe the important detail that the SecurityGroup doesn't have an ingress property set.

I should mention that inline egress rules should be similarly refactored.

We've opened a separate ticket for the suggested improvement that Pulumi emit a warning: https://github.com/pulumi/pulumi-aws/issues/3788

EronWright avatar Apr 12 '24 20:04 EronWright