pulumi-eks
eksClusterIngressRule stuck in delete-recreate cycle
What happened?
We pass a custom clusterSecurityGroup to our cluster.
const clusterSecurityGroup = new aws.ec2.SecurityGroup("cluster-sg", {
    vpcId: vpc.vpcId,
    tags: {
        Name: "cluster-sg",
    },
    ingress: [ ... ],
    egress: [ ... ],
});
export const cluster = new eks.Cluster("eks-cluster", {
    ...
    clusterSecurityGroup: clusterSecurityGroup,
    ...
});
The following cycle is happening on refresh-update:
up: The eksClusterIngressRule gets created and associated with our security group
refresh: The additional ingress rule is noted as an update to cluster-sg, since it's not part of our original ingress rules
aws:ec2:SecurityGroup cluster-sg updated (0.18s) [diff: ~ingress]
up: Pulumi deletes the additional ingress rule, since it's not in our inputs to the cluster-sg SecurityGroup:
aws:ec2:SecurityGroup cluster-sg update [diff: ~ingress]
...
~ ingress: [
- [2]: {
- cidrBlocks : []
- description : "Allow pods to communicate with the cluster API Server"
- fromPort : 443
- ipv6CidrBlocks: []
- prefixListIds : []
- protocol : "tcp"
- securityGroups: [
- [0]: <OUR SG ID>
]
- self : false
- toPort : 443
}
]
refresh: The state of eksClusterIngressRule is changed to deleted
aws:ec2:SecurityGroupRule eks-cluster-eksClusterIngressRule delete
And so on. Basically, the ingress rules of our cluster security group are fighting with the additional ingress rule added by Pulumi. Is there a solution to this?
Example
see above
Output of pulumi about
Version 3.101.1
Go Version go1.21.6
Go Compiler gc
Plugins
NAME VERSION
aws 6.13.2
awsx 2.3.0
docker 4.5.0
docker 3.6.1
eks 2.0.0
kubernetes 4.5.5
mongodbatlas 3.11.2
nodejs unknown
tls 5.0.0
twingate 0.0.48
Host
OS darwin
Version 14.1
Arch arm64
Dependencies:
NAME VERSION
@pulumi/awsx 2.3.0
@pulumi/kubernetes 4.5.5
@pulumi/mongodbatlas 3.11.2
@twingate-labs/pulumi-twingate 0.0.48
ip 1.1.8
typescript 4.9.5
zod 3.22.4
@types/node 16.18.61
@pulumi/aws 6.13.2
@pulumi/docker 4.5.0
@pulumi/eks 2.0.0
@pulumi/pulumi 3.92.0
@pulumi/tls 5.0.0
Additional context
No response
Contributing
Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).
Hi @jtmarmon have you tried adding an ignoreChanges property on the cluster to exclude the rules added by the security group?
@mjeffryes I think the way to do this would be to add ignoreChanges to the ingress input on the SecurityGroup, but that would mean future changes to the ingress rules would be ignored. Is there another way to do it?
@jtmarmon you should be able to specifically ignore just the ingress rule that's added by the refresh by specifying the index. E.g. in your example above, the rule was at index 2, so you'd add ignoreChanges: ["ingress[2]"] (see https://www.pulumi.com/docs/concepts/options/ignorechanges/ for the full syntax for ignoring sub-fields, list elements, etc.)
Realized I should elaborate here and say that clearly this is something that should just work and we'd like to fix it in the provider. Just trying to see if we can get you a passable workaround until then!
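For reference, here's a minimal sketch of that workaround. This is a sketch only: the VPC resource and the empty rule lists stand in for whatever is defined elsewhere in the original program, and it assumes the extra rule lands at ingress index 2, as in the diff above.

```typescript
import * as aws from "@pulumi/aws";

// Stand-in for the VPC defined elsewhere in the original stack.
const vpc = new aws.ec2.Vpc("vpc", { cidrBlock: "10.0.0.0/16" });

const clusterSecurityGroup = new aws.ec2.SecurityGroup("cluster-sg", {
    vpcId: vpc.id,
    tags: { Name: "cluster-sg" },
    // The original ingress/egress rules would go here (indices 0 and 1).
    ingress: [],
    egress: [],
}, {
    // Skip diffs on the entry where the eksClusterIngressRule shows up
    // after refresh -- index 2 in the diff above.
    ignoreChanges: ["ingress[2]"],
});
```

Note the caveat from the thread: this pins a specific list index, so it only holds as long as the Pulumi-added rule keeps landing at that position.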
The issue here is that SecurityGroup with inline rules is mutually exclusive with SecurityGroupRule. See the Terraform documentation which warns against using inline rules in conjunction with rule resources.
Please update your program accordingly, and sorry for the confusion. Specifically, don't use inline rules in the security group that you pass to eks.Cluster. Meanwhile we'll add a warning for this situation.
Repro:
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
// Create a new security group for allowing SSH access
const clusterSecurityGroup = new aws.ec2.SecurityGroup("clusterSecurityGroup", {
    description: "Allow SSH inbound traffic",
    ingress: [
        {
            protocol: "tcp",
            fromPort: 22,
            toPort: 22,
            cidrBlocks: ["0.0.0.0/0"],
        },
    ],
});
// Create an ingress rule allowing pods to communicate with the cluster API server
const allTrafficSecurityGroupRule = new aws.ec2.SecurityGroupRule("eksClusterIngressRule", {
    description: "Allow pods to communicate with the cluster API Server",
    type: "ingress",
    fromPort: 443,
    toPort: 443,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
});
// Export the Security Group ID
export const securityGroupId = clusterSecurityGroup.id;
The next version of the pulumi-aws provider will emit a warning when the program uses inline rules. The recommendation is to migrate the inline rules to aws.ec2.SecurityGroupRule resource(s).
Here's a step-by-step guide to migrate the ingress rule(s). Please follow a similar procedure for the egress rule(s).
1. Remove the ingress block entirely from the aws.ec2.SecurityGroup. Note that removing the block entirely causes the rules to be orphaned, not removed.
2. For each inline ingress rule, define an equivalent aws.ec2.SecurityGroupRule resource.
3. Import each rule by setting the import option, using the naming pattern that is described here.
4. Run pulumi up.
5. Remove the import option.
For example, here's the repro program after migration:
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";
// Create a new security group for allowing SSH access (no inline rules)
const clusterSecurityGroup = new aws.ec2.SecurityGroup("clusterSecurityGroup", {
    description: "Allow SSH inbound traffic",
});
// Standalone ingress rule for SSH access, imported from the former inline rule
const sshSecurityGroupRule = new aws.ec2.SecurityGroupRule("sshIngressRule", {
    description: "",
    type: "ingress",
    fromPort: 22,
    toPort: 22,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
}, {import: "sg-0bc2d20bdd94bcc13_ingress_tcp_22_22_0.0.0.0/0"});
// Ingress rule allowing pods to communicate with the cluster API server
const allTrafficSecurityGroupRule = new aws.ec2.SecurityGroupRule("eksClusterIngressRule", {
    description: "Allow pods to communicate with the cluster API Server",
    type: "ingress",
    fromPort: 443,
    toPort: 443,
    protocol: "tcp",
    cidrBlocks: ["0.0.0.0/0"],
    securityGroupId: clusterSecurityGroup.id,
});
// Export the Security Group ID
export const securityGroupId = clusterSecurityGroup.id;
Reopening the issue because the first solution - emit a warning - had some pushback in field testing. For example, one of the Pulumi templates triggers the warning: https://github.com/pulumi/templates/blob/98c8a657e2b292343f6dadbbc9fefdc1646fad5d/vm-aws-typescript/index.ts#L57-L74
Related: https://github.com/pulumi/pulumi-aws/issues/3788
The SecurityGroup resource with inline rules is mutually exclusive with the SecurityGroupRule resource. Since Cluster uses SecurityGroupRule, the fix for this issue is to not use inline rules for cluster-sg.
I suggest you adjust your program to use non-inline rules. Extract each one of the inline rules from cluster-sg into a standalone rule resource, and use the import option to transfer ownership without disruption, as shown above. Observe the important detail that the SecurityGroup doesn't have an ingress property set.
I should mention that inline egress rules should be similarly refactored.
We've opened a separate ticket for the suggested improvement that Pulumi emit a warning: https://github.com/pulumi/pulumi-aws/issues/3788