pulumi-awsx
`TargetGroup` sometimes does not attach to `ApplicationLoadBalancer`
What happened?
I was trying to create a single FargateService with two different TargetGroups attached to an ApplicationLoadBalancer (one tg for HTTP requests, one tg for socket connections). When deployed, one target group simply doesn't attach to the load balancer. What's even more concerning is that, when the exact same code is deployed to a second stack, it attaches just fine. I'm relatively new to Pulumi so there might be something I'm missing, but I assumed identical code should result in identical resources.
I understand this might not be reproducible; I mostly just want to flag that I'm seeing inconsistency between environments and hopefully get some answers on how this is possible.
Example
Unfortunately, this is part of our private infra so I won't be able to send the entire deploy script, but I'll try to send as much relevant info as possible. Here is the code for the target groups and load balancer:
```typescript
const serverTg = new aws.lb.TargetGroup(`leaves-server-tg-${stack}`, {
  vpcId: defaultVpc.vpcId,
  stickiness: {
    type: 'lb_cookie',
  },
  port,
  protocol: 'HTTP',
  targetType: 'ip',
  protocolVersion: 'HTTP1',
  healthCheck: {
    path: '/api',
    port: 'traffic-port',
    protocol: 'HTTP',
    matcher: '200',
    enabled: true,
    interval: 60,
    timeout: 30,
  },
});

const socketTg = new aws.lb.TargetGroup(`leaves-socket-tg-${stack}`, {
  vpcId: defaultVpc.vpcId,
  port: 5001,
  protocol: 'HTTP',
  stickiness: {
    type: 'lb_cookie',
  },
  targetType: 'ip',
  protocolVersion: 'HTTP1',
  healthCheck: {
    path: '/api',
    port: `${port}`,
    protocol: 'HTTP',
    matcher: '200',
    enabled: true,
    interval: 60,
    timeout: 30,
  },
});

const lb = new awsx.lb.ApplicationLoadBalancer(`leaves-lb-${stack}`, {
  listeners: [
    {
      port: 443,
      protocol: 'HTTPS',
      certificateArn: lb_cert.arn,
      defaultActions: [
        {
          type: 'forward',
          targetGroupArn: serverTg.arn,
        },
      ],
    },
    {
      port: 8443,
      protocol: 'HTTPS',
      certificateArn: lb_cert.arn,
      defaultActions: [
        {
          type: 'forward',
          targetGroupArn: socketTg.arn,
        },
      ],
    },
  ],
});
```
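One way to take the awsx listener wiring out of the equation while debugging is to define the listener explicitly with `aws.lb.Listener` against the component's underlying load balancer. A rough, untested sketch, reusing `lb_cert` and `socketTg` from above and assuming the 8443 listener is dropped from the awsx `listeners` list:

```typescript
// Untested sketch: attach the socket target group by hand via an explicit
// listener rather than through the awsx `listeners` argument. `lb.loadBalancer`
// is the underlying aws.lb.LoadBalancer created by the awsx component.
const socketListener = new aws.lb.Listener(`leaves-socket-listener-${stack}`, {
  loadBalancerArn: lb.loadBalancer.arn,
  port: 8443,
  protocol: 'HTTPS',
  sslPolicy: 'ELBSecurityPolicy-2016-08', // an SSL policy is required for HTTPS listeners
  certificateArn: lb_cert.arn,
  defaultActions: [
    {
      type: 'forward',
      targetGroupArn: socketTg.arn,
    },
  ],
});
```

If the target group still shows no associated load balancer after wiring it this way, the problem is more likely in the target group or listener resources themselves than in the awsx component's listener wiring.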
And here's the code for the Fargate service:
```typescript
new awsx.ecs.FargateService(`leaves-server-service-${stack}`, {
  networkConfiguration: {
    assignPublicIp: true,
    securityGroups: [serviceSg.id],
    subnets: defaultVpc.publicSubnetIds,
  },
  cluster: cluster.arn,
  desiredCount: 4,
  taskDefinitionArgs: {
    taskRole: {
      roleArn: role.arn,
    },
    container: {
      name: 'server',
      image: image.imageUri,
      command: ['infisical', 'run', `--env=${stack}`, '--', 'yarn', 'server'],
      cpu: 2 * 1024,
      memory: 4 * 1024,
      environment: serverEnvironment,
      essential: true,
      portMappings: [
        {
          targetGroup: serverTg,
          containerPort: port,
        },
        {
          targetGroup: socketTg,
          containerPort: 5001,
        },
      ],
      healthCheck: {
        command: ['CMD-SHELL', `curl -f http://localhost:${port}/api/ || exit 1`],
        interval: 30,
        timeout: 5,
        retries: 3,
      },
    },
  },
});
```
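The service-to-target-group registration can also be expressed without the `portMappings[].targetGroup` shorthand, using the standard ECS `loadBalancers` input plus plain port mappings. I believe the awsx `FargateService` passes this through to the underlying `aws.ecs.Service`, but treat the following as an untested sketch (trimmed to the relevant fields) rather than a known fix:

```typescript
// Untested sketch: register the 'server' container with both target groups via
// the ECS service's `loadBalancers` setting. Command, environment, and health
// check from the original definition are omitted for brevity.
new awsx.ecs.FargateService(`leaves-server-service-${stack}`, {
  networkConfiguration: {
    assignPublicIp: true,
    securityGroups: [serviceSg.id],
    subnets: defaultVpc.publicSubnetIds,
  },
  cluster: cluster.arn,
  desiredCount: 4,
  loadBalancers: [
    { targetGroupArn: serverTg.arn, containerName: 'server', containerPort: port },
    { targetGroupArn: socketTg.arn, containerName: 'server', containerPort: 5001 },
  ],
  taskDefinitionArgs: {
    taskRole: { roleArn: role.arn },
    container: {
      name: 'server',
      image: image.imageUri,
      cpu: 2 * 1024,
      memory: 4 * 1024,
      essential: true,
      portMappings: [{ containerPort: port }, { containerPort: 5001 }],
    },
  },
});
```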
Here are the target groups in the AWS console (screenshot omitted); the relevant ones are selected. Note that `leaves-socket-tg-dev` has no associated load balancer.
Output of `pulumi about`

```
CLI
Version       3.112.0
Go Version    go1.22.1
Go Compiler   gc

Plugins
  NAME        VERSION
  aws         6.28.2
  awsx        2.5.0
  cloudflare  5.22.0
  docker      4.5.3
  docker      3.6.1
  nodejs      unknown
  tls         5.0.1

Host
  OS       darwin
  Version  14.4
  Arch     arm64

This project is written in nodejs: executable='/Users/rpmccarter/.nvm/versions/node/v20.10.0/bin/node' version='v20.10.0'

Current Stack: Mintlify/leaves/dev

TYPE  URN
[removed]

Found no pending operations associated with dev

Backend
  Name           pulumi.com
  URL            https://app.pulumi.com/Mintlify
  User           Mintlify
  Organizations  Mintlify
  Token type     personal

Dependencies:
  NAME                VERSION
  @pulumi/aws         6.28.2
  @pulumi/awsx        2.5.0
  @pulumi/cloudflare  5.22.0
  @pulumi/pulumi      3.109.0
  @pulumi/tls         5.0.1
  @types/node         16.18.22
  rimraf              5.0.5
  typescript          5.3.3

Pulumi locates its logs in /var/folders/dn/z0by0dcj1gnbkjr6_t71hp_m0000gn/T/ by default
```
Additional context
No response
Thanks for reporting this, @rpmccarter; it sounds pretty concerning. To clarify: does the failed state happen sporadically or every single time? Are any errors reported? Does the condition resolve after a certain time (say, 5 minutes later)?
This will be difficult for our team to diagnose, so anything that narrows down the repro would be super helpful. If anyone else is running into this, please let us know what you are observing as well.
Any further context you can offer to help us reproduce this, @rpmccarter?
Hey team, I'm fairly confident this is just a symptom of #1253. I'm just now running into a very similar issue: a Cloudflare Record fails to be created because a field, `lb.loadBalancer.dnsName`, is missing. Closing this as a duplicate.
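For context, the failing pattern there looks roughly like the following, with the record depending on an output of the awsx component that comes back unresolved (the zone ID and record name below are illustrative, not the real config):

```typescript
// Illustrative only: a Cloudflare record pointing at the ALB's DNS name.
// When lb.loadBalancer.dnsName is missing/unknown (the symptom tracked in
// #1253), this resource fails to create.
new cloudflare.Record(`leaves-record-${stack}`, {
  zoneId: cfZoneId,          // hypothetical zone ID
  name: `leaves-${stack}`,   // hypothetical record name
  type: 'CNAME',
  value: lb.loadBalancer.dnsName,
  proxied: true,
});
```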