aws-cdk

aws-ecs: Resource handler returned message: "Invalid request provided: The target group with targetGroupArn <redacted> does not have an associated load balancer."

Open BwL1289 opened this issue 8 months ago • 15 comments

Describe the bug

With this workaround for redirecting HTTP to HTTPS on an ALB, I intermittently experience the following error:

CoreStackDev | 28/70 | 5:02:27 PM | UPDATE_FAILED        | AWS::ECS::Service                             | WebAppSvc/FlaskAppService/ControlPlane/ApplicationLoadBalancerFargateSvc/AlbFargateSvc/Service/Service (WebAppSvcFlaskAppServiceControlPlaneApplicationLoadBalancerFargateSvcAlbFargateSvcServiceADE54F9D) Resource handler returned message: "Invalid request provided: The target group with targetGroupArn <redacted> does not have an associated load balancer. (Service: Ecs, Status Code: 400, Request ID: 91fd6146-d2f3-4534-bbdb-893469be2a1d) (SDK Attempt Count: 1)" (RequestToken: 25bc3e51-6110-4b81-8256-f4c02c60a863, HandlerErrorCode: InvalidRequest)

I tested redeploying multiple times (and changed the logical IDs of the ECS service, ALB, and task definition to recreate the resources), to no avail.

It looks like there's either:

  1. A race condition (possibly a regression) somewhere. What's strange is that this was working fine for a few days, with multiple successful redeployments between the original deployment and the first occurrence of the error.
  2. Some other reason why CloudFormation thinks an old ALB target group exists when it does not.

After removing the listener redirection (lb.ListenerAction.redirect), the deployment succeeded again.

My ticket is related.

Regression Issue

  • [ ] Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

Successful deployment

Current Behavior

Intermittent "Invalid request provided: The target group with targetGroupArn <arn> does not have an associated load balancer." error.

Reproduction Steps

  1. Deploy an ALB with ecs_patterns.ApplicationLoadBalancedFargateService using protocol=lb.ApplicationProtocol.HTTP.
  2. Use this workaround to redirect HTTP to HTTPS and deploy.
  3. Deploy a few more times.
  4. Possibly experience the error (it is intermittent).

Possible Solution

No response

Additional Information/Context

"userAgent": "cloudformation.amazonaws.com",
    "errorCode": "InvalidParameterException",
    "errorMessage": "The target group with <redacted>/CoreSt-WebAp-OBAY5UVSDTFW/5f79f6f39969fa92 does not have an associated load balancer.",
    "requestParameters": {
        "capacityProviderStrategy": [
            {
                "capacityProvider": "FARGATE",
                "weight": 1,
                "base": 1
            }
        ],
        "cluster": "CoreStackDev-WebAppSvcFlaskAppServiceControlPlaneEcsClusterSvcEcsClusterEcsClusterSvcB291F355-xtUAqFpdBbkj",
        "deploymentConfiguration": {
            "deploymentCircuitBreaker": {
                "enable": true,
                "rollback": true
            },
            "maximumPercent": 200,
            "minimumHealthyPercent": 100,
            "alarms": {
                "rollback": false,
                "enable": false
            }
        },
        "desiredCount": 1,
        "enableECSManagedTags": true,
        "enableExecuteCommand": true,
        "availabilityZoneRebalancing": "DISABLED",
        "forceNewDeployment": false,
        "healthCheckGracePeriodSeconds": 60,
        "loadBalancers": [
            {
                "targetGroupArn": "<redacted>/CoreSt-WebAp-OBAY5UVSDTFW/5f79f6f39969fa92",
                "containerName": "EcsTaskDefContainerSvcEcsContainerDef",
                "containerPort": 80
            },
            {
                "targetGroupArn": "<redacted>/CoreSt-WebAp-XB77YZBVGDH8/559a5660e756f528",
                "containerName": "EcsTaskDefContainerSvcEcsContainerDef",
                "containerPort": 80
            }
        ],
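The loadBalancers array in this CloudTrail excerpt is the telling detail: the same container and port are registered behind two different target groups. A minimal stdlib-only sketch for pulling that out of the logged request, assuming requestParameters has the shape shown above (the function name is hypothetical, and the JSON is a trimmed copy of the excerpt):

```python
import json
from collections import defaultdict


def target_groups_by_container(request_parameters: dict) -> dict:
    """Map (containerName, containerPort) -> list of target group ARNs.

    Assumes the ECS request shape from the CloudTrail excerpt: a
    "loadBalancers" list whose entries carry targetGroupArn,
    containerName and containerPort.
    """
    groups = defaultdict(list)
    for entry in request_parameters.get("loadBalancers", []):
        key = (entry["containerName"], entry["containerPort"])
        groups[key].append(entry["targetGroupArn"])
    return dict(groups)


# Trimmed copy of the requestParameters logged by CloudTrail above.
event_json = """
{
  "loadBalancers": [
    {"targetGroupArn": "<redacted>/CoreSt-WebAp-OBAY5UVSDTFW/5f79f6f39969fa92",
     "containerName": "EcsTaskDefContainerSvcEcsContainerDef", "containerPort": 80},
    {"targetGroupArn": "<redacted>/CoreSt-WebAp-XB77YZBVGDH8/559a5660e756f528",
     "containerName": "EcsTaskDefContainerSvcEcsContainerDef", "containerPort": 80}
  ]
}
"""

params = json.loads(event_json)
for (container, port), arns in target_groups_by_container(params).items():
    if len(arns) > 1:
        # Flag any container port that sits behind more than one target group.
        print(f"{container}:{port} is registered behind {len(arns)} target groups:")
        for arn in arns:
            print("  ", arn)
```

Note that the first ARN in the list (CoreSt-WebAp-OBAY5UVSDTFW) is the one named in the error message.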

CDK CLI Version

2.180.0

Framework Version

No response

Node.js Version

v22.12.0

OS

Mac

Language

Python

Language Version

No response

Other information

No response

BwL1289, Apr 28 '25 23:04

Hey @BwL1289, thank you for reaching out.

After examining the AWS CDK codebase, the following could be the cause of the intermittent "The target group with targetGroupArn does not have an associated load balancer" error when using HTTP to HTTPS redirection.

Code Analysis:

  1. ECS Service Dependency on Target Group: In packages/aws-cdk-lib/aws-ecs/lib/base/base-service.ts, the ECS service creates a dependency on the target group's load balancer attachment:
private attachToELBv2(targetGroup: elbv2.ITargetGroup, containerName: string, containerPort: number): elbv2.LoadBalancerTargetProps {
  // ...
  this.loadBalancers.push({
    targetGroupArn: targetGroup.targetGroupArn,
    containerName,
    containerPort,
  });

  // Service creation can only happen after the load balancer has
  // been associated with our target group(s), so add ordering dependency.
  this.resource.node.addDependency(targetGroup.loadBalancerAttached);
  
  // ...
}

The line this.resource.node.addDependency(targetGroup.loadBalancerAttached) creates the dependency that should ensure the target group is associated with a load balancer before the service is created.

  2. Target Group Load Balancer Attachment Tracking: In packages/aws-cdk-lib/aws-elasticloadbalancingv2/lib/shared/base-target-group.ts, the load balancer attachment dependencies are managed as follows:
/**
 * Configurable dependable with all resources that lead to load balancer attachment
 */
protected readonly loadBalancerAttachedDependencies = new DependencyGroup();

/**
 * List of constructs that need to be depended on to ensure the TargetGroup is associated to a load balancer
 */
public get loadBalancerAttached(): IDependable {
  return this.loadBalancerAttachedDependencies;
}

  3. Listener Registration in Target Group: In packages/aws-cdk-lib/aws-elasticloadbalancingv2/lib/alb/application-target-group.ts, when a listener is added:
/**
 * Register a listener that is load balancing to this target group.
 *
 * Don't call this directly. It will be called by listeners.
 */
public registerListener(listener: IApplicationListener, associatingConstruct?: IConstruct): void {
  // ...
  this.listeners.push(listener);
  this.loadBalancerAttachedDependencies.add(associatingConstruct ?? listener);
}

The issue likely occurs because:

When using the HTTP to HTTPS redirection workaround, you're modifying or replacing the action on an existing listener after it's been created and registered with the target group.

  1. You create a service with ApplicationLoadBalancedFargateService, which sets up an HTTP listener.
  2. You then modify the HTTP listener to redirect to HTTPS.
  3. You add an HTTPS listener with a forward action to the same target group.

The following shows two target groups being registered with the ECS service:

"loadBalancers": [
  { "targetGroupArn": ".../CoreSt-WebAp-OBAY5UVSDTFW/5f79f6f39969fa92" },
  { "targetGroupArn": ".../CoreSt-WebAp-XB77YZBVGDH8/559a5660e756f528" }
]

This suggests that when modifying listeners and actions, the dependency tracking for target group associations may be causing the issue.
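To make this hypothesis concrete, here is a deliberately simplified Python model of the bookkeeping in the excerpts above. It is not CDK code: the classes, the method names, and the assumption that replacing a listener's default action leaves the target group's loadBalancerAttached entry in place are all illustrative. Under that assumption, the ordering dependency can still point at a listener that no longer forwards to the target group:

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the CDK bookkeeping quoted above (not CDK code).


@dataclass(eq=False)
class TargetGroup:
    name: str
    # Stands in for loadBalancerAttachedDependencies: entries are only ever added.
    attached_dependencies: list = field(default_factory=list)


@dataclass(eq=False)
class Listener:
    name: str
    default_action: tuple = ("none",)

    def forward_to(self, tg: TargetGroup) -> None:
        # Like defaultTargetGroups: [tg], which ends up calling tg.registerListener(...).
        self.default_action = ("forward", tg)
        tg.attached_dependencies.append(self)

    def redirect(self, port: str) -> None:
        # Like addAction(...) replacing the default action: the target group's
        # dependency entry for this listener is assumed to stay in place.
        self.default_action = ("redirect", port)


def forwarding_listeners(tg: TargetGroup, listeners: list) -> list:
    """Names of listeners whose current default action forwards to tg."""
    return [
        listener.name
        for listener in listeners
        if listener.default_action[0] == "forward" and listener.default_action[1] is tg
    ]


tg = TargetGroup("OnlyForwardedByHttp")  # hypothetical name
http = Listener("HttpListener")
http.forward_to(tg)
http.redirect("443")

# CDK-side view: the ordering dependency still lists the HTTP listener.
print([d.name for d in tg.attached_dependencies])  # ['HttpListener']
# Routing view: no listener forwards to the target group any more.
print(forwarding_listeners(tg, [http]))  # []
```

If another listener also forwards to the same target group, forwarding_listeners still returns it and the group stays associated; the model only flags groups that were forwarded to solely by the redirected listener.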

I tried reproducing the issue using the code provided in https://github.com/aws/aws-cdk/issues/5583#issuecomment-2825719363 but was not able to observe this error message.

I will bring this up internally for additional feedback, but please do provide any additional information that could help us dive into this issue.

ykethan, Apr 29 '25 17:04

@ykethan I agree with your analysis. I get a warning during deployment / synth about the listener modification.

As indicated by the other GH issues I've created on the topic, CDK doesn't natively support this use case without the workaround above, so I'm not sure how to mitigate the error.

For context, I need this to comply with the SOC2 security standard, so I'd appreciate support from AWS to either implement native support or find an alternative to the current workaround.

BwL1289, Apr 29 '25 17:04

@BwL1289 thank you for your patience while we dive into this issue. I was able to reproduce the "The target group with targetGroupArn does not have an associated load balancer" error. Here's a detailed reproduction and explanation of the issue:

Reproduction Code:

// Create HTTP listener with target group first
const httpListener = loadBalancer.addListener('HttpListener', {
  port: 80,
  open: true,
  defaultTargetGroups: [targetGroup], // Associate target group with listener
});

// Create HTTPS listener with the same target group
const httpsListener = loadBalancer.addListener('HttpsListener', {
  port: 443,
  certificates: [certificate],
  open: true,
  defaultTargetGroups: [targetGroup],
});

// Create Fargate service
const fargateService = new ecs.FargateService(this, 'Service', {
  cluster,
  taskDefinition,
  desiredCount: 1,
  assignPublicIp: false,
});

// Register service with target group
targetGroup.addTarget(fargateService);

// PROBLEM TRIGGER: Modifying HTTP listener to redirect AFTER the target group 
// has been associated and AFTER the service has been registered
httpListener.addAction('HttpRedirect', {
  action: lb.ListenerAction.redirect({
    port: '443',
    protocol: 'HTTPS',
    permanent: true,
  }),
});

During deployment, we get this warning:

[Warning at /MyProjectStack/ALB/HttpListener] A default Action already existed on this Listener and was replaced.

And we get the error:

Resource handler returned message: "Invalid request provided: The target group with targetGroupArn does not have an associated load balancer. (Service: Ecs, Status Code: 400)"

Console testing:

  1. Create an ALB with two listeners (HTTP and HTTPS), both forwarding to the same target group.
  2. Once the ALB is created, select the HTTP:80 listener (checkbox).
  3. Click Manage listener -> Edit listener.
  4. Save changes.

No errors were observed in the console.

Root Cause: The issue occurs due to the following sequence of operations:

  1. We create a target group and associate it with an HTTP listener.
  2. When we later modify the HTTP listener to redirect to HTTPS, we replace its default action.
  3. This disrupts the dependency tracking between the target group and the load balancer in CDK.
  4. During deployment, CloudFormation's ordering of operations can sometimes cause the ECS service to be created or updated before the target group is fully associated with a load balancer.

The warning about replacing a default action appears to be the key indicator for this issue. Similar to https://github.com/aws/aws-cdk/issues/34235, modifying listeners after creation appears to be causing this error. Marking as a P1 bug.
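The condition ECS rejects can also be checked directly: the ELBv2 DescribeTargetGroups API returns a LoadBalancerArns list for each target group, and an empty list means the group is not associated with any load balancer. A minimal sketch that works on an already-fetched response; the function name is hypothetical and the sample ARNs are placeholders, not real resources:

```python
def unattached_target_groups(response: dict) -> list:
    """Return ARNs of target groups with no associated load balancer.

    `response` is assumed to be a DescribeTargetGroups result: a dict with a
    "TargetGroups" list whose entries carry "TargetGroupArn" and
    "LoadBalancerArns".
    """
    return [
        tg["TargetGroupArn"]
        for tg in response.get("TargetGroups", [])
        if not tg.get("LoadBalancerArns")  # missing or empty -> not associated
    ]


# Illustrative response shape only; ARNs are placeholders.
sample = {
    "TargetGroups": [
        {
            "TargetGroupArn": "arn:aws:elasticloadbalancing:region:acct:targetgroup/forwarded/1",
            "LoadBalancerArns": ["arn:aws:elasticloadbalancing:region:acct:loadbalancer/app/alb/1"],
        },
        {
            "TargetGroupArn": "arn:aws:elasticloadbalancing:region:acct:targetgroup/orphaned/2",
            "LoadBalancerArns": [],
        },
    ]
}

for arn in unattached_target_groups(sample):
    print("no associated load balancer:", arn)
```

Against a live account, the same function could be fed the dict returned by boto3's elbv2 client, e.g. describe_target_groups(TargetGroupArns=[arn]), to confirm after a failed deployment whether the ARN in the error really has no listener pointing at it.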

ykethan, Apr 30 '25 16:04

@ykethan thank you.

BwL1289, Apr 30 '25 16:04

Is there an update on this? We are blocked from completing SOC-2 compliance.

BwL1289, May 14 '25 13:05

I tried reproducing the issue multiple times with this stack but was not able to:

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import * as acm from 'aws-cdk-lib/aws-certificatemanager';

export class BugReproductionStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create VPC
    const vpc = new ec2.Vpc(this, 'MyVpc', {
      maxAzs: 2,
    });

    // Create ECS cluster
    const cluster = new ecs.Cluster(this, 'Cluster', {
      vpc,
    });

    // Create task definition
    const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef');
    
    const container = taskDefinition.addContainer('web', {
      image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
      memoryLimitMiB: 512,
      cpu: 256,
    });
    
    container.addPortMappings({
      containerPort: 80,
    });

    // Create load balancer
    const loadBalancer = new elbv2.ApplicationLoadBalancer(this, 'ALB', {
      vpc,
      internetFacing: true,
    });

    // Import an existing ACM certificate for HTTPS
    const certificate = acm.Certificate.fromCertificateArn(this, 'Certificate2', '<cert arn>');

    // Create target group
    const targetGroup = new elbv2.ApplicationTargetGroup(this, 'TargetGroup', {
      vpc,
      port: 80,
      protocol: elbv2.ApplicationProtocol.HTTP,
      targetType: elbv2.TargetType.IP,
      healthCheck: {
        path: '/',
      },
    });

    // Create HTTP listener with target group first
    const httpListener = loadBalancer.addListener('HttpListener', {
      port: 80,
      open: true,
      defaultTargetGroups: [targetGroup], // Associate target group with listener
    });

    // Create HTTPS listener with the same target group
    loadBalancer.addListener('HttpsListener', {
      port: 443,
      certificates: [certificate],
      open: true,
      defaultTargetGroups: [targetGroup],
    });


    // Create Fargate service
    const fargateService = new ecs.FargateService(this, 'Service', {
      cluster,
      taskDefinition,
      desiredCount: 1,
      assignPublicIp: false,
    });

    // Register service with target group
    targetGroup.addTarget(fargateService);

    httpListener.addAction('HttpRedirect', {
      action: elbv2.ListenerAction.redirect({
        port: '443',
        protocol: elbv2.ApplicationProtocol.HTTPS,
        permanent: true,
      }),
    });

  }
}

aemada-aws, May 23 '25 08:05

@aemada-aws please keep trying and use @ykethan's configuration. The issue is indeterminate and it can take a number of deployments to reproduce.

@ykethan was able to reproduce it and we are blocked from completing SOC2.

I would encourage you guys to get on a call to collaborate.

BwL1289, May 23 '25 14:05

@ykethan is there an update on this? We are blocked and in danger of violating our SLA.

I would use the fix for https://github.com/aws/aws-cdk/issues/34235, since it appears to be resolved, but when toggling redirect_http on I still see: [Warning at /MyProjectStack/ALB/HttpListener] A default Action already existed on this Listener and was replaced.

This is concerning, and it is what caused production to go down and led me to open this ticket.

BwL1289, May 30 '25 14:05

I was able to reproduce the issue, working on a fix.

aemada-aws, May 31 '25 06:05

[Warning at /MyProjectStack/ALB/HttpListener] A default Action already existed on this Listener and was replaced.

This warning is expected: when you call httpListener.addAction, you replace the default action on the listener, which was routing to the target group via defaultTargetGroups: [targetGroup].

aemada-aws, May 31 '25 07:05

[Warning at /MyProjectStack/ALB/HttpListener] A default Action already existed on this Listener and was replaced.

This warning is expected: when you call httpListener.addAction, you replace the default action on the listener, which was routing to the target group via defaultTargetGroups: [targetGroup].

Right, but the user has no way around this because CDK does this unconditionally. There's a note next to this warning in base-listener.ts (or similar) saying that this behavior will change and an error will be thrown instead in the next major release.

I'd link to the block but I'm currently on my phone.

BwL1289, May 31 '25 11:05

Right, but the user has no way around this because CDK does this unconditionally. There's a note next to this warning in base-listener.ts (or similar) saying that this behavior will change and an error will be thrown instead in the next major release.

What do you mean by "CDK does this unconditionally"? The user adds an action to a listener to direct traffic to the target group; later, the user adds another action on the same listener to redirect traffic to a different port, so the previous default action is replaced. Right now CDK only warns, in case the user did not intend to do this.

This will not change to an error until CDK 3.0 which allows breaking changes.

aemada-aws, Jun 03 '25 06:06

I talked to @ykethan: the reproduction steps we used trigger the error because the stack is invalid (it tries to attach a target group to a service before adding the target group to a load balancer), so I'm still not able to reproduce the error in this issue with a valid stack.

@BwL1289 can you provide a stack that reproduces the issue?

aemada-aws, Jun 04 '25 13:06

I was able to reproduce the issue, working on a fix.

@aemada-aws I'm confused, I thought you were able to reproduce?

invalid stack by trying to attach a target group to a service before adding it to a load balancer,

What do you mean?

Here's a stripped-down example. Again, you will need to deploy multiple times (more than 10).


# Create fargate service
self._alb_fargate_service = ecs_patterns.ApplicationLoadBalancedFargateService(
    self,
    "AlbFargateSvc",
    assign_public_ip=True,  # Public facing ALB
    security_groups=[self._alb_fargate_security_group],
    task_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC),  # Public facing ALB
    cluster=self.ecs_cluster,
    task_definition=self.flask_task_def,
    public_load_balancer=True,
    platform_version=ecs.FargatePlatformVersion.LATEST,
    load_balancer=self.load_balancer,
    protocol=lb.ApplicationProtocol.HTTP,  # TODO+
    redirect_http=False,  # TODO+
    # certificate=self.certificate_svc.certificate,  # TODO+
    record_type=ecs_patterns.ApplicationLoadBalancedServiceRecordType.CNAME,
    circuit_breaker=ecs.DeploymentCircuitBreaker(rollback=True),
    enable_ecs_managed_tags=True,
    min_healthy_percent=100,
    max_healthy_percent=200,  # Note: enables rolling updates
    enable_execute_command=True,
    runtime_platform=self._alb_fargate_service_runtime_platform,
    deployment_controller=ecs.DeploymentController(type=ecs.DeploymentControllerType.ECS),
    desired_count=1,  # Only boot up 1 task to start
    capacity_provider_strategies=[
        ecs.CapacityProviderStrategy(capacity_provider="FARGATE", weight=1, base=1),
    ],
)

# Create https listener
self._https_alb_listener = self._alb_fargate_service.load_balancer.add_listener(
    "HttpsAlbListener",
    open=True,
    port=443,
    certificates=[lb.ListenerCertificate.from_arn(self.certificate_svc.certificate.certificate_arn)],
)

# Get http listener
http_alb_listener = self._alb_fargate_service.listener

# Add a redirect action to the HTTP listener to redirect HTTP traffic to HTTPS
http_alb_listener.add_action(
    "HttpAlbRedirectListenerAction",
    action=lb.ListenerAction.redirect(
        port="443",
        protocol="HTTPS",
        permanent=True,
    ),
)

# Create the https listener config for the ecs target
https_listener_config = ecs.ListenerConfig.application_listener(
    listener=self._https_alb_listener,
    port=self.host_port,
    deregistration_delay=Duration.seconds(10),
)

# Create the ECS Target
self._ecs_target = ecs.EcsTarget(
    container_name=self.flask_task_def_container_svc.container.container_name,
    container_port=self.host_port,
    new_target_group_id="HttpsTargetGroup",
    listener=https_listener_config,
)

# Register the target with the target group
self._alb_fargate_service.service.register_load_balancer_targets(self._ecs_target)

BwL1289, Jun 04 '25 14:06

@aemada-aws I'm confused, I thought you were able to reproduce?

Apologies for the confusion. What I reproduced with Kethan was a different issue, caused by an invalid stack that throws the same error, which led us to believe we had reproduced the intended issue. I'm now trying your reproduction stack and will keep you updated.

aemada-aws, Jun 17 '25 10:06

@aemada-aws is there an update? We are still blocked with SOC2

BwL1289, Jun 29 '25 14:06

@aemada-aws can you please provide an update?

BwL1289, Aug 01 '25 21:08

@BwL1289 Sorry for the delay, this got lost in my inbox. I was able to reproduce the issue, working on a fix.

aemada-aws, Aug 25 '25 13:08

@aemada-aws ok.

BwL1289, Aug 25 '25 14:08

As indicated by the other GH issues I've created on the topic, CDK doesn't natively support this use case without the workaround above, so I'm not sure how to mitigate the error.

For context, I need this to comply with the SOC2 security standard, so I'd appreciate support from AWS to either implement native support or find an alternative to the current workaround.

@BwL1289 While I look for a fix, would using redirectHTTP: true in the latest CDK version, instead of the workaround that causes the issue, unblock your compliance work? The bug that required the workaround was fixed in https://github.com/aws/aws-cdk/pull/34510.

The fix for this will take time: I only managed to reproduce it once in 30+ deployments, and since we have other higher-priority, more common issues, I would prefer to allocate resources to those if your compliance is unblocked and this issue is not urgent.
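The "once in 30+ deployments" figure also helps explain why earlier attempts at more than 10 deployments did not reproduce the error. A back-of-the-envelope sketch, assuming each deployment fails independently with a probability of roughly 1/30 (an assumption; the real rate is unknown and may not be independent across deployments):

```python
import math


def p_at_least_one(p: float, n: int) -> float:
    """Probability of seeing at least one failure in n independent deployments."""
    return 1 - (1 - p) ** n


def deploys_needed(p: float, confidence: float) -> int:
    """Smallest n such that p_at_least_one(p, n) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


p = 1 / 30  # assumed per-deployment failure rate
print(f"10 deploys: {p_at_least_one(p, 10):.0%} chance to hit the error")
print(f"deploys for a 95% chance: {deploys_needed(p, 0.95)}")
```

Under that assumption, ten deployments give only about a 29% chance of hitting the error, and roughly 89 deployments are needed for a 95% chance.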

aemada-aws, Aug 27 '25 15:08