terraform-provider-aws

ClientException: Too many concurrent attempts to create a new revision of the specified family.

Open Dzhuneyt opened this issue 6 years ago • 18 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.12.5

Affected Resource(s)

  • aws_ecs_task_definition

Terraform Configuration Files

data "template_file" "task_definition__backend" {
  template = file("${path.module}/task_definitions/backend.json")

  vars = {
    image_url = "1111111111111111.dkr.ecr.us-east-1.amazonaws.com/my-repo-here/backend:${var.version_tag}"
    container_name = "backend"

    log_group_region = data.aws_region.current.name
    log_group_name = aws_cloudwatch_log_group.app.name
  }
}

data "template_file" "task_definition__frontend" {
  template = file("${path.module}/task_definitions/frontend.json")

  vars = {
    image_url = "1111111111111111.dkr.ecr.us-east-1.amazonaws.com/my-repo-here/frontend:${var.version_tag}"
    container_name = "frontend"

    log_group_region = data.aws_region.current.name
    log_group_name = aws_cloudwatch_log_group.app.name
  }
}

resource "aws_ecs_task_definition" "backend" {
  family = local.ecs_cluster_name
  container_definitions = data.template_file.task_definition__backend.rendered
  network_mode = "awsvpc"
}
resource "aws_ecs_task_definition" "frontend" {
  family = local.ecs_cluster_name
  container_definitions = data.template_file.task_definition__frontend.rendered
  network_mode = "awsvpc"
}

resource "aws_ecs_service" "backend" {
  name = "${local.ecs_cluster_name}_backend"
  cluster = aws_ecs_cluster.ecs_cluster.id
  task_definition = aws_ecs_task_definition.backend.arn
  desired_count = "1"
  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent = 300
  network_configuration {
    subnets = aws_subnet.private_subnet.*.id
    security_groups = [aws_security_group.sg_for_ec2_instances.id]
  }

  load_balancer {
    # Register the ECS service within the ALB target group
    # This makes the service participate in health checks
    # and receive traffic when healthy
    target_group_arn = aws_alb_target_group.target_group_backend.arn
    container_name = "backend"
    container_port = "80"
  }

  service_registries {
    registry_arn = aws_service_discovery_service.service_discovery.arn
    container_name = "backend"
    container_port = 80
  }

  depends_on = [
    aws_alb_listener.http_traffic,
  ]
}
resource "aws_ecs_service" "frontend" {
  name = "${local.ecs_cluster_name}_frontend"
  cluster = aws_ecs_cluster.ecs_cluster.id
  task_definition = aws_ecs_task_definition.frontend.arn
  desired_count = "2"
  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent = 300
  network_configuration {
    subnets = aws_subnet.private_subnet.*.id
    security_groups = [aws_security_group.sg_for_ec2_instances.id]
  }

  load_balancer {
    target_group_arn = aws_alb_target_group.target_group_frontend.arn
    container_name = "frontend"
    container_port = "80"
  }

  service_registries {
    registry_arn = aws_service_discovery_service.service_discovery.arn
    container_name = "frontend"
    container_port = 80
  }

  depends_on = [
    aws_alb_listener.http_traffic,
    aws_ecs_service.backend,
  ]
}

Expected Behavior

Running terraform apply repeatedly should not cause any errors, and the AWS task definitions should be updated properly.

Actual Behavior

In roughly 1 out of 5 attempts, the AWS task definitions are not updated and the following error is thrown. Rerunning terraform apply usually succeeds.

Error: ClientException: Too many concurrent attempts to create a new revision of the specified family.
        status code: 400, request id: efce29cc-a021-4d6b-b603-d84c8b7a91fa

Steps to Reproduce

  1. terraform apply

Important Factoids

Nothing special: just two ECS services and their corresponding task definitions. It's worth noting that both task definitions are in the same "family". Maybe this has some impact?

Dzhuneyt, Aug 15 '19 07:08

I'm running into the same issue. I've reduced the Terraform configuration to make it easier to reproduce (anything omitted is the same as in the initial post):

Terraform Version

Terraform v0.12.7 + provider.aws v2.26.0

Terraform Configuration Files

resource "aws_ecs_task_definition" "this" {
  count  = 2
  family = "test-family"

  container_definitions = jsonencode([{
    name   = "test"
    image  = "dummy"
    memory = 512
  }])
}

Important Factoids

The error can be avoided by running terraform apply -parallelism=1, but this slows execution down by up to a factor of 10 compared to the default parallelism.

When you set count = 1 it applies without errors, but of course only generates a single resource.
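A narrower alternative to lowering the global parallelism, offered here as an untested sketch rather than a confirmed fix, is to declare the same-family task definitions as separate resources and chain them with depends_on. Terraform then registers revisions of that family one at a time, while the rest of the graph still runs in parallel. The resource names `first` and `second` are hypothetical.

```hcl
# Sketch (assumption): instead of count = 2, declare two resources in the
# same family and make the second depend on the first. Terraform does not
# start creating "second" until "first" has finished, so the two
# RegisterTaskDefinition calls for "test-family" do not overlap.
resource "aws_ecs_task_definition" "first" {
  family = "test-family"

  container_definitions = jsonencode([{
    name   = "test"
    image  = "dummy"
    memory = 512
  }])
}

resource "aws_ecs_task_definition" "second" {
  family = "test-family"

  container_definitions = jsonencode([{
    name   = "test"
    image  = "dummy"
    memory = 512
  }])

  # Serializes creation within the shared family only; other resources in
  # the configuration still apply with the default parallelism.
  depends_on = [aws_ecs_task_definition.first]
}
```

This cannot be expressed with count alone, because instances of a single counted resource cannot depend on one another.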

juls, Aug 31 '19 12:08

I had a similar issue and was able to fix it by using a different family for each task definition: for example, use for_each on a map instead of count, then set family = "${local.ecs_cluster_name}-${each.key}".
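The for_each approach described above might look roughly like the following when applied to the original post's configuration. The `task_definition_templates` local, its keys, and the resource name `app` are hypothetical; `templatefile` (available since Terraform 0.12) stands in for the `template_file` data sources. The point is only that each instance gets its own family, so no two instances create revisions of the same family concurrently.

```hcl
locals {
  # Hypothetical map: one entry per service, each pointing at its own
  # container-definition template.
  task_definition_templates = {
    backend  = "${path.module}/task_definitions/backend.json"
    frontend = "${path.module}/task_definitions/frontend.json"
  }
}

resource "aws_ecs_task_definition" "app" {
  for_each = local.task_definition_templates

  # The local must be interpolated with ${...}; a bare "local.ecs_cluster_name-"
  # inside the string would be taken literally. Each key yields a distinct family.
  family = "${local.ecs_cluster_name}-${each.key}"

  container_definitions = templatefile(each.value, {
    image_url        = "1111111111111111.dkr.ecr.us-east-1.amazonaws.com/my-repo-here/${each.key}:${var.version_tag}"
    container_name   = each.key
    log_group_region = data.aws_region.current.name
    log_group_name   = aws_cloudwatch_log_group.app.name
  })

  network_mode = "awsvpc"
}
```

The services would then reference `aws_ecs_task_definition.app["backend"].arn` and `aws_ecs_task_definition.app["frontend"].arn`.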

zeik, Nov 26 '19 11:11

You should have two task definitions with different values for family. One for the frontend, one for the backend. https://docs.aws.amazon.com/AmazonECS/latest/userguide/task_definition_parameters.html#family

When you register a task definition, you give it a family, which is similar to a name for multiple versions of the task definition.

A task definition has nothing to do with your cluster: you can use the same one in many clusters or for many services. But if each service runs a different set of containers, that's a different task definition.
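Applied to the configuration in the original post, this suggestion amounts to giving each task definition its own family. A minimal sketch, assuming `-backend` and `-frontend` suffixes (any two distinct names would do):

```hcl
resource "aws_ecs_task_definition" "backend" {
  # Distinct family per service: revisions of "<cluster>-backend" and
  # "<cluster>-frontend" are tracked separately, so registering both in
  # parallel no longer targets the same family.
  family                = "${local.ecs_cluster_name}-backend"
  container_definitions = data.template_file.task_definition__backend.rendered
  network_mode          = "awsvpc"
}

resource "aws_ecs_task_definition" "frontend" {
  family                = "${local.ecs_cluster_name}-frontend"
  container_definitions = data.template_file.task_definition__frontend.rendered
  network_mode          = "awsvpc"
}
```

Changing the family of an existing task definition should cause Terraform to plan a replacement, so expect new revisions to be registered and the services to be updated on the first apply after this change.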

hatch-carl, Dec 11 '19 18:12

still an issue 2 years later lol

for terragrunt, use: --terragrunt-parallelism 4

see https://terragrunt.gruntwork.io/docs/features/execute-terraform-commands-on-multiple-modules-at-once/#limiting-the-module-execution-parallelism

justinTM, Oct 14 '21 19:10

Hey y'all 👋 Thank you for taking the time to file this issue and for the continued discussion! Given that there's been a number of AWS provider releases since this was initially filed (and since the last update), can anyone confirm whether you're still experiencing this behavior?

justinretzolk, Dec 09 '21 22:12

I've just seen this issue when deploying via Terraform Cloud :( The Terraform version is 1.0 and the AWS provider is specified as "~> 3.63.0".

andir, Dec 23 '21 12:12

Yes, I'm also still seeing this. Just hit it now, actually, which brought me here. I have a root module that deploys 2 target groups of tasks with different versions of the same task family, so we can switch back and forth via the load balancer if needed for a blue/green style deployment. Whenever there is a change to our Terraform code and both container groups are active, we run into this issue.

kuritz, Dec 28 '21 15:12

Still an issue, ran into it today with provider 4.1

sjsadowski, Feb 20 '22 01:02

Still an issue with hashicorp/aws 4.14. Re-applying a couple of times made it work for me (some tasks were created on each apply).

tim-x-y-z, Jul 22 '22 13:07

Still running into this issue when creating a single ECS cluster. Running the apply again back to back usually gets past it.

mswezey23, Aug 08 '22 21:08

Still an issue when using hashicorp/aws v4.15.1.

ga-tb, Sep 15 '22 14:09

Yep, still hitting this issue as well with TF 1.2.9 and the latest AWS provider. I just have to re-run the apply to fix it.

promenadeviki, Sep 16 '22 15:09

Same for me. The only solution would be to create multiple families, as proposed in other comments.

github-gael-soude, Sep 23 '22 08:09

Still a problem on provider v4.32.0

luispabon, Sep 27 '22 16:09

Same issue for me as well. Rerunning works fine with the same family.

prashankprince, Oct 21 '22 11:10

Encountered this problem today as well while applying just three containers sharing the same family into a single cluster. I had to give each of them a unique family ID to work around it, which, admittedly, is not a terrible workaround.

Edit: Appending an incremental integer to each family name did not work for me. Hm...

drewdunne, Jan 10 '23 17:01