terraform-provider-aws

ECS Service always wants to be recreated due to capacity provider.

Open spatel96 opened this issue 3 years ago • 17 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

$ terraform -v
Terraform v0.13.6
+ provider.aws v3.73.0

Affected Resource(s)

  • aws_ecs_service

Terraform Configuration Files

Terraform Plan:

  # module.my_service.aws_ecs_service.ecs_service must be replaced
+/- resource "aws_ecs_service" "ecs_service" {
        cluster                            = "arn:aws:ecs:us-west-1:***:cluster/ecs-related-tapir"
        deployment_maximum_percent         = 200
        deployment_minimum_healthy_percent = 100
        desired_count                      = 2
        enable_ecs_managed_tags            = false
        enable_execute_command             = false
        health_check_grace_period_seconds  = 120
      ~ iam_role                           = "aws-service-role" -> (known after apply)
      ~ id                                 = "arn:aws:ecs:us-west-1:***:service/my-cluster/my-service-5e" -> (known after apply)
      ~ launch_type                        = "EC2" -> (known after apply)
        name                               = "my-service-service-5e"
      + platform_version                   = (known after apply)
      - propagate_tags                     = "NONE" -> null
        scheduling_strategy                = "REPLICA"
      - tags                               = {} -> null
      ~ tags_all                           = {} -> (known after apply)
      ~ task_definition                    = "arn:aws:ecs:us-west-1:***:task-definition/my-service-:23" -> "arn:aws:ecs:us-west-1:***:task-definition/my-service:1"
        wait_for_steady_state              = false

      + capacity_provider_strategy { # forces replacement
          + base              = 0
          + capacity_provider = "ecs-capacity-provider-related-tapir"
          + weight            = 100
        }

        deployment_controller {
            type = "CODE_DEPLOY"
        }

        load_balancer {
            container_name   = "my-service"
            container_port   = 7171
            target_group_arn = "arn:aws:elasticloadbalancing:us-west-1:***:targetgroup/abcdef/abcdef"
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Terraform Apply error:

Error: error creating ECS service (my-service): InvalidParameterException: Creation of service was not idempotent.

Expected Behavior

No infrastructure changes should be made

Actual Behavior

The ECS Service resource will be recreated, but the apply will fail with the error shown above.

Steps to Reproduce

  1. Provision an ECS service with a capacity provider
  2. terraform apply

spatel96 avatar Jan 28 '22 17:01 spatel96

FYI we are still seeing this bug in the provider version 4.9.

gvwirth avatar Apr 14 '22 16:04 gvwirth

Possibly related to existing issue: https://github.com/hashicorp/terraform-provider-aws/issues/2283 (destroy/create behavior)

*Correction: since the update was not expected behavior, I'm guessing the capacity_provider_strategy is inherited from the aws_ecs_cluster where it is defined. Do you mind confirming, @spatel96?

anGie44 avatar Apr 25 '22 18:04 anGie44

This issue is very destructive.

When an ECS cluster has a default_capacity_provider_strategy setting defined, Terraform will mark all services that don't have

  lifecycle {
    ignore_changes = [
      capacity_provider_strategy
    ]
  }

to be recreated.
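
For context, here is a minimal sketch of where that lifecycle block might sit in a service definition. The resource labels, the service name, and the references to aws_ecs_cluster.example and aws_ecs_task_definition.example are hypothetical and assumed to be defined elsewhere; only the lifecycle block itself comes from the comment above.

```hcl
# Hypothetical service: names and references are illustrative only.
resource "aws_ecs_service" "example" {
  name            = "example-service"
  cluster         = aws_ecs_cluster.example.id
  task_definition = aws_ecs_task_definition.example.arn
  desired_count   = 2

  # No capacity_provider_strategy block here: the service is assumed to
  # pick up the cluster's default strategy when it is created.

  lifecycle {
    # Ignore the strategy reported back for the service so that the plan
    # does not propose replacing it.
    ignore_changes = [
      capacity_provider_strategy
    ]
  }
}
```

One trade-off to keep in mind: with this setting Terraform also ignores any intentional change made later to capacity_provider_strategy in the configuration.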

a-nych avatar May 05 '22 09:05 a-nych

The only differences I can see when comparing capacity_provider_strategy and deployment_controller are MaxItems and DiffSuppressFunc. I wonder if that is what's causing this recreation... I would have thought that removing ForceNew would also have stopped capacity_provider_strategy from forcing a recreation...

https://github.com/hashicorp/terraform-provider-aws/blob/611b4737168f4f0051bb63ef221f0e76f156f392/internal/service/ecs/service.go#L96-L107

https://github.com/hashicorp/terraform-provider-aws/blob/611b4737168f4f0051bb63ef221f0e76f156f392/internal/service/ecs/service.go#L44-L47

nitrocode avatar May 25 '22 03:05 nitrocode

Hi @nitrocode, thanks for looking through the code! My initial thinking was that @spatel96 is using both the aws_ecs_capacity_provider and aws_ecs_service resources. So while capacity_provider_strategy is not explicitly configured in the aws_ecs_service Terraform configuration, the value is inherited from the separate aws_ecs_capacity_provider resource after an initial terraform apply, and the next plan or apply will show that diff (though this is still just my conjecture, as the original configuration is not yet known). That diff is then handled by this portion of the code, https://github.com/hashicorp/terraform-provider-aws/blob/a2843eb5d274b2fe3598cf863d228e715dacc343/internal/service/ecs/service.go#L354-L372, which forces the new resource. The logic needs to either account for cases where the provider strategy is inherited from an outside configuration, or simply mark capacity_provider_strategy as Computed so that the diff is ignored.

anGie44 avatar May 26 '22 14:05 anGie44

I was seeing this same issue and can confirm that adding a capacity_provider_strategy block in my aws_ecs_service, duplicating my default_capacity_provider_strategy, resolved it.
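
A rough sketch of what that duplication could look like. All resource labels and names are hypothetical, and aws_ecs_task_definition.example is assumed to be defined elsewhere. It also assumes the cluster default is set through the aws_ecs_cluster_capacity_providers resource; depending on the provider version, the default may instead be set on aws_ecs_cluster directly. The base and weight values simply mirror the plan output at the top of this issue.

```hcl
# Hypothetical cluster and capacity provider names.
resource "aws_ecs_cluster" "example" {
  name = "example-cluster"
}

resource "aws_ecs_cluster_capacity_providers" "example" {
  cluster_name       = aws_ecs_cluster.example.name
  capacity_providers = ["example-capacity-provider"]

  # Cluster-level default strategy.
  default_capacity_provider_strategy {
    capacity_provider = "example-capacity-provider"
    base              = 0
    weight            = 100
  }
}

resource "aws_ecs_service" "example" {
  name            = "example-service"
  cluster         = aws_ecs_cluster.example.id
  task_definition = aws_ecs_task_definition.example.arn
  desired_count   = 2

  # Mirror the cluster default explicitly so the configuration matches
  # the strategy reported for the service.
  capacity_provider_strategy {
    capacity_provider = "example-capacity-provider"
    base              = 0
    weight            = 100
  }
}
```

The cost of this approach is that the two blocks have to be kept in sync by hand whenever the cluster default changes.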

relsqui avatar Aug 16 '22 16:08 relsqui

This has been a big annoyance for us. We have many production ECS Services that are using LaunchType: EC2 and we'd like to convert them to using a newly defined default Capacity Provider strategy on the cluster.

If we simply set the capacity provider, it will force re-creation of the ECS Service, leading to temporary disruption/downtime. This isn't necessary, as AWS supports the graceful transition from LaunchType: EC2 to Capacity Provider (but not the other way around). It does a "force new deployment" of the ECS Tasks, but it uses the standard ECS rollout mechanism (e.g., minHealthy), so there's no disruption.

Our current workaround is to use ignore_changes as above, plus converting ECS Services to Capacity Provider via separate CLI-based automation.

(Also, tangentially related is #26533 - for transitioning existing ECS Services to use the Cluster's default capacity provider strategy)

ericdahl avatar Aug 31 '22 00:08 ericdahl

If I may add, an empty capacity_provider_strategy list could be useful. It also seems this support was added to the AWS CLI and API (https://github.com/aws/containers-roadmap/issues/838#issuecomment-1159092125), so that

$ aws ecs update-service --cluster cluster-name --service service-name --capacity-provider-strategy '[]' --force-new-deployment

removes the strategy from an ECS service (when inherited from the default defined at the ECS cluster level), which is useful if you're planning to remove the default capacity provider strategy from the ECS cluster.

It seems that currently, if no capacity_provider_strategy is defined in the aws_ecs_service resource, the AWS API call will not have any value set and the default strategy will be used.

remil1000 avatar Feb 09 '23 11:02 remil1000

It's sad to see that it's been over a year and this is still not fixed. :-( AWS has to do a better job than this if they want people to keep using ECS and keep it alive.

vishwa-trulioo avatar Feb 18 '23 19:02 vishwa-trulioo

Any updates on this? I see the PR is pending.

bbratchiv avatar Jul 12 '23 11:07 bbratchiv

Any update on this?

rmccarthy-ellevation avatar Jul 14 '23 16:07 rmccarthy-ellevation

@breathingdust Hi, is this something you can look into? The AWS side has been fixed, and now Terraform incorrectly causes replacement.

1oglop1 avatar Oct 15 '23 12:10 1oglop1

Issue still exists.

claudiosf avatar Nov 22 '23 18:11 claudiosf

Issue still exists.

Yep we're facing the same problem too

Luis-3M avatar Nov 23 '23 10:11 Luis-3M

When will the fix be released? It is affecting my team too.

harbinder-kleene avatar Dec 18 '23 09:12 harbinder-kleene

+1

This is a major issue. We are running many FARGATE instances and would like to increase the capacity further by adding FARGATE SPOT instances. However, it is not possible to do so without downtime (it destroys the whole ECS service and recreates it).

ZilvinasKucinskas avatar Jan 28 '24 13:01 ZilvinasKucinskas