terraform-aws-ecs icon indicating copy to clipboard operation
terraform-aws-ecs copied to clipboard

feat: Replacing hack with track_latest task definition attribute

Open ivan-sukhomlyn opened this issue 1 year ago β€’ 21 comments

Description

Replacing calculation of the latest ECS task definition revision using the track_latest task definition resource attribute.

Motivation and Context

Resolving https://github.com/terraform-aws-modules/terraform-aws-ecs/issues/164

Breaking Changes

Yes, the minimal AWS provider version is >=5.37

How Has This Been Tested?

  • [x] I have updated at least one of the examples/* to demonstrate and validate my change(s)
  • [x] I have tested and validated these changes using one or more of the provided examples/* projects
  • [x] I have executed pre-commit run -a on my pull request

ivan-sukhomlyn avatar Feb 20 '24 15:02 ivan-sukhomlyn

@bryantbiggs @antonbabenko I'm marking the PR as a draft because this functionality doesn't work as expected. Terraform doesn't ignore the default Docker image tag at the container definition. In that case, it tries to replace it and release a new version of an ECS service.

details:

  1. terraform apply with track_latest = true
  # module.ecs_service_test_api.aws_ecs_task_definition.this[0] will be updated in-place
  ~ resource "aws_ecs_task_definition" "this" {
        id                       = "test-api"

      ~ track_latest             = false -> true
        # (11 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }
  1. Manually deploy the service with a new revision of the ECS task definition with an updated Docker image tag to simulate CI/CD.
  2. Start terraform apply once again. A new revision triggers the refreshing of the container-definition state, which we want to eliminate in case of having a new version.
Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.ecs_service_test_api.aws_ecs_task_definition.this[0] has changed
  ~ resource "aws_ecs_task_definition" "this" {
        id                       = "test-api"
      ~ revision                 = 484 -> 485

        # (11 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }


Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or
respond to these changes.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
+/- create replacement and then destroy
 <= read (data resources)

Terraform will perform the following actions:

  # module.ecs_service_test_api.data.aws_ecs_task_definition.this[0] will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_ecs_task_definition" "this" {
      + arn                  = (known after apply)
      + arn_without_revision = (known after apply)
      + execution_role_arn   = (known after apply)
      + family               = (known after apply)
      + id                   = (known after apply)
      + network_mode         = (known after apply)
      + revision             = (known after apply)
      + status               = (known after apply)
      + task_definition      = "test-api"
      + task_role_arn        = (known after apply)
    }

  # module.ecs_service_test_api.aws_ecs_service.this[0] will be updated in-place
  ~ resource "aws_ecs_service" "this" {
        id                                 = "arn:aws:ecs:xxxxx:xxxxx:service/core-dev/test-api"
        name                               = "test-api"

      ~ task_definition                    = "test-api:485" -> (known after apply)
        # (14 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

  # module.ecs_service_test_api.aws_ecs_task_definition.this[0] must be replaced
+/- resource "aws_ecs_task_definition" "this" {
      ~ arn                      = "arn:aws:ecs:xxxxx:xxxxx:task-definition/test-api:485" -> (known after apply)
      ~ arn_without_revision     = "arn:aws:ecs:xxxxx:xxxxx:task-definition/test-api" -> (known after apply)
      ~ container_definitions    = jsonencode(
          ~ [
              ~ {
                  ~ image                  = "xxxxx.dkr.ecr.xxxxx.amazonaws.com/xxxxx:test-4c2e72beb" -> "xxxxx.dkr.ecr.xxxxx.amazonaws.com/xxxxx:test-latest"
                    name                   = "test-api"
                    # (19 unchanged attributes hidden)
                },
            ] # forces replacement
        )
      ~ id                       = "test-api" -> (known after apply)
      ~ revision                 = 485 -> (known after apply)

        # (8 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

ivan-sukhomlyn avatar Feb 20 '24 15:02 ivan-sukhomlyn

test-api? Which example can I use to reproduce this? :thinking: Cuz in general, track_latest works for me: when the image in config is different from Terraform state, but is the same as in reality, Terraform reports that no changes are needed. :woman_shrugging:

Terraform doesn't ignore the default Docker image tag

Wait, what do you mean by "default" tag? :face_with_spiral_eyes:

bolshakoff avatar Feb 20 '24 20:02 bolshakoff

@bolshakoff

Which example can I use to reproduce this? πŸ€”

I guess it's obvious that I replaced the real names of my project with those values, including the image tag. And the "default" one is the Docker image used with Terraform while initially creating the container definition and the service.

Did you try to create a new task definition revision, deploy a service after the initial creation using Terraform, and then run terraform apply once again?

ivan-sukhomlyn avatar Feb 21 '24 10:02 ivan-sukhomlyn

@ivan-sukhomlyn I've been testing this functionality as well. It doesn't seem to work as expected, as Terraform still regards the resource as changed because the revision and thereby the arn has changed. I can force it to work as expected by adding a ignore_changes = [container_definitions] lifecycle block, but that has the consequence that changes to the container_definition will not be applied. Seems it's not possible to only ignore the image, since container_definitions is simply one big string.

kaarejoergensen avatar Feb 21 '24 16:02 kaarejoergensen

@ivan-sukhomlyn

I guess it's obvious that I replaced the real names of my project with those values, including the image tag.

Yes, but which project? Is there some nice, minimal, but complete example to quickly reproduce the problem? So far, resorted to the example from README.md. πŸ€•

Regarding the "default" tag - OK, also from what I see in your scenario, when you run terraform apply in step 3, you are expecting Terraform to completely ignore any differences between your task definition resource and task definition reality. But it's not supposed to do that. It's actually smarter: it will indeed display zero diff, but only in case your task definition resource matches task definition of latest revision (even if it was deployed outside Terraform); but in case it is different, Terraform will still nicely detect it. πŸ‘Œ From my humble understanding. And experiments.

So, actually, maybe there is nothing to reproduce, and it's just a misunderstanding. Because yes, the input description might suggest that, when set to true, the flag ignores the config task definition, if it is outdated:

Whether should track latest task definition or the one created with the resource. Default is false.

But again, it works a bit differently; to have zero diff, you still have to match your resource config with the reality.

@kaarejoergensen - so, you have the same image in resource as in reality, but Terraform still suggests a redeployment? If so, would love to try to reproduce your scenario as well. πŸ™

bolshakoff avatar Feb 21 '24 22:02 bolshakoff

@kaarejoergensen Yes, I think the single option to get an expected result of not tracking image tag changes outside of Terraform is to have a separate resource for the ECS container definition. https://github.com/hashicorp/terraform-provider-aws/issues/20121#issuecomment-1959038575

BTW the current hack implemented through the data source does its job because, in case of changes made by CI/CD, we just use the latest task definition and don't touch the aws_ecs_task_definition Terraform resource from the state if there are no changes for it in the configuration.

ivan-sukhomlyn avatar Feb 22 '24 10:02 ivan-sukhomlyn

single option to get an expected result of not tracking image tag changes outside of Terraform is to have a separate resource for the ECS container definition

There is at least one more:

track_latest = true
image = data.aws_ecs_container_definition.this.image

bolshakoff avatar Feb 22 '24 11:02 bolshakoff

thanks! However, it will not work for the first ECS task definition and service creation. Only in cases when an appropriate task definition is already present with a pre-defined image at a container definition.

That's why I'm not sure that it's an acceptable solution for this module; it will cause a lot of questions from the community while trying to create resources from scratch with the module.

ivan-sukhomlyn avatar Feb 22 '24 12:02 ivan-sukhomlyn

Well, in the worst case, can't we wrap it into something like:

variable "force_new_image" {
  type    = bool
  default = true
  # ...
}

variable "new_image" {
  type = string
  # ...
}

data "aws_ecs_container_definition" "this" {
  count = var.force_new_image ? 0 : 1
  # ...
}

locals {
  current_image = data.aws_ecs_container_definition.this.image
  image         = var.force_new_image ? var.new_image : local.current_image
  # ...
}

bolshakoff avatar Feb 22 '24 21:02 bolshakoff

would something like this work (haven't tried myself)

coalesce(try(data.aws_ecs_container_definition.this.image, ""), "latest")

bryantbiggs avatar Feb 22 '24 21:02 bryantbiggs

And a design note - I think it would be much more user-friendly to hide both ignore_task_definition_changes and task_definition_track_latest variables, and instead, expose the ability to enable the desired behavior via a more high-level variable:

variable "external_image_deployment" {
  type = bool
  # ...
}

Or at least via something like use_current_image or force_new_image & new_image.

Because what is the desired behavior, actually? What was the original motivation behind all these workarounds, hack, issues and the recently merged PR? It was mainly, or even exclusively about being able to deploy image externally.

bolshakoff avatar Feb 23 '24 23:02 bolshakoff

while that may be the most common use case, it will definitely raise issues around users not being able to make other changes since its the entire task definition thats ignored. so the variable ignore_task_definition_changes says what it does, right on the tin. something more abstract like external_image_deployment takes a bit more to understand whats going on, what do I have access to change, etc.

bryantbiggs avatar Feb 23 '24 23:02 bryantbiggs

@ivan-sukhomlyn

BTW the current hack implemented through the data source does its job because, in case of changes made by CI/CD, we just use the latest task definition and don't touch the aws_ecs_task_definition Terraform resource from the state if there are no changes for it in the configuration.

If my ci/cd is changing the image_tag and deploy to ecs it also creates a new task revision. I'm using the Codepipeline ECS action.

I did the trick by specifying my image as

data "aws_ecr_image" "latest" {
  repository_name = lower(local.project)
  image_tag = data.aws_ssm_parameter.image_tag.value
}

data "aws_ssm_parameter" "image_tag" {
  name = "/xxx/yyy/image-tag"
}

image  = "${module.ecr.repository_url}:${data.aws_ecr_image.latest.image_tags[0]}"

But if I keep the SSM param up to date with the git sha it doesn't alter the fact that terraform wants to update my container definition

# module.puma-service.aws_ecs_task_definition.this[0] must be replaced
+/- resource "aws_ecs_task_definition" "this" {
      ~ arn                      = "arn:aws:ecs:eu-west-1:xxx:task-definition/yyy-puma:82" -> (known after apply)
      ~ arn_without_revision     = "arn:aws:ecs:eu-west-1:xxx:task-definition/yyy-puma" -> (known after apply)
      ~ container_definitions    = jsonencode(
          ~ [
              ~ {
                  ~ image                  = "xxxx.dkr.ecr.eu-west-1.amazonaws.com/yyy:1c4152b" -> "xxxx.dkr.ecr.eu-west-1.amazonaws.com/yyy:ad857bc"
                    name                   = "webserver"
                  - systemControls         = []
                    # (18 unchanged attributes hidden)
                },
            ] # forces replacement
        )
      ~ id                       = "xxxx-yyy" -> (known after apply)
      ~ revision                 = 82 -> (known after apply)
        # (10 unchanged attributes hidden)

You seem to have succeeded in keeping your terraform without drifting thanks to the hack in this module. Maybe I missed the point. But is it correct that I can use this module if my use case creates a new task_def at each deployment?

remiflament avatar Mar 07 '24 17:03 remiflament

track_latest works for me. I extracted the aws_ecs_task_definition from this module to add the property (see previous comment).

I'm settings my image tag using this syntax :

image  = "${module.ecr.repository_url}:${data.aws_ecr_repository.this.most_recent_image_tags[0]}"

It would be great to add the ability to turn on track_latest and skip the actual hack. Maybe a third scenario in addition to the two actuals.

Did anyone come with news since the last discutions?

remiflament avatar Mar 08 '24 17:03 remiflament

single option to get an expected result of not tracking image tag changes outside of Terraform is to have a separate resource for the ECS container definition

There is at least one more:

track_latest = true
image = data.aws_ecs_container_definition.this.image

@bolshakoff Hey! Could you provide a complete example of your approach? Looks like this could lead to a cyclic dependency issue.

fextr avatar Mar 19 '24 06:03 fextr

Hi @remiflament sorry for the late reply

But is it correct that I can use this module if my use case creates a new task_def at each deployment?

yes, the current hacky approach provides the ability to use external tools during the CI/CD process and deploy new task definition revisions with an image tag without triggering Terraform.

And one more possible method of resolving an issue is to use SSM parameters with an image tag updated by CI/CD while registering a new ECS task definition too.

But I think the best solution, in that case, is to have an additional resource on the AWS provider side, with the possibility of ignoring only image changes by Terraform. Details - https://github.com/terraform-aws-modules/terraform-aws-ecs/pull/171#issuecomment-1959163269

Otherwise, all other methods (SSM parameters, ECR image data source, latest ECS task definition from data source) are just workarounds.

Moreover, some people can use this module not only with ECR as a Docker registry but also with Dockerhub, Artifactory, etc., which eliminates the usage of only the ECR data source method as a replacement for the current behavior with tracking the latest task definition.

ivan-sukhomlyn avatar Mar 19 '24 15:03 ivan-sukhomlyn

@fextr yes, you're right πŸ‘ using the data.aws_ecs_container_definition.this.image leads to a cycle dependency error as this data source requires task definition

For example, considering containers can be more than one at task definition.

module "container_definition" {
# ...

image                    = try(data.aws_ecs_container_definition.this[each.key].image, each.value.image, var.container_definition_defaults.image, null)
# ...
}

locals {
  create_task_definition = var.create && var.create_task_definition

  task_definition = local.create_task_definition ? "${aws_ecs_task_definition.this[0].family}:${aws_ecs_task_definition.this[0].revision}" : var.task_definition_arn
}

data "aws_ecs_container_definition" "this" {
  for_each = { for k, v in var.container_definitions : k => v if local.create_task_definition && try(v.create, true) }

  task_definition = local.task_definition
  container_name  = try(each.value.name, each.key)
}

resource "aws_ecs_task_definition" "this" {
  count = local.create_task_definition ? 1 : 0

  # Convert map of maps to array of maps before JSON encoding
  container_definitions = jsonencode([for k, v in module.container_definition : v.container_definition])
# ...
  track_latest = true
}
$ terraform validate
β•·
β”‚ Error: Cycle: module.container_definition (close), data.aws_ecs_container_definition.this, module.container_definition.var.image (expand), module.container_definition.local.definition (expand), module.container_definition.local.container_definition (expand), module.container_definition.output.container_definition (expand), aws_ecs_task_definition.this, local.task_definition (expand)

ivan-sukhomlyn avatar Mar 19 '24 16:03 ivan-sukhomlyn

@bryantbiggs @antonbabenko what do you think about this situation?

Should we invest in using the track_latest ECS task def functionality with the introduction of breaking change? After that, the module's users will have to track changes in the image field of the ECS task definition by themselves outside of the module and define it via the var to prevent Terraform from re-deploying an ECS service with a previous task definition revision, updated by CI/CD, for example.

Currently, task def revisions are tracked via data source for both module-managed and outside task definitions for the ECS service via the max_task_def_revision local.

Personally, I would rather wait for a possible separate aws_ecs_container_definition resource instead of making such changes.

ivan-sukhomlyn avatar Mar 19 '24 16:03 ivan-sukhomlyn

there is no rush to make a breaking change - if we make a breaking change, I'd love for it to be something really useful. I know the CI/CD story in ECS is not very smooth and a major pain point

bryantbiggs avatar Mar 19 '24 17:03 bryantbiggs

@fextr Good question! Indeed, the example you cite assumes that the container already exists.

If I were to generalize it a bit, for a simple workflow with one image, I would probably go with something like this:

variable "force_new_image" {
  type        = bool
  description = "Force new image deployment"
  default     = false
}

data "aws_ecs_container_definition" "this" {
  count = var.force_new_image ? 0 : 1

  task_definition = local.task_definition_name
  container_name  = local.container_name
}

locals {
  image = var.force_new_image ? var.new_image : data.aws_ecs_container_definition.this.0.image
}

With track_latest enabled, this should allow basically everything:

  1. If the container is already there, and external deployment mechanism is used, simply keep force_new_image disabled. But if one needs to sometimes deploy a new image from Terraform, simply enable force_new_image for these cases.
  2. If not, enable force_new_image for initial deployment and then go along with 1.

bolshakoff avatar Mar 21 '24 14:03 bolshakoff

This PR has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this PR will be closed in 10 days

github-actions[bot] avatar Apr 21 '24 00:04 github-actions[bot]

This PR was automatically closed because of stale in 10 days

github-actions[bot] avatar May 02 '24 00:05 github-actions[bot]

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Jun 02 '24 02:06 github-actions[bot]