
`docker_container` with `must_run = false` replacement fails every other run

Open · zanecodes opened this issue 1 year ago · 5 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and docker Provider) Version

Terraform v1.3.7
on linux_amd64
+ provider registry.terraform.io/kreuzwerker/docker v3.0.2

Affected Resource(s)

  • docker_container

Terraform Configuration Files

terraform {
  required_providers {
    docker = {
      source = "kreuzwerker/docker"
      version = "3.0.2"
    }
  }
}

resource "docker_container" "container" {
  name = "repro"
  image = "alpine:latest"
  must_run = false
}

Debug Output

https://gist.github.com/zanecodes/afd3118b02f4afee4218e6c3885dd62b

Expected Behaviour

Container is successfully destroyed, created, and started again.

Actual Behaviour

The second `terraform apply` fails with the following error:

Error: Error waiting for container removal 'e58baf330305e7ab11f28c4bc34cf2224cd681af0e0d6f24a04a0ddc431a7d0c': Error response from daemon: no such container

A subsequent run will detect that the container was removed outside of Terraform and trigger only a create rather than a replace, which succeeds; another run after that will once again fail to replace the container, and so on.

Since the container does not have a long-running entrypoint, it exits immediately; for my use case this is the desired behavior. The provider then attempts to replace the container on the second `terraform apply`, because several attributes such as `image`, `network_mode`, `pid_mode`, and `ulimit` are erroneously detected as changed. This replacement is also desired for my use case, and it can be forced intentionally by adding `env = ["TIME=${timestamp()}"]` to the `docker_container` definition.

When the provider destroys the `docker_container`, for some reason it waits for the container to reach the not-running state instead of the removed state. Obviously this wait fails, since the provider removed the container just six lines above that.

This logic was introduced in #322, specifically in commit 617899f566eeef03c527e60eda83dfff69018438, but I'm not sure I understand why, or how it could ever have worked. It seems to me that the fix should be to remove that logic and always use `container.WaitConditionRemoved`. The intended behavior of the `rm` attribute is unclear to me; additionally, #173 is still open, so the `rm` attribute is entirely broken at the moment. If the intended behavior is for `terraform destroy` to only stop the container when `rm = false`, and to remove the container when `rm = true`, then the fix should likely be to wrap these lines in `if d.Get("rm").(bool) { ... }`.
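To make the conditional variant of the fix concrete, here is a minimal, self-contained Go sketch. `waitConditionFor` is a hypothetical helper, not the provider's actual code, and the returned strings stand in for the Docker SDK constants `container.WaitConditionRemoved` and `container.WaitConditionNotRunning`:

```go
package main

import "fmt"

// waitConditionFor is a hypothetical helper (not the provider's code)
// illustrating the conditional fix described above: wait for full removal
// when rm = true (the container is removed as part of destroy), and only
// wait for the container to stop when rm = false (destroy merely stops it).
func waitConditionFor(rm bool) string {
	if rm {
		return "removed" // stands in for container.WaitConditionRemoved
	}
	return "not-running" // stands in for container.WaitConditionNotRunning
}

func main() {
	fmt.Println(waitConditionFor(true))
	fmt.Println(waitConditionFor(false))
}
```

The simpler fix the issue proposes first — always waiting on `container.WaitConditionRemoved` — would drop the conditional entirely.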

Steps to Reproduce

  1. terraform apply
  2. terraform apply a second time

zanecodes · Apr 19 '23 19:04