terraform-provider-docker

docker_container devices forces resource replacement at all times

Open acolpan opened this issue 1 year ago • 11 comments

Terraform v1.7.2, kreuzwerker/docker v3.0.2

To reproduce this issue:

  1. Locally, define a docker_container resource that uses the nested devices block, as below:
devices {
  host_path = "dev/net/tun"
}
  2. Create the resource (e.g. terraform apply --auto-approve).

  3. Immediately after step 2, when nothing has changed because the container has just been provisioned, run the following command and note the "# forces replacement" markers in the output further below.

  terraform plan

  - devices { # forces replacement
      - container_path = "dev/net/tun" -> null
      - host_path      = "dev/net/tun" -> null
      - permissions    = "rwm" -> null
    }
  + devices { # forces replacement
      + host_path = "dev/net/tun"
    }

This should not happen. Immediately after provisioning, "terraform plan" should come out clean, in full agreement with the current state of the Docker container. Although the devices work as expected once the resource is provisioned, the configuration does not pass the "terraform plan" state check. This is probably an issue in the kreuzwerker/docker provider: it seems to miss the devices when it queries the state of the container.

A sample configuration to reproduce this issue: main.tf

Please replace the PATH_TO_CONFIG_DIR and the PATH_TO_DOWNLOADS_DIR with anything that works for you.

terraform {
  required_providers {
    docker = {
      source  = "kreuzwerker/docker"
      version = "~> 3.0.2"
    }
  }
}

provider "docker" {
  host = "unix:///var/run/docker.sock"
}

data "docker_registry_image" "qbittorrent" {
  name = "linuxserver/qbittorrent:latest"
}

resource "docker_image" "qbittorrent" {
  name          = data.docker_registry_image.qbittorrent.name
  pull_triggers = [data.docker_registry_image.qbittorrent.sha256_digest]
}

resource "docker_container" "qbittorrent" {
  name = "qbittorrent"
  capabilities {
    add = ["NET_ADMIN"]
  }

  devices {
    host_path = "dev/net/tun"
  }

  ports {
    internal = "6881"
    external = "6881"
  }
  ports {
    internal = "8090"
    external = "8090"
  }
  env = ["PUID=1000", "PGID=1000", "TZ=America/Chicago", "UMASK_SET=022", "WEBUI_PORT=8090"]
  volumes {
    host_path      = "PATH_TO_CONFIG_DIR/config"
    container_path = "/config"
  }
  volumes {
    host_path      = "PATH_TO_DOWNLOADS_DIR/downloads"
    container_path = "/downloads"
  }
  restart = "always"
  image   = docker_image.qbittorrent.image_id
}

Steps to reproduce the issue with the above configuration:

  1. Run the following command to provision the resources:
terraform apply --auto-approve
  2. Verify that both the qbittorrent image and the container were created successfully and that the container is up and running.

  3. Run the following command to verify that the state of the qbittorrent container, including the devices, looks correct:

terraform state show docker_container.qbittorrent
  4. Run the following command to see the state mismatch caused by the devices:
terraform plan

If you remove the devices block, Terraform's state and Docker's state agree, but the container does not work as expected because "dev/net/tun" is no longer provided. If you keep the devices block, the container works as expected, but the states do not agree and Terraform wants to replace the container.

acolpan avatar Feb 03 '24 19:02 acolpan

@mavogel

IlyesDemineExtVeolia avatar Apr 12 '24 12:04 IlyesDemineExtVeolia

@acolpan We encountered similar problems today. Numerous Docker provider resources are consistently being replaced, even when a replacement shouldn't be necessary. After some debugging, we found that the Docker version on our server had been updated to version 26+. Once we downgraded it to Version 25.0.3, the provider started behaving as expected. Upon reviewing the Docker changelogs, we noticed that changes were made to their API, which could potentially lead to these issues. Unfortunately, this provider is still unmaintained, and a solution is unlikely to be published here anytime soon. You might want to explore some forks of this provider, as the issue might be resolved there. Alternatively, consider switching to a different tool for your Docker-related tasks.

It would be appreciated if you could confirm whether downgrading Docker on the server resolved the issue in your situation.

tloesch avatar Apr 26 '24 12:04 tloesch

Hi, we will be looking into this project again. More on that in the next weeks.

enc avatar Apr 26 '24 12:04 enc

After upgrading to Docker version 26.1.0, in addition to the devices property I already had this issue with, I started seeing a similar issue with another property, network_mode. However, there is a workaround: you can add a lifecycle block and ignore changes to those properties, as in the following resource example. I hope this helps.

resource "docker_container" "qbittorrent" {
  .
  .
  .
  lifecycle {
    ignore_changes = [
      devices,
      network_mode,
    ]
  }
}

acolpan avatar Apr 27 '24 03:04 acolpan

Thanks. As I responded to @tloesch, there's a workaround for this issue. However, ignoring changes to those properties is limiting, so a permanent solution is preferred.

acolpan avatar Apr 27 '24 03:04 acolpan

@tloesch @acolpan @enc I have the same issue. I use this module (https://github.com/terraform-aws-modules/terraform-aws-lambda) to build and deploy Docker-based Lambdas. The module uses this provider to build the Docker image. The "docker_image" resource is always rebuilt, even when neither the context nor the Dockerfile changes. 😔 Similar ticket here: https://github.com/kreuzwerker/terraform-provider-docker/issues/607

IlyesDemineExtVeolia avatar Apr 29 '24 15:04 IlyesDemineExtVeolia

@IlyesDemineExtVeolia I know this is not a long-term solution, but you can try downgrading Docker to version 25.0.x on the target system. This version still gets security updates.

tloesch avatar Apr 29 '24 15:04 tloesch

@tloesch Thanks for your response. I still have the issue with docker 25. I will wait for a permanent fix

IlyesDemineExtVeolia avatar Apr 30 '24 09:04 IlyesDemineExtVeolia

Hi, Any update on this issue?

kukukk avatar May 07 '24 20:05 kukukk

@enc Hi, any news ?

IlyesDemineExtVeolia avatar Jun 12 '24 13:06 IlyesDemineExtVeolia

Facing this issue as well.

I poked around in Docker to see what might have broken this and believe this to be the culprit change (at least for replacements due to network_mode): https://github.com/moby/moby/commit/4eed3dcdfeb147529339e06f2dceecf43caed45a.

What's more is it looks like there's been further refinement to how they specify the default network_mode: https://github.com/moby/moby/pull/48008. Examining this change makes it pretty clear what the new defaults are:

  • Unix: https://github.com/moby/moby/blob/2cfc2a57a84900db112bd911143de7aed719d739/daemon/network/network_mode_unix.go#L10
  • Windows: https://github.com/moby/moby/blob/2cfc2a57a84900db112bd911143de7aed719d739/daemon/network/network_mode_windows.go#L8

Where the actual strings are defined here: https://github.com/moby/moby/blob/2cfc2a57a84900db112bd911143de7aed719d739/api/types/network/network.go#L9-L20 (spoiler: "bridge" on Unix, "nat" on Windows).
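One possible way to act on these defaults, offered only as an untested sketch: pin network_mode explicitly so the configured value matches what the daemon reports. The value "bridge" assumes a Unix daemon (per the links above it would be "nat" on Windows). Whether this stops the replacement for a given provider and Docker version is an assumption, not something confirmed in this thread. The resource and image names are reused from the configuration at the top of this issue.

```hcl
resource "docker_container" "qbittorrent" {
  name  = "qbittorrent"
  image = docker_image.qbittorrent.image_id

  # Assumption: stating the daemon's default explicitly keeps the
  # configured value and the value read back from Docker identical.
  # Use "nat" instead when the target daemon runs on Windows.
  network_mode = "bridge"
}
```

If pinning the value is not acceptable, the lifecycle ignore_changes workaround shown earlier in the thread remains the alternative.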

Here's hoping this may help inform a permanent solution!

cdn-bborucki avatar Jul 12 '24 19:07 cdn-bborucki

Sounds like a Docker bug to me, because their documentation explicitly states that the default is "bridge": https://docs.docker.com/engine/network/drivers/

Either the documentation is wrong or the implementation is wrong. I filed this bug report: https://github.com/docker/for-win/issues/14417

cowwoc avatar Nov 11 '24 18:11 cowwoc

I spent some time looking into this and know why this is happening: In your case you only specified a host_path in your devices configuration. But when talking with the docker server, we have to deliver host_path, container_path and permissions. So we set those with default values in the background. But those values are never stored in the state - and that's why you get a diff.

Easiest fix (for me): You add container_path and permissions to your terraform code and you should be good to go

Proper fix: I fix the implementation so that everything gets stored in the state properly. I already spent some time looking into this, it is not as easy as it seems.

So, for now, please just adapt your terraform code
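A minimal sketch of that adaptation, assuming the values shown in the "terraform plan" output at the top of this issue (container_path "dev/net/tun" and permissions "rwm") are the ones recorded for this container. The other arguments are reused from the original configuration; adjust the values to whatever your own plan output reports.

```hcl
resource "docker_container" "qbittorrent" {
  name  = "qbittorrent"
  image = docker_image.qbittorrent.image_id

  devices {
    host_path = "dev/net/tun"
    # Set explicitly so the configuration matches the values the
    # plan output reported for the existing container.
    container_path = "dev/net/tun"
    permissions    = "rwm"
  }
}
```

With all three attributes present in the configuration, the devices block should no longer differ from what the plan reads back, though this has not been verified against every provider version.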

Junkern avatar Apr 18 '25 09:04 Junkern

Thanks, @Junkern for your time and efforts troubleshooting this issue.

acolpan avatar Apr 18 '25 18:04 acolpan

It seems that since v3.1 or v3.2 of this plugin, the default value of network_mode changed to bridge when deploying from Linux via SSH to a Windows Docker node.

Acquiring state lock. This may take a few moments...
module.docker.docker_container.docuum: Destroying... [id=4f4b39e8ea0f94d8ed58e68fbf557112b040cacf27f1dba4f5dedf217f621088]
module.docker.docker_container.docuum: Destruction complete after 1s
module.docker.docker_image.docuum: Destroying... [id=sha256:864f50895bc2674006e46a87d1a876c2bd1f5aff29f91601a91b2fd78868d198ghcr.visualon.de/visualon/docuum:0.25.0@sha256:006682d1fdb7d16dfd1a66e6eea26a99e13a597e3952dd3b021674579bf18c3b]
module.docker.docker_image.docuum: Destruction complete after 2s
module.docker.docker_image.docuum: Creating...
module.docker.docker_image.docuum: Still creating... [10s elapsed]
module.docker.docker_image.docuum: Still creating... [20s elapsed]
module.docker.docker_image.docuum: Still creating... [30s elapsed]
module.docker.docker_image.docuum: Creation complete after 35s [id=sha256:84e776477d2e83b744276370b8e339d884b600334975993c5bda5ee8b21ca4ddghcr.visualon.de/visualon/docuum:0.25.0@sha256:0ffa6174581b49d6886a807cac5f7477c70813d17696ef116b1357885b0c0f31]
module.docker.docker_container.docuum: Creating...
╷
│ Error: Unable to start container: Error response from daemon: network bridge not found
│ 
│   with module.docker.docker_container.docuum,
│   on ../modules/docker-dev-windows/docuum.tf line 10, in resource "docker_container" "docuum":
│   10: resource "docker_container" "docuum" {
│ 
╵
Releasing state lock. This may take a few moments...

viceice avatar Apr 28 '25 14:04 viceice

@viceice do you mind opening a new issue for this? That would help me keep track of everything. Thank you!

Junkern avatar Apr 28 '25 19:04 Junkern

  • #723

viceice avatar May 09 '25 07:05 viceice