Watchtower Fails Recreating Containers that Share a Network Stack

Open MeCJay12 opened this issue 2 years ago • 9 comments

Hello! I have a VPN container and a torrent container that share a network stack (the torrent container has --network container:PIA). I'm noticing that whenever something happens to the "parent" VPN container, the torrent container errors out. This can happen in a few different ways:

  1. Both are re-created

In this case, it seems that the torrent container is created before the VPN container:

time="2022-05-11T02:43:54Z" level=info msg="Found new linuxserver/qbittorrent:latest image (e3ce01e9d9f9)"
time="2022-05-11T02:44:17Z" level=info msg="Found new linuxserver/sonarr:latest image (e4571e1de8bb)"
time="2022-05-11T02:44:33Z" level=info msg="Found new linuxserver/jackett:latest image (9209598da4bc)"
time="2022-05-11T02:44:39Z" level=info msg="Found new qmcgaw/gluetun:latest image (dc68aaf68f41)"
time="2022-05-11T02:45:04Z" level=info msg="Found new linuxserver/unifi-controller:latest image (b6f250fead08)"
time="2022-05-11T02:45:08Z" level=info msg="Found new guacamole/guacamole:latest image (772b60108cca)"
time="2022-05-11T02:45:15Z" level=info msg="Found new guacamole/guacd:latest image (4969201c0757)"
time="2022-05-11T02:45:17Z" level=info msg="Found new itzg/minecraft-server:latest image (f7c85977cfb8)"
time="2022-05-11T02:45:20Z" level=info msg="Stopping /Guacd (11395aded9fc) with SIGTERM"
time="2022-05-11T02:45:30Z" level=info msg="Stopping /Guac (4401e75cc7cf) with SIGTERM"
time="2022-05-11T02:45:35Z" level=info msg="Stopping /Unifi (68e49fc241d3) with SIGTERM"
time="2022-05-11T02:45:47Z" level=info msg="Stopping /PIA (7f374bb5714f) with SIGTERM"
time="2022-05-11T02:45:49Z" level=info msg="Stopping /Jackett (dc8a5ad9dde7) with SIGTERM"
time="2022-05-11T02:45:53Z" level=info msg="Stopping /Sonarr (340cec5a4766) with SIGTERM"
time="2022-05-11T02:45:58Z" level=info msg="Stopping /Nginx (ec102dfb297d) with SIGTERM"
time="2022-05-11T02:46:08Z" level=info msg="Stopping /qBittorrent (9324abf693e5) with SIGTERM"
time="2022-05-11T02:46:17Z" level=info msg="Creating /qBittorrent"
time="2022-05-11T02:46:17Z" level=error msg="Error response from daemon: No such container: 7f374bb5714f1b081a6da007fea491d4d8eb586ee6541be406b9dbb666bcabef"
time="2022-05-11T02:46:17Z" level=info msg="Creating /Nginx"
time="2022-05-11T02:46:18Z" level=info msg="Creating /Sonarr"
time="2022-05-11T02:46:19Z" level=info msg="Creating /Jackett"
time="2022-05-11T02:46:21Z" level=info msg="Creating /PIA"
time="2022-05-11T02:46:22Z" level=info msg="Creating /Unifi"
time="2022-05-11T02:46:24Z" level=info msg="Creating /Guac"
time="2022-05-11T02:46:25Z" level=info msg="Creating /Guacd"
time="2022-05-11T02:46:27Z" level=info msg="Creating /Minecraft"
time="2022-05-11T02:46:27Z" level=info msg="Session done" Failed=1 Scanned=26 Updated=7 notify=no

  2. Parent container is updated without the child container.

In this case the containers are recreated, but the child container loses the connection to the parent container (presumably because the container hash changes).

I understand that a similar issue used to happen with the --link option but that it was resolved back in March. This seems to be the same issue, just with the newer option (--network container:<> replaced --link).
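A minimal sketch of the second case, assuming the child's NetworkMode stores the parent's full container ID (as the "No such container" error above suggests). The function name `stale_network_refs`, the `/Sonarr` network mode, and the new `/PIA` ID are hypothetical and only serve the illustration; this is not Watchtower's code.

```python
def stale_network_refs(network_modes, live_refs):
    """Return {container: missing_ref} for 'container:<ref>' modes whose target is gone.

    network_modes: container name -> HostConfig.NetworkMode string
    live_refs: IDs (or names) of containers that currently exist
    """
    stale = {}
    for name, mode in network_modes.items():
        kind, _, ref = mode.partition(":")
        if kind == "container" and ref not in live_refs:
            stale[name] = ref
    return stale


# ID taken from the error message in the log above.
OLD_PIA_ID = "7f374bb5714f1b081a6da007fea491d4d8eb586ee6541be406b9dbb666bcabef"
# Hypothetical ID for the re-created /PIA container.
NEW_PIA_ID = "f" * 64

# The "bridge" mode for /Sonarr is an assumption for illustration.
modes = {"/qBittorrent": "container:" + OLD_PIA_ID, "/Sonarr": "bridge"}
stale = stale_network_refs(modes, live_refs={NEW_PIA_ID})
print(stale)
```

Any container reported here would have to be re-created against the new parent ID, since its stored reference no longer resolves.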

MeCJay12, May 11 '22 03:05

Hi there! 👋🏼 As you're new to this repo, we'd like to suggest that you read our code of conduct as well as our contribution guidelines. Thanks a bunch for opening your first issue! 🙏

github-actions[bot], May 11 '22 03:05

I'm having the same issue and cannot get watchtower to stop and start linked containers in the correct order. I've tried:

  1. Not putting any watchtower-specific labels in the containers' config and just relying on compose syntax like this:
     depends_on:
       gluetun:
         condition: service_healthy
  2. Using the watchtower label com.centurylinklabs.watchtower.depends-on: parent in each child container
  3. Using the watchtower label com.centurylinklabs.watchtower.depends-on: "child1,child2" in the parent container

In each case, watchtower correctly identifies that there are linked containers but still shuts the parent down first, then the children. It then starts the children (which error out because the parent is missing), then the parent, and finally deletes all the dangling images, which now include the images of the non-running children.

Edit: actually, I did see some improvement with setup no. 2 above (depends-on pointing at the parent in each child). Using a leading slash in the container name (so the label is com.centurylinklabs.watchtower.depends-on: "/gluetun") results in the correct shutdown and startup order.

However, the child containers failed to start with an "image not found" error. Strangely, this occurs before watchtower removes all the dangling images, so I'm not entirely sure what the problem is. I'm retrying with WATCHTOWER_CLEANUP set to false, and if that doesn't work, I'll just turn off auto-updates for Gluetun.
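The order described above (children down before the parent, parent up before the children) amounts to a topological sort of the depends-on graph. The following is a minimal sketch of that idea using Python's standard `graphlib`; it is not how Watchtower implements it, and the names `restart_order`, `/child1` and `/child2` are hypothetical.

```python
from graphlib import TopologicalSorter


def restart_order(depends_on):
    """Return (stop_order, start_order) for a mapping container -> set of parents."""
    # TopologicalSorter expects node -> predecessors, so parents come out first.
    start_order = list(TopologicalSorter(depends_on).static_order())
    # Stop in reverse: dependents go down before the container they rely on.
    stop_order = list(reversed(start_order))
    return stop_order, start_order


# Hypothetical stack: two children sharing the network of /gluetun.
deps = {
    "/child1": {"/gluetun"},
    "/child2": {"/gluetun"},
    "/gluetun": set(),
}
stop_order, start_order = restart_order(deps)
print("stop: ", stop_order)
print("start:", start_order)
```

With this ordering, `/gluetun` is always stopped last and started first, whatever the relative order of the two children.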

ljo123, Jun 12 '22 12:06

Same thing here, and also with gluetun. I disabled watchtower on those containers for now.

juanra, Jun 13 '22 03:06

Same here. It looks like it's still looking for the old container hash after the new container is created, even if we link the containers via the com.centurylinklabs.watchtower.depends-on label.

watchtower         | time="2022-07-06T22:34:30+02:00" level=debug msg="container is linked to restarting" linked=/traefik restarting=/cloudflared
watchtower         | time="2022-07-06T22:34:30+02:00" level=debug msg="container is linked to restarting" linked=/jaeger restarting=/cloudflared
watchtower         | time="2022-07-06T22:34:30+02:00" level=debug msg="This is the watchtower container /watchtower"
watchtower         | time="2022-07-06T22:34:41+02:00" level=info msg="Stopping /jaeger (8e6c7b35d73b) with SIGTERM"
watchtower         | time="2022-07-06T22:34:42+02:00" level=debug msg="Removing container 8e6c7b35d73b"
watchtower         | time="2022-07-06T22:34:48+02:00" level=info msg="Stopping /traefik (b2fbecebe9e0) with SIGTERM"
watchtower         | time="2022-07-06T22:34:50+02:00" level=debug msg="Removing container b2fbecebe9e0"
watchtower         | time="2022-07-06T22:34:50+02:00" level=info msg="Stopping /cloudflared (56e4448aee54) with SIGTERM"
watchtower         | time="2022-07-06T22:34:51+02:00" level=debug msg="Removing container 56e4448aee54"
watchtower         | time="2022-07-06T22:34:58+02:00" level=info msg="Creating /cloudflared"
watchtower         | time="2022-07-06T22:34:58+02:00" level=debug msg="Starting container /cloudflared (a2a50d353447)"
watchtower         | time="2022-07-06T22:34:58+02:00" level=info msg="Creating /traefik"
watchtower         | time="2022-07-06T22:34:58+02:00" level=debug msg="Starting container /traefik (357ab2814dcb)"
watchtower         | time="2022-07-06T22:34:58+02:00" level=error msg="Error response from daemon: No such container: 56e4448aee54525c60a2167dec01bb0ee371abb46859c17d8f89d99fbba84574"
watchtower         | time="2022-07-06T22:34:59+02:00" level=info msg="Creating /jaeger"
watchtower         | time="2022-07-06T22:34:59+02:00" level=debug msg="Starting container /jaeger (7f3a418b9346)"
watchtower         | time="2022-07-06T22:34:59+02:00" level=error msg="Error response from daemon: No such container: 56e4448aee54525c60a2167dec01bb0ee371abb46859c17d8f89d99fbba84574"

marcosvrs, Jul 07 '22 18:07

Maybe there is a reference to the old container in the config somewhere. Could you post a docker inspect of the jaeger container? The depends-on label just recreates the containers when one of their dependencies is recreated. If there is some kind of explicit reference to the container ID in their config, it has to be updated as well. Perhaps it's added in the network config? Are you using docker compose?

piksel, Jul 08 '22 07:07

> Maybe there is a reference to the old container in the config somewhere. Could you post a docker inspect of the jaeger container? The depends-on label just recreates the containers when one of their dependencies is recreated. If there is some kind of explicit reference to the container ID in their config, it has to be updated as well. Perhaps it's added in the network config? Are you using docker compose?

I'm facing exactly what @ljo123 described. Since I posted my logs, I have already recreated the jaeger container, so its hash won't match the log I sent earlier.

docker inspect:

[
    {
        "Id": "7f89de80221494c2fdca6dca286b15eedb3c9af975a51039304528345b63cc2b",
        "Created": "2022-07-08T09:08:08.548100679Z",
        "Path": "/go/bin/all-in-one-linux",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 2243640,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2022-07-08T09:08:08.878087076Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:5011eb6cadf176aa8ca70812a17499e132b985bc203b4e5d566976943cd1eca0",
        "ResolvConfPath": "/var/lib/docker/containers/02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2/hostname",
        "HostsPath": "/var/lib/docker/containers/02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2/hosts",
        "LogPath": "/var/lib/docker/containers/7f89de80221494c2fdca6dca286b15eedb3c9af975a51039304528345b63cc2b/7f89de80221494c2fdca6dca286b15eedb3c9af975a51039304528345b63cc2b-json.log",
        "Name": "/jaeger",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "container:02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "unless-stopped",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": [],
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "host",
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b-init/diff:/var/lib/docker/overlay2/70a39c423ebe007a3cc2e1a2c3fb1c9d60cfc9c3117ac9e3fd50cedc434d1da0/diff:/var/lib/docker/overlay2/35f1782638eaff2f56e39677c3d11f69841decee009d501d6463f6be8873605f/diff:/var/lib/docker/overlay2/28f4b48752d7a3f77cd689c3390463ffb8cad62e8e24904bbde0faf370b4aa28/diff:/var/lib/docker/overlay2/5eedcaf13c27e39123b0277274d15e0d920d152810ae1a959299a0a874e42e1b/diff:/var/lib/docker/overlay2/f98556203bf805f7592608d96e10d84862bf42840852459af978abcdbcd80cfc/diff",
                "MergedDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b/merged",
                "UpperDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b/diff",
                "WorkDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "volume",
                "Name": "5a99f2e145e0e426dae0b6a3d56bce224459cd7c809d24408eb6244c5f75e134",
                "Source": "/var/lib/docker/volumes/5a99f2e145e0e426dae0b6a3d56bce224459cd7c809d24408eb6244c5f75e134/_data",
                "Destination": "/tmp",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "02cea43ed47c",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "14250/tcp": {},
                "14268/tcp": {},
                "16686/tcp": {},
                "5775/udp": {},
                "5778/tcp": {},
                "6831/udp": {},
                "6832/udp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "SAMPLING_STRATEGIES_FILE=/etc/jaeger/sampling_strategies.json"
            ],
            "Cmd": null,
            "Image": "jaegertracing/all-in-one",
            "Volumes": {
                "/tmp": {}
            },
            "WorkingDir": "",
            "Entrypoint": [
                "/go/bin/all-in-one-linux"
            ],
            "OnBuild": null,
            "Labels": {
                "com.centurylinklabs.watchtower.depends-on": "/cloudflared",
                "com.docker.compose.config-hash": "c2b318c7497a806ecdf583ead8cc28d591ad6de32393fb9786f20e7aff6bf188",
                "com.docker.compose.container-number": "1",
                "com.docker.compose.oneoff": "False",
                "com.docker.compose.project": "user",
                "com.docker.compose.project.config_files": "docker-compose.yml",
                "com.docker.compose.project.working_dir": "/home/user",
                "com.docker.compose.service": "jaeger",
                "com.docker.compose.version": "1.29.2",
                "traefik.enable": "True",
                "traefik.http.middlewares.jaegerauth.basicauth.users": "secret:secret",
                "traefik.http.routers.jaeger.middlewares": "jaegerauth@docker",
                "traefik.http.routers.jaeger.rule": "Host(`jaeger.secret.com`)",
                "traefik.http.services.jaeger.loadbalancer.server.port": "16686"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {}
        }
    }
]

docker-compose

version: "3.8"

services:

  cloudflared:
    image: cloudflare/cloudflared
    container_name: cloudflared
    command:
      - tunnel
      - --url=http://localhost:80
      - run
      - --token=secret
    extra_hosts:
      - host.docker.internal:172.177.0.1
    restart: unless-stopped

  jaeger:
    image: jaegertracing/all-in-one
    container_name: jaeger
    # ports:
    #   - 16686:16686
    network_mode: service:cloudflared
    labels:
      com.centurylinklabs.watchtower.depends-on: /cloudflared
      traefik.enable: true
      traefik.http.middlewares.jaegerauth.basicauth.users: secret:secret
      traefik.http.routers.jaeger.middlewares: jaegerauth@docker
      traefik.http.routers.jaeger.rule: Host(`jaeger.secret.com`)
      traefik.http.services.jaeger.loadbalancer.server.port: 16686
    restart: unless-stopped

  traefik:
    image: traefik
    container_name: traefik
    command:
      - --api.dashboard
      - --entrypoints.web.address=:80
      - --entryPoints.web.forwardedHeaders.trustedIPs=127.0.0.1/32
      - --experimental.hub=true
      - --global.checkNewVersion=true
      - --hub.tls.insecure=true
      # - --log.level=DEBUG
      - --metrics.prometheus.addrouterslabels=true
      - --providers.docker
      - --providers.docker.exposedbydefault=false
      - --tracing.jaeger=true
    # ports:
    #   - 8080:8080
    volumes:
      - /run/docker.sock:/var/run/docker.sock:ro
    network_mode: service:cloudflared
    depends_on:
      - jaeger
    labels:
      com.centurylinklabs.watchtower.depends-on: /cloudflared
      traefik.enable: true
      traefik.http.middlewares.traefikauth.basicauth.users: secret:secret
      traefik.http.routers.traefik.middlewares: traefikauth@docker
      traefik.http.routers.traefik.rule: Host(`traefik.secret.com`)
      traefik.http.routers.traefik.service: api@internal
      traefik.http.services.traefik.loadbalancer.server.port: 8080
    restart: unless-stopped

  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    command:
      - --cleanup
      - --debug
      - --include-restarting
      - --include-stopped
      - --remove-volumes
      - --trace
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /run/docker.sock:/var/run/docker.sock
    restart: unless-stopped

networks:
  default:
    ipam:
      config:
        - subnet: 172.177.0.0/16

marcosvrs, Jul 08 '22 09:07

Yeah, that's exactly what I suspected:

"NetworkMode": "container:02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2",

That means that network_mode: service:* is not supported right now. It should be possible to both support it and infer the depends-on from the property though.

Note: NetworkMode: container:CONTAINER_NAME would still work, but docker-compose puts the explicit container ID in the field instead :/
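As a rough illustration of inferring depends-on from the NetworkMode property, the sketch below parses a `container:<ref>` value and resolves the reference to a container name. It is not Watchtower's implementation: the helper names and the `names_by_id` lookup table are hypothetical, and the mapping of the ID to /cloudflared is an assumption based on the compose file above.

```python
def network_parent(network_mode):
    """Return the reference in a 'container:<ref>' NetworkMode, else None."""
    prefix = "container:"
    if network_mode.startswith(prefix):
        return network_mode[len(prefix):]
    return None


def infer_depends_on(inspect_entry, names_by_id):
    """Resolve the NetworkMode reference of one `docker inspect` entry to a name."""
    ref = network_parent(inspect_entry["HostConfig"]["NetworkMode"])
    if ref is None:
        return None
    # docker-compose stores the full ID; a hand-written value may already be a name.
    return names_by_id.get(ref, ref)


PARENT_ID = "02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2"
# Assumed mapping: this ID is taken to belong to /cloudflared.
names_by_id = {PARENT_ID: "/cloudflared"}

jaeger = {"Name": "/jaeger", "HostConfig": {"NetworkMode": "container:" + PARENT_ID}}
parent = infer_depends_on(jaeger, names_by_id)
print(parent)
```

Once the parent is known by name, the stored ID could be replaced with the ID of the re-created parent before the child is created again.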

piksel, Jul 08 '22 09:07

Any workaround for this short of adding monitor-only to the parent container and manually updating the stack periodically? It would be nice if watchtower could redeploy a whole stack if the parent container needed an update...

stunrelay, Aug 13 '22 03:08

The only workaround is to use another networking mode afaik.

It would be nice, a PR is welcomed!

piksel, Aug 14 '22 06:08