Watchtower Fails Recreating Containers that Share a Network Stack
Hello! I have a VPN container and a torrent container that share a network stack (the torrent container runs with --network container:PIA). I'm noticing that whenever something happens to the "parent" VPN container, the torrent container errors out. This can happen in a few different ways:
- Both are re-created
In this case, it seems that the torrent container is created before the VPN container:
time="2022-05-11T02:43:54Z" level=info msg="Found new linuxserver/qbittorrent:latest image (e3ce01e9d9f9)"
time="2022-05-11T02:44:17Z" level=info msg="Found new linuxserver/sonarr:latest image (e4571e1de8bb)"
time="2022-05-11T02:44:33Z" level=info msg="Found new linuxserver/jackett:latest image (9209598da4bc)"
time="2022-05-11T02:44:39Z" level=info msg="Found new qmcgaw/gluetun:latest image (dc68aaf68f41)"
time="2022-05-11T02:45:04Z" level=info msg="Found new linuxserver/unifi-controller:latest image (b6f250fead08)"
time="2022-05-11T02:45:08Z" level=info msg="Found new guacamole/guacamole:latest image (772b60108cca)"
time="2022-05-11T02:45:15Z" level=info msg="Found new guacamole/guacd:latest image (4969201c0757)"
time="2022-05-11T02:45:17Z" level=info msg="Found new itzg/minecraft-server:latest image (f7c85977cfb8)"
time="2022-05-11T02:45:20Z" level=info msg="Stopping /Guacd (11395aded9fc) with SIGTERM"
time="2022-05-11T02:45:30Z" level=info msg="Stopping /Guac (4401e75cc7cf) with SIGTERM"
time="2022-05-11T02:45:35Z" level=info msg="Stopping /Unifi (68e49fc241d3) with SIGTERM"
time="2022-05-11T02:45:47Z" level=info msg="Stopping /PIA (7f374bb5714f) with SIGTERM"
time="2022-05-11T02:45:49Z" level=info msg="Stopping /Jackett (dc8a5ad9dde7) with SIGTERM"
time="2022-05-11T02:45:53Z" level=info msg="Stopping /Sonarr (340cec5a4766) with SIGTERM"
time="2022-05-11T02:45:58Z" level=info msg="Stopping /Nginx (ec102dfb297d) with SIGTERM"
time="2022-05-11T02:46:08Z" level=info msg="Stopping /qBittorrent (9324abf693e5) with SIGTERM"
time="2022-05-11T02:46:17Z" level=info msg="Creating /qBittorrent"
time="2022-05-11T02:46:17Z" level=error msg="Error response from daemon: No such container: 7f374bb5714f1b081a6da007fea491d4d8eb586ee6541be406b9dbb666bcabef"
time="2022-05-11T02:46:17Z" level=info msg="Creating /Nginx"
time="2022-05-11T02:46:18Z" level=info msg="Creating /Sonarr"
time="2022-05-11T02:46:19Z" level=info msg="Creating /Jackett"
time="2022-05-11T02:46:21Z" level=info msg="Creating /PIA"
time="2022-05-11T02:46:22Z" level=info msg="Creating /Unifi"
time="2022-05-11T02:46:24Z" level=info msg="Creating /Guac"
time="2022-05-11T02:46:25Z" level=info msg="Creating /Guacd"
time="2022-05-11T02:46:27Z" level=info msg="Creating /Minecraft"
time="2022-05-11T02:46:27Z" level=info msg="Session done" Failed=1 Scanned=26 Updated=7 notify=no
- Parent container is updated without the child container.
In this case the containers are recreated, but the child container loses its connection to the parent container (presumably because the container hash changes).
I understand that a similar issue used to happen with the --link flag, but that was resolved back in March. This seems to be the same issue, just with the newer mechanism (--network container:<> replaced --link).
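For context, here is a minimal compose sketch of the kind of shared-network setup being described (service names and images are illustrative, based on the containers in the log above):

```yaml
services:
  pia:                        # "parent" VPN container that owns the network stack
    image: qmcgaw/gluetun
    container_name: PIA
    cap_add:
      - NET_ADMIN
    restart: unless-stopped
  qbittorrent:                # "child" that joins the parent's network namespace
    image: linuxserver/qbittorrent
    container_name: qBittorrent
    network_mode: "container:PIA"   # compose equivalent of docker run --network container:PIA
    restart: unless-stopped
```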
Hi there! As you're new to this repo, we'd like to suggest that you read our code of conduct as well as our contribution guidelines. Thanks a bunch for opening your first issue!
I'm having the same issue and cannot get watchtower to stop and start linked containers in the correct order. I've tried:
- Not putting any watchtower-specific labels in the container config, relying instead on compose syntax like this:
depends_on:
  gluetun:
    condition: service_healthy
- Using the watchtower label com.centurylinklabs.watchtower.depends-on: parent in each child container
- Using the watchtower label com.centurylinklabs.watchtower.depends-on: "child1,child2" in the parent container
In each case, watchtower correctly identifies that there are linked containers but still shuts the parent down first, then the children, then starts up the children (which error out because of the missing parent), then the parent, and then deletes all the dangling images, which now include the non-running child images.
Edit: I actually did see some improvement with setup no. 2 above (depends-on the parent in each child): adding a leading slash to the container name (so the label is com.centurylinklabs.watchtower.depends-on: "/gluetun") results in the correct shutdown and startup order.
However, the child containers then failed to start with the "No such container" error. Strangely, this occurs before watchtower removes the dangling images, so I'm not entirely sure what the problem is. I'm retrying with WATCHTOWER_CLEANUP set to false, and if that doesn't work, I'll just turn off auto-updates for Gluetun.
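For anyone trying to reproduce workaround no. 2, the label placement that improved the ordering looks roughly like this (sketch only; gluetun is the parent here, and the leading slash matches the container name as watchtower sees it):

```yaml
services:
  gluetun:
    image: qmcgaw/gluetun
    container_name: gluetun
  qbittorrent:
    image: linuxserver/qbittorrent
    network_mode: "service:gluetun"
    labels:
      # leading slash appears to be required for the match, per the edit above
      com.centurylinklabs.watchtower.depends-on: "/gluetun"
```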
Same thing here, and also with gluetun. I disabled watchtower on those containers for now.
Same here.
It looks like it's still looking for the old container hash after the new container is created, even if we link the containers via the com.centurylinklabs.watchtower.depends-on label.
watchtower | time="2022-07-06T22:34:30+02:00" level=debug msg="container is linked to restarting" linked=/traefik restarting=/cloudflared
watchtower | time="2022-07-06T22:34:30+02:00" level=debug msg="container is linked to restarting" linked=/jaeger restarting=/cloudflared
watchtower | time="2022-07-06T22:34:30+02:00" level=debug msg="This is the watchtower container /watchtower"
watchtower | time="2022-07-06T22:34:41+02:00" level=info msg="Stopping /jaeger (8e6c7b35d73b) with SIGTERM"
watchtower | time="2022-07-06T22:34:42+02:00" level=debug msg="Removing container 8e6c7b35d73b"
watchtower | time="2022-07-06T22:34:48+02:00" level=info msg="Stopping /traefik (b2fbecebe9e0) with SIGTERM"
watchtower | time="2022-07-06T22:34:50+02:00" level=debug msg="Removing container b2fbecebe9e0"
watchtower | time="2022-07-06T22:34:50+02:00" level=info msg="Stopping /cloudflared (56e4448aee54) with SIGTERM"
watchtower | time="2022-07-06T22:34:51+02:00" level=debug msg="Removing container 56e4448aee54"
watchtower | time="2022-07-06T22:34:58+02:00" level=info msg="Creating /cloudflared"
watchtower | time="2022-07-06T22:34:58+02:00" level=debug msg="Starting container /cloudflared (a2a50d353447)"
watchtower | time="2022-07-06T22:34:58+02:00" level=info msg="Creating /traefik"
watchtower | time="2022-07-06T22:34:58+02:00" level=debug msg="Starting container /traefik (357ab2814dcb)"
watchtower | time="2022-07-06T22:34:58+02:00" level=error msg="Error response from daemon: No such container: 56e4448aee54525c60a2167dec01bb0ee371abb46859c17d8f89d99fbba84574"
watchtower | time="2022-07-06T22:34:59+02:00" level=info msg="Creating /jaeger"
watchtower | time="2022-07-06T22:34:59+02:00" level=debug msg="Starting container /jaeger (7f3a418b9346)"
watchtower | time="2022-07-06T22:34:59+02:00" level=error msg="Error response from daemon: No such container: 56e4448aee54525c60a2167dec01bb0ee371abb46859c17d8f89d99fbba84574"
Maybe there is a reference to the old container in the config somewhere. Could you post a docker inspect of the jaeger container? The depends-on label just recreates the containers when one of their dependencies is recreated. If there is some kind of explicit reference to the container ID in their config, it has to be updated as well. Perhaps it's added in the network config? Are you using docker compose?
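For what it's worth, the ordering that depends-on is meant to enforce is just a dependency sort: parents start first and stop last. A minimal Python sketch of the idea (hypothetical data mirroring the labels in this thread; this is not watchtower's actual implementation):

```python
from graphlib import TopologicalSorter

# Hypothetical map of container -> its depends-on targets, mirroring
# the com.centurylinklabs.watchtower.depends-on labels in this thread.
depends_on = {
    "/jaeger": ["/cloudflared"],
    "/traefik": ["/cloudflared"],
    "/cloudflared": [],
}

# Start order: dependencies (parents) come first.
start_order = list(TopologicalSorter(depends_on).static_order())

# Stop order is the reverse: children go down before their parent.
stop_order = list(reversed(start_order))

print(start_order)  # /cloudflared first, then the two children
print(stop_order)   # /cloudflared last
```

Even with the right order, though, the logs above show a second problem: the child's NetworkMode still points at the old parent's container ID, so ordering alone doesn't fix it.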
I'm facing exactly what @ljo123 described. Since I posted my logs, I have already recreated the jaeger container, so its hash won't match the log I sent earlier.
docker inspect:
[
{
"Id": "7f89de80221494c2fdca6dca286b15eedb3c9af975a51039304528345b63cc2b",
"Created": "2022-07-08T09:08:08.548100679Z",
"Path": "/go/bin/all-in-one-linux",
"Args": [],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 2243640,
"ExitCode": 0,
"Error": "",
"StartedAt": "2022-07-08T09:08:08.878087076Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
"Image": "sha256:5011eb6cadf176aa8ca70812a17499e132b985bc203b4e5d566976943cd1eca0",
"ResolvConfPath": "/var/lib/docker/containers/02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2/hostname",
"HostsPath": "/var/lib/docker/containers/02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2/hosts",
"LogPath": "/var/lib/docker/containers/7f89de80221494c2fdca6dca286b15eedb3c9af975a51039304528345b63cc2b/7f89de80221494c2fdca6dca286b15eedb3c9af975a51039304528345b63cc2b-json.log",
"Name": "/jaeger",
"RestartCount": 0,
"Driver": "overlay2",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "docker-default",
"ExecIDs": null,
"HostConfig": {
"Binds": [],
"ContainerIDFile": "",
"LogConfig": {
"Type": "json-file",
"Config": {}
},
"NetworkMode": "container:02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2",
"PortBindings": {},
"RestartPolicy": {
"Name": "unless-stopped",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": [],
"CapAdd": null,
"CapDrop": null,
"CgroupnsMode": "host",
"Dns": null,
"DnsOptions": null,
"DnsSearch": null,
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "private",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 0,
"PidMode": "",
"Privileged": false,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": null,
"UTSMode": "",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 0,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "",
"BlkioWeight": 0,
"BlkioWeightDevice": null,
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": null,
"DeviceCgroupRules": null,
"DeviceRequests": null,
"KernelMemory": 0,
"KernelMemoryTCP": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": null,
"OomKillDisable": false,
"PidsLimit": null,
"Ulimits": null,
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"MaskedPaths": [
"/proc/asound",
"/proc/acpi",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/proc/scsi",
"/sys/firmware"
],
"ReadonlyPaths": [
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
},
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b-init/diff:/var/lib/docker/overlay2/70a39c423ebe007a3cc2e1a2c3fb1c9d60cfc9c3117ac9e3fd50cedc434d1da0/diff:/var/lib/docker/overlay2/35f1782638eaff2f56e39677c3d11f69841decee009d501d6463f6be8873605f/diff:/var/lib/docker/overlay2/28f4b48752d7a3f77cd689c3390463ffb8cad62e8e24904bbde0faf370b4aa28/diff:/var/lib/docker/overlay2/5eedcaf13c27e39123b0277274d15e0d920d152810ae1a959299a0a874e42e1b/diff:/var/lib/docker/overlay2/f98556203bf805f7592608d96e10d84862bf42840852459af978abcdbcd80cfc/diff",
"MergedDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b/merged",
"UpperDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b/diff",
"WorkDir": "/var/lib/docker/overlay2/2db8b3f5fb10b872c5bfbceeaa0c0369649ab716cc5fd26829ad928009ad9d2b/work"
},
"Name": "overlay2"
},
"Mounts": [
{
"Type": "volume",
"Name": "5a99f2e145e0e426dae0b6a3d56bce224459cd7c809d24408eb6244c5f75e134",
"Source": "/var/lib/docker/volumes/5a99f2e145e0e426dae0b6a3d56bce224459cd7c809d24408eb6244c5f75e134/_data",
"Destination": "/tmp",
"Driver": "local",
"Mode": "",
"RW": true,
"Propagation": ""
}
],
"Config": {
"Hostname": "02cea43ed47c",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"14250/tcp": {},
"14268/tcp": {},
"16686/tcp": {},
"5775/udp": {},
"5778/tcp": {},
"6831/udp": {},
"6832/udp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"SAMPLING_STRATEGIES_FILE=/etc/jaeger/sampling_strategies.json"
],
"Cmd": null,
"Image": "jaegertracing/all-in-one",
"Volumes": {
"/tmp": {}
},
"WorkingDir": "",
"Entrypoint": [
"/go/bin/all-in-one-linux"
],
"OnBuild": null,
"Labels": {
"com.centurylinklabs.watchtower.depends-on": "/cloudflared",
"com.docker.compose.config-hash": "c2b318c7497a806ecdf583ead8cc28d591ad6de32393fb9786f20e7aff6bf188",
"com.docker.compose.container-number": "1",
"com.docker.compose.oneoff": "False",
"com.docker.compose.project": "user",
"com.docker.compose.project.config_files": "docker-compose.yml",
"com.docker.compose.project.working_dir": "/home/user",
"com.docker.compose.service": "jaeger",
"com.docker.compose.version": "1.29.2",
"traefik.enable": "True",
"traefik.http.middlewares.jaegerauth.basicauth.users": "secret:secret",
"traefik.http.routers.jaeger.middlewares": "jaegerauth@docker",
"traefik.http.routers.jaeger.rule": "Host(`jaeger.secret.com`)",
"traefik.http.services.jaeger.loadbalancer.server.port": "16686"
}
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {},
"SandboxKey": "",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"MacAddress": "",
"Networks": {}
}
}
]
docker-compose.yml:
version: "3.8"
services:
  cloudflared:
    image: cloudflare/cloudflared
    container_name: cloudflared
    command:
      - tunnel
      - --url=http://localhost:80
      - run
      - --token=secret
    extra_hosts:
      - host.docker.internal:172.177.0.1
    restart: unless-stopped
  jaeger:
    image: jaegertracing/all-in-one
    container_name: jaeger
    # ports:
    #   - 16686:16686
    network_mode: service:cloudflared
    labels:
      com.centurylinklabs.watchtower.depends-on: /cloudflared
      traefik.enable: true
      traefik.http.middlewares.jaegerauth.basicauth.users: secret:secret
      traefik.http.routers.jaeger.middlewares: jaegerauth@docker
      traefik.http.routers.jaeger.rule: Host(`jaeger.secret.com`)
      traefik.http.services.jaeger.loadbalancer.server.port: 16686
    restart: unless-stopped
  traefik:
    image: traefik
    container_name: traefik
    command:
      - --api.dashboard
      - --entrypoints.web.address=:80
      - --entryPoints.web.forwardedHeaders.trustedIPs=127.0.0.1/32
      - --experimental.hub=true
      - --global.checkNewVersion=true
      - --hub.tls.insecure=true
      # - --log.level=DEBUG
      - --metrics.prometheus.addrouterslabels=true
      - --providers.docker
      - --providers.docker.exposedbydefault=false
      - --tracing.jaeger=true
    # ports:
    #   - 8080:8080
    volumes:
      - /run/docker.sock:/var/run/docker.sock:ro
    network_mode: service:cloudflared
    depends_on:
      - jaeger
    labels:
      com.centurylinklabs.watchtower.depends-on: /cloudflared
      traefik.enable: true
      traefik.http.middlewares.traefikauth.basicauth.users: secret:secret
      traefik.http.routers.traefik.middlewares: traefikauth@docker
      traefik.http.routers.traefik.rule: Host(`traefik.secret.com`)
      traefik.http.routers.traefik.service: api@internal
      traefik.http.services.traefik.loadbalancer.server.port: 8080
    restart: unless-stopped
  watchtower:
    image: containrrr/watchtower
    container_name: watchtower
    command:
      - --cleanup
      - --debug
      - --include-restarting
      - --include-stopped
      - --remove-volumes
      - --trace
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /run/docker.sock:/var/run/docker.sock
    restart: unless-stopped
networks:
  default:
    ipam:
      config:
        - subnet: 172.177.0.0/16
Yeah, that's exactly what I suspected:
"NetworkMode": "container:02cea43ed47cd6d0ea6ebaecfc889cf01623712fe846ca3c3cad1e94c3ed8ce2",
That means that network_mode: service:* is not supported right now. It should be possible to both support it and infer the depends-on from that property, though.
Note: NetworkMode: container:CONTAINER_NAME would still work, but docker-compose puts the explicit container ID in the field instead :/
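If that's the cause, one possible workaround is to bypass compose's service reference and name the parent container explicitly, so that the name rather than a resolved ID ends up in NetworkMode (a sketch, not verified; worth checking the result with docker inspect afterwards):

```yaml
services:
  cloudflared:
    image: cloudflare/cloudflared
    container_name: cloudflared   # fixed name so it can be referenced below
  jaeger:
    image: jaegertracing/all-in-one
    # refer to the parent by name instead of `service:cloudflared`, so compose
    # should pass the string through rather than resolving it to a container ID
    network_mode: "container:cloudflared"
```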
Any workaround for this short of adding monitor-only to the parent container and manually updating the stack periodically? It would be nice if watchtower could redeploy a whole stack if the parent container needed an update...
The only workaround is to use another networking mode, afaik.
It would be nice. A PR is welcome!