Bug: Connectivity is lost once gluetun container is restarted
Is this urgent?: No (arguably yes, since it causes complete connection loss whenever this "bug" happens)
Host OS: Tested on both Fedora 34 and (up-to-date) Arch Linux ARM (32bit/RPi 4B)
CPU arch or device name: amd64 & armv7
What VPN provider are you using: NordVPN
What are you using to run your container?: Docker Compose
What is the version of the program?:
x64 & armv7: Running version latest built on 2021-09-23T17:23:28Z (commit 985cf7b)
Steps to reproduce issue:
- Using the recommended docker-compose.yml, configure gluetun and another container (in my case xyz, though it could be something like qbittorrent or whatever you want) to use gluetun's network stack. Publish xyz's ports through gluetun's network stack.
- Either: a) restart gluetun with a good ol' docker restart gluetun, or b) manually cause a temporary network problem in such a way that the gluetun container dies/exits. Then restart gluetun.
- Now try to use xyz through its published ports: you'll receive a connection refused error unless you restart the xyz service again. You can also docker exec -it into the container and run curl/wget/ping/etc.
Expected behavior: xyz should have internet connectivity through gluetun's network stack and be accessible through gluetun's published/exposed ports, even if gluetun is restarted. This is, unfortunately, not the case: xyz's network stack just dies, no data in, no data out.
Additional notes:
- I did use FIREWALL_OUTBOUND_SUBNETS - it didn't make a difference.
- I noticed quite interesting stuff once gluetun is restarted: a) routing entries from containers using network_mode: service:gluetun completely disappear; b) restarting gluetun doesn't bring back the original routing tables; c) NetworkMode seems to be okay.
Terminal example
# At this point, gluetun has been manually restarted. Then I exec -it'd into an affected container that was using gluetun's network stack:
/app # ip ro sh
/app #
[root@fedora pepe]# docker restart xyz
[root@fedora pepe]# docker exec -it xyz /bin/sh
/app # ip ro sh
0.0.0.0/1 via 10.8.1.1 dev tun0
default via 172.17.0.1 dev eth0
10.8.1.0/24 dev tun0 scope link src 10.8.1.4
37.120.209.219 via 172.17.0.1 dev eth0
128.0.0.0/1 via 10.8.1.1 dev tun0
172.17.0.0/16 dev eth0 scope link src 172.17.0.2
Brief docker inspect output from the affected container:
# snip
"NetworkMode": "container:f77af999d9de92af66094dd9db0f854f1a2da9ceabddc47239bc5b89f577247f",
"PortBindings": {},
"RestartPolicy": {
"Name": "unless-stopped",
"MaximumRetryCount": 0
},
f77[...] is gluetun's container ID.
Full gluetun logs:
2021/09/24 16:39:47 INFO Alpine version: 3.14.2
2021/09/24 16:39:47 INFO OpenVPN 2.4 version: 2.4.11
2021/09/24 16:39:47 INFO OpenVPN 2.5 version: 2.5.2
2021/09/24 16:39:47 INFO Unbound version: 1.13.2
2021/09/24 16:39:47 INFO IPtables version: v1.8.7
2021/09/24 16:39:47 INFO Settings summary below:
|--VPN:
|--Type: openvpn
|--OpenVPN:
|--Version: 2.5
|--Verbosity level: 1
|--Network interface: tun0
|--Run as root: enabled
|--Nordvpn settings:
|--Regions: mexico, sweden
|--OpenVPN selection:
|--Protocol: udp
|--DNS:
|--Plaintext address: 1.1.1.1
|--DNS over TLS:
|--Unbound:
|--DNS over TLS providers:
|--Cloudflare
|--Listening port: 53
|--Access control:
|--Allowed:
|--0.0.0.0/0
|--::/0
|--Caching: enabled
|--IPv4 resolution: enabled
|--IPv6 resolution: disabled
|--Verbosity level: 1/5
|--Verbosity details level: 0/4
|--Validation log level: 0/2
|--Username:
|--Blacklist:
|--Blocked categories: malicious
|--Additional IP networks blocked: 13
|--Update: every 24h0m0s
|--Firewall:
|--Outbound subnets: 192.168.0.0/24
|--Log:
|--Level: INFO
|--System:
|--Process user ID: 1000
|--Process group ID: 1000
|--Timezone: REDACTED
|--Health:
|--Server address: 127.0.0.1:9999
|--Address to ping: github.com
|--VPN:
|--Initial duration: 6s
|--Addition duration: 5s
|--HTTP control server:
|--Listening port: 8000
|--Logging: enabled
|--Public IP getter:
|--Fetch period: 12h0m0s
|--IP file: /tmp/gluetun/ip
|--Github version information: enabled
2021/09/24 16:39:47 INFO routing: default route found: interface eth0, gateway 172.17.0.1
2021/09/24 16:39:47 INFO routing: local ethernet link found: eth0
2021/09/24 16:39:47 INFO routing: local ipnet found: 172.17.0.0/16
2021/09/24 16:39:47 INFO routing: default route found: interface eth0, gateway 172.17.0.1
2021/09/24 16:39:47 INFO routing: adding route for 0.0.0.0/0
2021/09/24 16:39:47 INFO firewall: firewall disabled, only updating allowed subnets internal list
2021/09/24 16:39:47 INFO routing: default route found: interface eth0, gateway 172.17.0.1
2021/09/24 16:39:47 INFO routing: adding route for 192.168.0.0/24
2021/09/24 16:39:47 INFO TUN device is not available: open /dev/net/tun: no such file or directory; creating it...
2021/09/24 16:39:47 INFO firewall: enabling...
2021/09/24 16:39:47 INFO firewall: enabled successfully
2021/09/24 16:39:47 INFO dns over tls: using plaintext DNS at address 1.1.1.1
2021/09/24 16:39:47 INFO healthcheck: listening on 127.0.0.1:9999
2021/09/24 16:39:47 INFO http server: listening on :8000
2021/09/24 16:39:47 INFO firewall: setting VPN connection through firewall...
2021/09/24 16:39:47 INFO openvpn: OpenVPN 2.5.2 armv7-alpine-linux-musleabihf [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on May 4 2021
2021/09/24 16:39:47 INFO openvpn: library versions: OpenSSL 1.1.1l 24 Aug 2021, LZO 2.10
2021/09/24 16:39:47 INFO openvpn: TCP/UDP: Preserving recently used remote address: [AF_INET]86.106.103.27:1194
2021/09/24 16:39:47 INFO openvpn: UDP link local: (not bound)
2021/09/24 16:39:47 INFO openvpn: UDP link remote: [AF_INET]86.106.103.27:1194
2021/09/24 16:39:48 WARN openvpn: 'link-mtu' is used inconsistently, local='link-mtu 1633', remote='link-mtu 1634'
2021/09/24 16:39:48 WARN openvpn: 'comp-lzo' is present in remote config but missing in local config, remote='comp-lzo'
2021/09/24 16:39:48 INFO openvpn: [se-nl8.nordvpn.com] Peer Connection Initiated with [AF_INET]86.106.103.27:1194
2021/09/24 16:39:49 INFO openvpn: TUN/TAP device tun0 opened
2021/09/24 16:39:49 INFO openvpn: /sbin/ip link set dev tun0 up mtu 1500
2021/09/24 16:39:49 INFO openvpn: /sbin/ip link set dev tun0 up
2021/09/24 16:39:49 INFO openvpn: /sbin/ip addr add dev tun0 10.8.8.14/24
2021/09/24 16:39:49 INFO openvpn: Initialization Sequence Completed
2021/09/24 16:39:49 INFO dns over tls: downloading DNS over TLS cryptographic files
2021/09/24 16:39:50 INFO healthcheck: healthy!
2021/09/24 16:39:53 INFO dns over tls: downloading hostnames and IP block lists
2021/09/24 16:40:11 INFO dns over tls: init module 0: validator
2021/09/24 16:40:11 INFO dns over tls: init module 1: iterator
2021/09/24 16:40:11 INFO dns over tls: start of service (unbound 1.13.2).
2021/09/24 16:40:12 INFO dns over tls: generate keytag query _ta-4a5c-4f66. NULL IN
2021/09/24 16:40:13 INFO dns over tls: generate keytag query _ta-4a5c-4f66. NULL IN
2021/09/24 16:40:16 INFO dns over tls: ready
2021/09/24 16:40:18 INFO vpn: You are running on the bleeding edge of latest!
2021/09/24 16:40:19 INFO ip getter: Public IP address is 213.232.87.176 (Netherlands, North Holland, Amsterdam)
docker-compose.yml:
gluetun:
image: qmcgaw/gluetun
container_name: gluetun
restart: unless-stopped
cap_add:
- NET_ADMIN
ports:
- 4533:4533 #navidrome
environment:
- OPENVPN_USER=REDACTED
- OPENVPN_PASSWORD=REDACTED
- VPNSP=nordvpn
- VPN_TYPE=openvpn
- REGION=REDACTED
- TZ=REDACTED
- FIREWALL_OUTBOUND_SUBNETS=192.168.0.0/24
# navidrome (can be literally anything else)
navidrome:
image: deluan/navidrome:develop
container_name: navidrome
restart: unless-stopped
environment:
- PGID=1000
- PUID=1000
volumes:
- dockervolume:/music:ro
network_mode: service:gluetun
depends_on:
- gluetun
Nonetheless, I'd like to thank you for creating gluetun. I'd be more than happy to help you fix this issue if it is a gluetun bug. Hopefully it's just a misconfiguration on my side.
Hey there! Thanks for the detailed issue!
It is a well-known Docker problem I need to work around. Let's keep this open for now, although there is at least one duplicate issue about this problem somewhere in the issues.
Note this only happens if gluetun is updated and uses a different image (afaik).
For now, you might want to have gluetun and all your connected containers in a single docker-compose.yml and docker-compose down && docker-compose up -d them (that's what I do).
I'm developing https://github.com/qdm12/deunhealth and should add a feature tailored for this problem soon (give it 1-5 days), feel free to subscribe to releases on that side repo. That way it would watch your containers and restart your connected containers if gluetun gets updated & restarted.
Thank you for the answer @qdm12.
It does indeed seem to be a Docker problem, just as you said, and unfortunately they seem a bit reluctant to discuss possible solutions for the issue. :(
For the time being, there's a temporary ugly, brutal, but 100% working fix. Maybe it would be worth mentioning it in the wiki/docker-compose.yml example? Although there are some gotchas, since it completely replaces the original healthcheck command, and some images don't include either curl or wget. Currently I'm probing example.com every minute on child containers attached to gluetun's network stack and so far so good.
I just subscribed to deunhealth; it seems promising and probably even better than things like autoheal due to the network fix thing. I'll make sure to check it out in a week (or earlier, as you deem appropriate) and provide feedback/do some testing.
Similar conversation in #504 to be concluded.
I have the same thing: when I restart Gluetun, the containers sharing its network_mode don't come back up. The only difference is that I configured it with network_mode: 'container:VPN'.
I think when I restart or recreate the Gluetun container it gets a different ID.
What would be the solution to this problem?
Stumbled across this issue while researching ways to restart dependent containers once gluetun is recreated with a new image (via Watchtower). https://github.com/qdm12/deunhealth seems like it might work, but I wanted to make sure I understand the use case.
I have a number of services with network_mode: container:gluetun.
However, when the gluetun container restarts, the dependent containers don't actually end up getting marked unhealthy; they just lose connectivity.
I'm wondering if you've updated deunhealth yet to include this function.
No sorry, but I'll get to it soon.
Ideally, there is a way to re-attach the disconnected containers to gluetun without restarting them (I guess with Docker's Go API, since I doubt the docker CLI supports such a thing). That would work by marking each connected container with a label to indicate this network re-attachment.
If there isn't, I'll set up something to cascade the restart from gluetun to connected containers, probably using labels to avoid any surprise (mark gluetun as a parent container with a unique id, and mark all connected containers as child containers with that same id).
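To illustrate the label scheme described above, here is a rough compose sketch; the vpn-restarter.* label keys are purely hypothetical and nothing reads them yet:
services:
  gluetun:
    image: qmcgaw/gluetun
    cap_add:
      - NET_ADMIN
    labels:
      # hypothetical label: marks this container as a network "parent"
      vpn-restarter.parent-id: "vpn1"
  navidrome:
    image: deluan/navidrome:develop
    network_mode: service:gluetun
    labels:
      # hypothetical label: a watcher would restart (or re-attach) every
      # container carrying this id whenever the parent restarts
      vpn-restarter.child-of: "vpn1"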
For the time being, if anyone wants a dirty, cheap solution, here's my current setup:
autoheal:
... snip ...
literallyanything:
image: blahblah
container_name: blahblah
network_mode: service:gluetun
restart: unless-stopped
healthcheck:
test: "curl -sf https://example.com || exit 1"
interval: 1m
timeout: 10s
retries: 1
This will only work with containers where curl is already preinstalled. Some Docker images include wget but not curl, in which case you can replace the test command with wget --no-verbose --tries=1 --spider https://example.com/ || exit 1. You can also use qdm12's deunhealth instead of autoheal.
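For an image that ships wget but not curl, the same workaround could look like this (a sketch reusing the placeholder service names from above):
  literallyanything:
    image: blahblah
    container_name: blahblah
    network_mode: service:gluetun
    restart: unless-stopped
    healthcheck:
      # wget-based probe for images without curl
      test: "wget --no-verbose --tries=1 --spider https://example.com/ || exit 1"
      interval: 1m
      timeout: 10s
      retries: 1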
Any progress or resolution to this, either in gluetun or deunhealth?
I have bits and pieces for it, but I am moving country + visiting family + starting a new job right now, so it might take at least 2 weeks for me to finish it up, sorry about that. But it's at the top of my OSS things-to-do list, so it won't be forgotten :wink:
I'd also like to thank you for creating gluetun and to say this is a very good project. Any progress on this?
Any update on this by any chance?
I'm not really sure. I turned off the Watchtower container and since then my setup has worked flawlessly. It's a workaround, but it's all I know so far.
Any news or progress on this issue?
following
Since I also have this problem, I'd like to report it here and find out if and how things progress. Thank you!
Having the same behavior. When gluetun is recreated, every other container in the same network_mode needs to be restarted as well.
Stupid idea to solve this:
Gluetun is given the docker socket, and a list of containers to restart once it comes back up?
I do not think Docker is going to solve this bug. This bug has existed effectively forever.
Welp, had to prove myself wrong: https://github.com/docker/compose/pull/10284
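If I'm reading that PR right, it adds a restart attribute to depends_on (shipped around Compose v2.17), so Compose cascades a restart or recreation of gluetun to its dependent services. A minimal sketch, reusing the service names from the compose file earlier in this thread:
services:
  gluetun:
    image: qmcgaw/gluetun
    cap_add:
      - NET_ADMIN
  navidrome:
    image: deluan/navidrome:develop
    network_mode: service:gluetun
    depends_on:
      gluetun:
        condition: service_started
        restart: true  # restart navidrome whenever Compose restarts or recreates gluetun
As far as I understand, this only helps when Compose itself restarts or recreates gluetun (docker compose restart gluetun, or docker compose up after an image pull); it does nothing if the gluetun container dies and is brought back by the daemon's restart policy.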
This solution has worked well for me: https://github.com/qdm12/gluetun/issues/641#issuecomment-933856220. I've not had any issues with container connectivity after adding that. Seems like an all right fix to me.
Following, I have this issue. Thank you for the awesome service.
@ismay The workaround wasn't bad, but doesn't work for distroless containers which don't have curl or wget or really anything to do a healthcheck from the inside
Ah yeah that's true
Is there a chance this issue will be ever resolved? Thanks for providing gluetun, happy user for a while.
The "network_mode: service:gluetun" statement is incredibly different from a normal "networks" statement.
People's hopes and some's expectations here are a bit overblown (and the "bug" monicker, likely doesn't help) ... if you use a container as a network stack component (router with layer features) ... which you are with that configuration ... and you restart that container its state, and more importantly infrastructure, is gone. Your application isn't going to recover.
The author may be able to provide runtime resilience in the container or even some high availability (see other PR where people want to have multiple gluetuns concurrently), but if wireguard or openvpn were to fall down all he can do is try and reestablish. All the states, port forwards, etc are all gone and will need to be reconstructed.
For example, complaining about connection refused while a router is offline (in this case its ... off) ... yeah, not only is that vpn not running anymore the whole stack (eg, network service) is not running. Let's be realistic.
With "network_mode" ... your container's networking is using another container for its networking ... that also provides additional network services (eg, vpn, proxies, dns, etc).
This gimmick of using the docker feature for easily aggregating containers into that network stack ... great, but this is the downside ... they are all aggregated into that stack.
If you filed "bugs" with docker such as "network_mode services should retain connections and state while service container offline and ..." it's not feasible ... in a single container. And docker plugins (an entirely different thing) have gone into different roadmaps.
Some events, like a vpn falling down, not causing the container to fully restart (and then all the aggregated containers) ... that is likely a resolvable thing.
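For anyone skimming, a minimal compose sketch of the two attachment styles that comment contrasts (service names are placeholders, not from anyone's setup): a container on a regular bridge network keeps its own network namespace across a gluetun restart, while one using network_mode: service:gluetun shares gluetun's namespace and loses its routes and tun0 when that namespace is torn down.
services:
  gluetun:
    image: qmcgaw/gluetun
    cap_add:
      - NET_ADMIN
  # Regular attachment: own network namespace, survives a gluetun
  # restart, but its traffic does NOT go through the VPN.
  app-bridged:
    image: alpine
    networks:
      - default
  # Shared attachment: uses gluetun's namespace, so all traffic goes
  # through the VPN, but routes and tun0 vanish when gluetun is recreated.
  app-vpn:
    image: alpine
    network_mode: service:gluetun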
This is a bug with Docker and how Docker handles network_mode: container. Until they fix that bug, we're basically stuck.
If anybody wants to give it a try, I have written cascandaliato/docker-restarter. Right now it covers only one scenario: if A depends on B and B restarts, then restart A.
Gluetun updated twice in a few minutes, which in turn meant I had to restart the whole stack depending on it, twice.
@cascandaliato I will try for sure.
That works a treat. Thank you.
I did find I had to put it outside of the stack. Your container would attempt to restart itself, but would actually stop if it was inside the stack, and not restart the other containers. Possibly an issue on my side though.
@Blavkentropy1, I've opened cascandaliato/docker-restarter#2 to keep track of that problem but I'll need your help to understand what's happening. Whenever you have time, we can continue the discussion in the other issue because I don't want to add noise to this one.
Also having this issue sadly!
Drove myself mad thinking it was a problem with defining FIREWALL_OUTBOUND_SUBNETS.