
No Internet connection after resume from sleep if mullvad-vpn is being used (Linux)

Open yaomtc opened this issue 3 years ago • 25 comments

Issue report

Operating system: Arch Linux

App version: 2021.4

Issue description

I have installed mullvad-vpn, and then mullvad-vpn-beta from the AUR on Arch. When I use standby, my Internet connection does not work upon resume, unless mullvad-vpn is disabled/not running at the time.

EDIT: Here is where I started troubleshooting my issue on the Arch Linux forums: https://bbs.archlinux.org/viewtopic.php?pid=1986412

Here is the output of ip route when the problem is occurring:

default via 192.168.0.1 dev enp3s0 proto dhcp src 192.168.0.102 metric 1024 
192.168.0.0/24 dev enp3s0 proto kernel scope link src 192.168.0.102 metric 1024 
192.168.0.1 dev enp3s0 proto dhcp scope link src 192.168.0.102 metric 1024

Here is the output normally:

default via 192.168.0.1 dev enp3s0 proto dhcp src 192.168.0.102 metric 1024 
10.64.0.1 dev wg-mullvad proto static 
192.168.0.0/24 dev enp3s0 proto kernel scope link src 192.168.0.102 metric 1024 
192.168.0.1 dev enp3s0 proto dhcp scope link src 192.168.0.102 metric 1024

yaomtc avatar Aug 10 '21 04:08 yaomtc

As of just now, I cannot reproduce this with NetworkManager and 2021.4. What's the output of ip route get 193.138.218.78, and of mullvad status? Again, it would also be nice if you could send a problem report, or have a look at /var/log/mullvad-vpn/daemon.log yourself and see if there's anything interesting there.

I wonder if you've enabled auto-connect or always-require-vpn.

pinkisemils avatar Aug 10 '21 10:08 pinkisemils

If I can't connect to the Internet while the issue is occurring, how would it send a problem report? Would it include info from before the current session?

Auto-connect: Yes Always require VPN: No Tunnel protocol: WireGuard

Before issue:

$ ip route get 193.138.218.78
193.138.218.78 dev wg-mullvad table 1836018789 src 10.108.245.105 uid 1000 
    cache 
$ mullvad status
Tunnel status: Connected to WireGuard 89.46.62.145:10264 over UDP

During issue:

$ ip route get 193.138.218.78a 192.168.0.102 uid 1000 
    cache 
$ mullvad status
Tunnel status: Blocked: This device is offline, no tunnels can be established

From daemon.old.log:

[2021-08-10 18:20:59.702][talpid_core::dns][INFO] Resetting DNS
[2021-08-10 18:20:59.703][talpid_core::routing::imp::imp][DEBUG] Clearing routes
[2021-08-10 18:20:59.706][mullvad_daemon][DEBUG] New tunnel state: Disconnecting(Block)
[2021-08-10 18:20:59.936][talpid_core::tunnel_state_machine::connecting_state][DEBUG] Tunnel monitor exited with block reason: None
[2021-08-10 18:20:59.937][talpid_core::firewall][INFO] Applying firewall policy: Blocked. Allowing LAN. Allowing endpoint 193.138.218.78:443 over TCP
[2021-08-10 18:20:59.938][mullvad_daemon][DEBUG] New tunnel state: Error(ErrorState { cause: IsOffline, block_failure: None })
[2021-08-10 18:20:59.938][mullvad_daemon][INFO] Blocking all network connections, reason: This device is offline, no tunnels can be established
[2021-08-10 18:21:13.123][mullvad_daemon::version_check][DEBUG] Writing version check cache to /var/cache/mullvad-vpn/version-info.json
[2021-08-10 18:21:13.123][mullvad_daemon::management_interface][DEBUG] Broadcasting new app version info
[2021-08-10 18:21:34.098][mullvad_daemon::management_interface][DEBUG] get_tunnel_state
[2021-08-10 18:24:25.391][mullvad_daemon::relays][DEBUG] Selecting among 54 relays with combined weight 50501
[2021-08-10 18:24:25.391][mullvad_daemon::relays][INFO] Selected relay us108-wireguard at 86.106.121.249
[2021-08-10 18:24:25.391][mullvad_daemon::relays][DEBUG] Relay matched on highest preference for retry attempt 0
[2021-08-10 18:24:25.393][talpid_core::firewall][INFO] Applying firewall policy: Connecting to 86.106.121.249:41882 over UDP with gateways 10.64.0.1,fc00:bbbb:bbbb:bb01::1, Allowing LAN, interface: none
[2021-08-10 18:24:25.403][talpid_core::tunnel::wireguard][DEBUG] Using kernel WireGuard implementation
[2021-08-10 18:24:25.403][mullvad_daemon][DEBUG] New tunnel state: Connecting { endpoint: TunnelEndpoint { endpoint: Endpoint { address: 86.106.121.249:41882, protocol: Udp }, tunnel_type: Wireguard, proxy: None, entry_endpoint: None }, location: Some(GeoIpLocation { ipv4: None, ipv6: None, country: "USA", city: Some("New York, NY"), latitude: 40.73061, longitude: -73.935242, mullvad_exit_ip: true, hostname: Some("us108-wireguard"), bridge_hostname: None }) }
[2021-08-10 18:24:25.404][talpid_core::firewall][INFO] Applying firewall policy: Connecting to 86.106.121.249:41882 over UDP with gateways 10.64.0.1,fc00:bbbb:bbbb:bb01::1, Allowing LAN, interface: wg-mullvad
[2021-08-10 18:24:25.408][talpid_core::routing::imp::imp][DEBUG] Clearing routes
[2021-08-10 18:24:25.408][mullvad_daemon][DEBUG] New tunnel state: Disconnecting(Block)
[2021-08-10 18:24:25.408][talpid_core::routing::imp::imp][DEBUG] Adding routes: {RequiredRoute { prefix: V4(Ipv4Network { addr: 10.64.0.1, prefix: 32 }), node: RealNode(Node { ip: None, device: Some("wg-mullvad") }), table_id: 254 }, RequiredRoute { prefix: V4(Ipv4Network { addr: 0.0.0.0, prefix: 0 }), node: RealNode(Node { ip: None, device: Some("wg-mullvad") }), table_id: 1836018789 }}
[2021-08-10 18:24:25.605][talpid_core::tunnel_state_machine::connecting_state][DEBUG] Tunnel monitor exited with block reason: None
[2021-08-10 18:24:25.605][talpid_core::tunnel::wireguard][WARN] Timeout while checking tunnel connection
[2021-08-10 18:24:26.404][talpid_core::firewall][INFO] Applying firewall policy: Blocked. Allowing LAN. Allowing endpoint 193.138.218.78:443 over TCP
[2021-08-10 18:24:26.405][mullvad_daemon][DEBUG] New tunnel state: Error(ErrorState { cause: IsOffline, block_failure: None })
[2021-08-10 18:24:26.405][mullvad_daemon][INFO] Blocking all network connections, reason: This device is offline, no tunnels can be established
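
A log like this is easier to follow when reduced to its state transitions. The following is a minimal sketch, assuming the `[timestamp][module][LEVEL] message` layout and the `New tunnel state:` lines seen above; the helper name `tunnel_states` is made up for illustration and is not part of any Mullvad tool.

```shell
#!/bin/sh
# Print "<timestamp> <state>" for each tunnel state transition in a daemon log.
# Assumes the "[timestamp][mullvad_daemon][LEVEL] New tunnel state: ..." format
# shown in this thread. tunnel_states is a hypothetical helper name.
tunnel_states() {
    # With no file arguments, sed reads the log from stdin.
    sed -n 's/^\[\([^]]*\)]\[mullvad_daemon]\[[A-Z]*] New tunnel state: \([A-Za-z]*\).*/\1 \2/p' "$@"
}
```

Something like `tunnel_states /var/log/mullvad-vpn/daemon.log` should then list only the transitions, which makes a sequence such as Connecting, Disconnecting, Error easier to spot.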

yaomtc avatar Aug 10 '21 22:08 yaomtc

If I can't connect to the Internet while the issue is occurring, how would it send a problem report?

You can separate collecting the report and sending it into two steps with the problem report CLI. That's the underlying tool that the GUI problem report uses anyway, so they are really identical.

$ mullvad-problem-report collect --output report.log
$ mullvad-problem-report send --report report.log --email [email protected] --message "No internet connection after resume from sleep on Arch. Please forward to Emils //yaomtc"

Or skip the second step and upload the report in this issue.

faern avatar Aug 11 '21 08:08 faern

I believe this is a duplicate of #2744. This should be fixed in master. Using NetworkManager should also help due to the way it orders its operations, I think. You could also try adding a link as a hacky workaround. The bug is that, to deduce whether it has connectivity, the daemon only listens for changes in links and their addresses, not for changes in the routing table. This is fixed in master, but we've not released it yet.

pinkisemils avatar Aug 12 '21 11:08 pinkisemils

Thanks, that seems like a safe assumption. I see nobody has yet made a -git version in the AUR. If I ever feel motivated to learn how to make a PKGBUILD maybe I'll change that... Or I'll just wait for the next release. I don't use suspend very often.

yaomtc avatar Aug 12 '21 17:08 yaomtc

@pinkisemils When will a bugfix for this issue be rolled out? How can this be avoided in the meantime without having to kill the daemon every time? You said:

You could also try adding a link as a hacky workaround.

What exactly do you mean by that?

Thanks

Shadow505 avatar Aug 17 '21 12:08 Shadow505

We currently have no ETA for the next stable release. But expect a couple of weeks at least. This could potentially show up in a beta sooner, so look out for that if this bug is a problem for you.

faern avatar Aug 17 '21 13:08 faern

How can you perform the workaround that @pinkisemils mentioned above?

Shadow505 avatar Aug 17 '21 13:08 Shadow505

The daemon should trigger its connectivity check whenever a link is added or removed, or when an IP address is set on or removed from a link. So, you could do that by running sudo ip link add type dummy and then sudo ip link delete dummy0.
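
The two commands can be wrapped so that the dummy link gets an explicit name and is always removed under that same name. This is only a sketch of the workaround described here, not an official fix: the interface name `mvnudge0`, the `run` wrapper, and the `DRY_RUN` switch are assumptions added for illustration, and the `dummy` link type must be available in your kernel.

```shell
#!/bin/sh
# Nudge the daemon's connectivity check by adding and removing a dummy link.
# "mvnudge0", run() and DRY_RUN are illustrative assumptions, not Mullvad features.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # dry run: only print the command
    else
        "$@"         # real run: execute the command
    fi
}

nudge_links() {
    name="${1:-mvnudge0}"
    # Add a named dummy link, then delete it again if the add succeeded.
    run sudo ip link add name "$name" type dummy &&
        run sudo ip link delete "$name"
}
```

Running `DRY_RUN=1 nudge_links` prints the two `ip link` commands without touching the system; plain `nudge_links` executes them.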

pinkisemils avatar Aug 17 '21 15:08 pinkisemils

Oh okay, I thought you meant something different. Hoping that the beta with the bugfix is available soon.

Shadow505 avatar Aug 19 '21 14:08 Shadow505

I can confirm I am getting the exact same error on Debian 11. This is the message I get:

Tunnel status: Blocked: This device is offline, no tunnels can be established

Not sure what the problem is but I am unable to fix it. If I uninstall the app, then re-install it, it works fine. If I restart my system, the app gives me that error message even though I'm clearly connected to the internet. So coming here to confirm.

mknepper avatar Sep 02 '21 00:09 mknepper

Having the same issue here on macOS 11.5.2. If I disconnect from any wifi connection at all while the VPN is active, the VPN won't be able to reconnect as it's blocking the internet access to my device. Fastest workaround I can find is to quit app and reinstall it.

satrinity402 avatar Sep 07 '21 16:09 satrinity402

@satrinity402 This issue is exclusive to Linux; the macOS issue is referenced in a different one. And a workaround for that is here. It's an unfortunate bug, but we're working on this.

pinkisemils avatar Sep 08 '21 09:09 pinkisemils

I just wanted to chime in that I don't have this issue with 2021.4, but I do in 2021.5, running Pop!_OS 21.04.

davidskeck avatar Nov 10 '21 21:11 davidskeck

@davidskeck Do you still experience the same behavior with 2021.6-beta1? Could you please provide the output of ip route and ip route -6 after the machine ends up in this state? And do send a problem report too, reading the logs always helps.

pinkisemils avatar Nov 12 '21 17:11 pinkisemils

@davidskeck Do you still experience the same behavior with 2021.6-beta1? Could you please provide the output of ip route and ip route -6 after the machine ends up in this state? And do send a problem report too, reading the logs always helps.

The issue has been solved for me with the latest beta version. Great job.

ghost avatar Nov 13 '21 16:11 ghost

This issue is fixed for me in the latest version

davidskeck avatar Jan 25 '22 18:01 davidskeck

I am experiencing the same issue with 2022.3-beta2 on 1 of 3 Win11 machines, which I have to reboot as it just won't connect after waking up. The other 2 machines work as expected.

edit: I just learned to read and noticed the original issue is for linux, let me know if you want me to create a new issue.

hugalafutro avatar Jul 05 '22 14:07 hugalafutro

@hugalafutro Thanks, we're aware of that issue. There probably won't be another release for at least 1.5 weeks, so I suggest downgrading to the stable version until then.

dlon avatar Jul 05 '22 15:07 dlon

I still get this issue on the latest beta 2023.4-beta1

inboundbark avatar Jun 13 '23 10:06 inboundbark

What distribution are you using, @inboundbark? What's your network configuration like? Is it a single ethernet/wireless adapter with a DHCP client, or something more interesting?

pinkisemils avatar Jun 27 '23 12:06 pinkisemils

@pinkisemils I'm on Arch Linux and I'm using NetworkManager over a wireless connection with default configs. What's strange is that this issue only happens on my laptop, whereas I never get this issue after I restart or wake my arch desktop (also using NetworkManager). AFAIK the only difference is that my desktop is on ethernet so maybe that's part of the problem.

inboundbark avatar Jun 27 '23 13:06 inboundbark

Are you using a particular kernel? I cannot reproduce this on any of my Linux machines. We use netlink for offline detection and listen for changes in the routing table, and for every change the daemon checks whether a default route exists. When the machine wakes up, does it have a default route? This can be deduced by running ip route get default.
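
The same question can be asked by hand against the text of the routing table. The helper below is a rough manual equivalent only: the daemon works from netlink messages, not from `ip route` output, and the name `has_default_route` is hypothetical.

```shell
#!/bin/sh
# Succeed if the routing-table text on stdin contains a default route.
# has_default_route is a hypothetical helper; it is not how the daemon
# detects connectivity, only a manual check of the same condition.
has_default_route() {
    grep -q '^default '
}
```

For example, `ip route show | has_default_route && echo "default route present"` can be run right after resume, and again with `ip -6 route show` for IPv6.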

It might be helpful to see what kinds of messages we'd receive from the routing socket, so you could run ip monitor before the laptop goes to sleep and see the output afterwards. It will contain IP addresses in your local network, so feel free to censor those, but it would be helpful for us to see both the output of ip monitor and our daemon logs side by side to see what's going on.
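
For the censoring step, a small filter can blank out private IPv4 addresses before the output is shared. This is a sketch under assumptions: `redact_private_ipv4` is a made-up helper name, it only covers the RFC 1918 IPv4 ranges, and IPv6 and MAC addresses still need to be reviewed by hand.

```shell
#!/bin/sh
# Replace RFC 1918 private IPv4 addresses on stdin with a placeholder.
# redact_private_ipv4 is a hypothetical helper; IPv6 and MAC addresses
# are left untouched and should be checked manually.
redact_private_ipv4() {
    sed -E 's/(^|[^0-9.])(10\.[0-9]+\.[0-9]+\.[0-9]+|192\.168\.[0-9]+\.[0-9]+|172\.(1[6-9]|2[0-9]|3[01])\.[0-9]+\.[0-9]+)/\1<private-ip>/g'
}
```

One way to use it is to capture the raw output first (`ip monitor > ip-monitor.raw`, stopped after resume) and redact afterwards with `redact_private_ipv4 < ip-monitor.raw > ip-monitor.txt`. Note that this also hides in-tunnel addresses in 10.0.0.0/8, so keep the raw file in case those turn out to matter.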

pinkisemils avatar Jun 27 '23 13:06 pinkisemils

All of this is from memory, so it may be wrong! I'm a hobbyist and lack the relevant knowledge, so I can't guarantee that what I say is correct. I did not test any of it.

I was using Arch Linux and NetworkManager with Mullvad, with systemd-resolved disabled and the Mullvad kill switch and lockdown mode enabled, when I hit this issue.

I have hit this issue several times before, and I can't recall exactly how I fixed it or what caused it. But I think it is related to /etc/resolv.conf and all the DNS resolution craziness that comes with it (if you have ever had to troubleshoot a DNS issue, you know what I mean). By default, the Arch Linux ISO does not make /etc/resolv.conf a symlink to /run/systemd/resolve/stub-resolv.conf. So /etc/resolv.conf is a regular file that is written by both NetworkManager and Mullvad. After waking from suspend, that file is sometimes not written correctly (if I recall correctly). My guess is that NetworkManager and Mullvad both try to overwrite /etc/resolv.conf and conflict with each other.

To fix it, you need to fix /etc/resolv.conf somehow. My first recommended solution is sudo ln -sf /run/systemd/resolve/stub-resolv.conf /etc/resolv.conf, which is also recommended in man systemd-resolved.service and on the Arch wiki (https://wiki.archlinux.org/title/Systemd-resolved#DNS), and is the default in the Arch Linux cloud-init cloud image. Don't forget to enable the systemd-resolved service, and do whatever reboot, restart, or daemon-reload is needed to make sure the systemd services actually use the current configuration. My second recommended solution (which is what I currently use) is to use dnsmasq and openresolv instead of systemd-resolved, but that requires a lot more configuration. There are other ways to fix /etc/resolv.conf; I'm only covering two of them here.
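
Before replacing the file, it may be worth checking what /etc/resolv.conf currently is. A minimal sketch, assuming `readlink` is available; the helper name `resolv_conf_kind` is hypothetical.

```shell
#!/bin/sh
# Report whether a resolv.conf path is a symlink (and to what), a regular
# file, or missing. resolv_conf_kind is a hypothetical helper name; the
# path defaults to /etc/resolv.conf.
resolv_conf_kind() {
    path="${1:-/etc/resolv.conf}"
    if [ -L "$path" ]; then
        echo "symlink -> $(readlink "$path")"
    elif [ -f "$path" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

If `resolv_conf_kind` reports a regular file, more than one program may be writing to it directly, which is the situation this comment suspects. After the `ln -sf` command above, it should report a symlink to /run/systemd/resolve/stub-resolv.conf.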

If you go with my second recommendation, you can use some of my configurations as a reference. Note that some of them are there to fix another weird issue I ran into (which my first recommendation can't fix), so they may not be necessary:

Also for the second recommendation, don't forget to enable the dnsmasq service and disable the systemd-resolved service. You also need to run sudo resolvconf -u, and do whatever reboot, restart, or daemon-reload is needed to make sure the systemd services actually use the current configuration. Honestly, I don't understand all of these configurations; they just somehow worked.

flyxyz123 avatar Jun 27 '23 19:06 flyxyz123