[Android] WireGuard does not work properly after network outage and recovery

Open lisongmin opened this issue 1 year ago • 15 comments

Operating system

Android

System version

lineageos 20

Installation type

sing-box for Android Graphical Client

If you are using a graphical client, please provide the version of the client.

1.8.4

Version

No response

Description

WireGuard works properly on startup. However, after the network is disconnected and reconnected, it can no longer complete a handshake with the server.

Reproduction

  1. Start sing-box.
  2. Check that the WireGuard network is reachable with curl http://192.168.2.1; it works.
  3. Turn off the network, then turn it back on.
  4. Wait a while and try to access http://192.168.2.1 again; it does not work.

I dumped the traffic on the server (see the tcpdump output at the end). It seems that the server can still receive data from sing-box and send data back to it.

Configuration

{
  "log": { "level": "debug" },
  "dns": {
    "servers": [
      {
        "tag": "home-dns",
        "address": "udp://192.168.6.1",
        "detour": "direct",
        "strategy": "ipv4_only"
      },
      {
        "tag": "wg-dns",
        "address": "udp://192.168.2.6",
        "detour": "go-home",
        "strategy": "ipv4_only"
      },
      {
        "tag": "default-dns",
        "strategy": "ipv4_only",
        "address": "h3://223.5.5.5/dns-query",
        "detour": "direct"
      }
    ],
    "rules": [
      {
        "domain_suffix": [".home.example.com"],
        "wifi_ssid": ["home-dns"],
        "server": "family"
      },
      {
        "domain_suffix": [".home.example.com"],
        "server": "wg-dns"
      }
    ],
    "final": "default-dns"
  },
  "inbounds": [
    {
      "type": "tun",
      "tag": "tun-in",
      "interface_name": "tun0",
      "inet4_address": "172.19.0.1/30",
      "inet6_address": "fdfe:2204:cfab::1/126",
      "mtu": 9000,
      "auto_route": true,
      "strict_route": true,
      "inet4_route_address": ["0.0.0.0/1", "128.0.0.0/1"],
      "inet6_route_address": ["::/1", "8000::/1"],
      "endpoint_independent_nat": false,
      "stack": "system",
      "sniff": true
    }
  ],
  "outbounds": [
    { "type": "direct", "tag": "direct" },
    { "type": "block", "tag": "block" },
    { "type": "dns", "tag": "dns" },
    {
      "type": "wireguard",
      "tag": "go-home",
      "local_address": ["10.249.0.3/32"],
      "private_key": "KNx4llKEZwqB5Q69MMVlFfj+7pVaRIFiw63tkSvblmA=",
      "peers": [
        {
          "server": "home.example.com",
          "server_port": 51802,
          "public_key": "DBjU7sR7/Qx65b6m4IKTAZrjDHBeWsruMyoSpV1ES1U=",
          "allowed_ips": ["192.168.2.0/24", "10.249.0.0/24"]
        }
      ]
    }
  ],
  "route": {
    "final": "direct",
    "auto_detect_interface": true,
    "rules": [
      { "protocol": "dns", "outbound": "dns" },
      {
        "wifi_ssid": ["abc"],
        "ip_cidr": ["192.168.2.0/24", "10.249.0.0/24"],
        "outbound": "direct"
      },
      {
        "ip_cidr": ["192.168.2.0/24", "10.249.0.0/24"],
        "outbound": "go-home"
      }
    ]
  }
}

sing-box logs

sfa.log

tcpdump on server

Before network disconnect

21:37:53.245737 pppoe-wan In  IP 180.139.224.173.24192 > 124.227.226.83.51802: UDP, length 96
21:37:53.248727 pppoe-wan In  IP 180.139.224.173.24192 > 124.227.226.83.51802: UDP, length 96
21:37:53.249471 pppoe-wan Out IP 124.227.226.83.51802 > 180.139.224.173.24192: UDP, length 96
21:37:53.279670 pppoe-wan In  IP 180.139.224.173.24192 > 124.227.226.83.51802: UDP, length 96
21:38:03.482498 pppoe-wan Out IP 124.227.226.83.51802 > 180.139.224.173.24192: UDP, length 32
21:38:03.505715 pppoe-wan In  IP 180.139.224.173 > 124.227.226.83: ICMP 180.139.224.173 udp port 24192 unreachable, length 68

After network recovery

21:38:04.607864 pppoe-wan In  IP 180.139.224.173.24206 > 124.227.226.83.51802: UDP, length 96
21:38:04.608708 pppoe-wan Out IP 124.227.226.83.51802 > 180.139.224.173.24206: UDP, length 96
21:38:05.511612 pppoe-wan In  IP 180.139.224.173.24206 > 124.227.226.83.51802: UDP, length 96
21:38:05.512196 pppoe-wan Out IP 124.227.226.83.51802 > 180.139.224.173.24206: UDP, length 80
21:38:05.581322 pppoe-wan In  IP 180.139.224.173.24206 > 124.227.226.83.51802: UDP, length 96
21:38:05.581891 pppoe-wan Out IP 124.227.226.83.51802 > 180.139.224.173.24206: UDP, length 96
21:38:06.602565 pppoe-wan Out IP 124.227.226.83.51802 > 180.139.224.173.24206: UDP, length 96

Logs

No response

Integrity requirements

  • [X] I confirm that I have read the documentation, understand the meaning of all the configuration items I wrote, and did not pile up seemingly useful options or default values.
  • [X] I confirm that I have provided the server and client configuration files and process that can be reproduced locally, instead of a complicated client configuration file that has been stripped of sensitive data.
  • [X] I confirm that I have provided the simplest configuration that can be used to reproduce the error I reported, instead of depending on remote servers, TUN, graphical interface clients, or other closed-source software.
  • [X] I confirm that I have provided the complete configuration files and logs, rather than just providing parts I think are useful out of confidence in my own intelligence.

lisongmin avatar Jan 28 '24 14:01 lisongmin

I ran into a similar issue on Linux with "auto_detect_interface": true.

Everything works fine until the default interface changes:

DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - received handshake response
DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - sending keepalive packet
INFO router: updated default interface eth0, index 2
DEBUG outbound/wireguard[warp]: routine: receive incoming receive - stopped
DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - retrying handshake because we stopped hearing back after 15 seconds
DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - sending handshake initiation
DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - handshake did not complete after 5 seconds, retrying (try 2)
DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - handshake did not complete after 5 seconds, retrying (try 3)
DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - handshake did not complete after 5 seconds, retrying (try 4)

All the handshakes fail and never succeed again.

I guess it's a bug. I'll try to find a minimal reproduction.

hellodword avatar Feb 28 '24 06:02 hellodword

With a small patch:

diff --git a/outbound/wireguard.go b/outbound/wireguard.go
index 045241f..c08c6b8 100644
--- a/outbound/wireguard.go
+++ b/outbound/wireguard.go
@@ -165,7 +165,10 @@ func (w *WireGuard) Close() error {
 }
 
 func (w *WireGuard) InterfaceUpdated() {
-	w.device.BindUpdate()
+	err := w.device.BindUpdate()
+	if err != nil {
+		w.logger.Error("InterfaceUpdated ", err)
+	}
 	return
 }

INFO router: updated default interface eth0, index 2
DEBUG outbound/wireguard[warp]: routine: receive incoming receive - stopped
ERROR outbound/wireguard[warp]: InterfaceUpdated use of closed network connection
INFO router: updated default interface wlp2s0, index 3
ERROR outbound/wireguard[warp]: InterfaceUpdated use of closed network connection
INFO router: updated default interface eth0, index 2
ERROR outbound/wireguard[warp]: InterfaceUpdated use of closed network connection

hellodword avatar Feb 28 '24 07:02 hellodword

Try https://github.com/SagerNet/sing-box/commit/dd52c26ae1bd6751b99d75d315048d71c592f033

nekohasekai avatar Feb 28 '24 07:02 nekohasekai

https://github.com/SagerNet/sing-box/commit/dd52c26ae1bd6751b99d75d315048d71c592f033 with v1.8.6 gives the same errors. My bad, though: I didn't mention that I'm using detour and system_interface with the wireguard outbound:

{
      "detour": "auto:proxy",
      "interface_name": "warp",
      "system_interface": true,
      "tag": "warp",
      "type": "wireguard"
      ...
}

I'm trying to put together a minimal reproduction.

hellodword avatar Feb 28 '24 07:02 hellodword

{
  "inbounds": [
    {
      "listen": "0.0.0.0",
      "listen_port": 1080,
      "type": "mixed"
    }
  ],
  "log": {
    "disabled": false,
    "level": "trace",
    "timestamp": true
  },
  "outbounds": [
    {
      "tag": "warp",
      "detour": "proxy",
      "system_interface": false,
      "type": "wireguard",
      ???
    },
    {
      "tag": "proxy",
      "type": "vmess",
      ???
    }
  ],
  "route": {
    "auto_detect_interface": true,
    "final": "warp"
  }
}

I have two network interfaces, eth0 and wlp2s0. I can reproduce the errors by unplugging and re-plugging eth0.

hellodword avatar Feb 28 '24 08:02 hellodword

Similar problem. When I enable WireGuard in sing-box on my Android phone away from home, over mobile internet, it works fine. When I come home and the phone connects to my home Wi-Fi, the phone loses Internet access, and to get it back I have to shut down sing-box. sing-box version 1.8.8, Android 14. Update: I checked on 1.9.0-beta.8 and the same problem exists.

Dr4tez avatar Mar 07 '24 20:03 Dr4tez

I think these are caused by a stale bound/connected UDP socket.

Currently, the WireGuard transport creates and connects the underlying UDP socket on start and reuses that socket for all subsequent sends and receives. Once connected, the socket is bound to a local IP address and port.

After a network change or recovery, the host's IP address changes and the socket's local IP address is no longer available. The socket API doesn't report any error for UDP on this socket, so sends appear to succeed (although the packet may or may not arrive at the destination) and nothing is received afterward.

This undetected dead UDP socket also causes problems for IPv6. Some ISPs change your prefix periodically, so the host's IPv6 address changes and kills the previously bound UDP socket. And during startup, while the IPv6 address is still in the tentative state, the connect succeeds but binds to a link-local IPv6 address, which also leaves a dead socket.

I think the conditions above can be simulated by manually deleting or changing the host's local IP address (the one the UDP socket is bound to) and tested using docker/netcat.

If we can't easily detect this, maybe we can just recreate and reconnect the UDP socket when nothing has been received for a specific duration.

jwfang avatar Mar 09 '24 06:03 jwfang

The same thing happens for me: after a network change or restoration, and also after using the Clash API to close all connections, the WireGuard connection fails to restore automatically.

My setup is a WireGuard configuration with an upstream, deployed on a side Linux device (an LXC container in Proxmox).

pierre-primary avatar Mar 18 '24 09:03 pierre-primary

I had the same issue on Android 14 with Sing-Box 1.8.9. While setting "gso":true in the Wireguard outbound configuration fixed the connection drop after switching networks, it now takes about 30 seconds to come back online.

BehradJi avatar Mar 18 '24 14:03 BehradJi

> I had the same issue on Android 14 with Sing-Box 1.8.9. While setting "gso":true in the Wireguard outbound configuration fixed the connection drop after switching networks, it now takes about 30 seconds to come back online.

Thanks for the tip, it worked for me! There are no more WireGuard connection drops when moving from one network to another. As for the delay, I don't notice any at all: not 30 seconds, not even one second. Android 14, arm64-v8a and 1.9.0-beta.16.

Dr4tez avatar Mar 18 '24 17:03 Dr4tez

Try f61b272cbf3732ac7d8307ee787963ba78ca5945

nekohasekai avatar Mar 20 '24 02:03 nekohasekai

https://github.com/SagerNet/sing-box/commit/f61b272cbf3732ac7d8307ee787963ba78ca5945 works for me, with 1.8.9

03:24:39 INFO router: updated default interface wlp2s0, index 3
03:24:39 DEBUG outbound/wireguard[warp]: routine: receive incoming receive - stopped
03:24:39 DEBUG outbound/wireguard[warp]: udp bind has been updated
03:24:39 DEBUG outbound/wireguard[warp]: routine: receive incoming receive - started
03:24:39 INFO outbound/vmess[proxy-1]: outbound packet connection to 162.159.192.1:2408
03:25:03 DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - sending handshake initiation
03:25:03 DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - received handshake response
03:25:03 DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - sending keepalive packet
03:25:20 DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - retrying handshake because we stopped hearing back after 15 seconds
03:25:20 DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - sending handshake initiation
03:25:20 DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - received handshake response
03:25:20 DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - sending keepalive packet
03:25:21 INFO router: updated default interface eth0, index 2
03:25:21 DEBUG outbound/wireguard[warp]: routine: receive incoming receive - stopped
03:25:21 DEBUG outbound/wireguard[warp]: udp bind has been updated
03:25:21 DEBUG outbound/wireguard[warp]: routine: receive incoming receive - started
03:25:21 INFO outbound/vmess[proxy-1]: outbound packet connection to 162.159.192.1:2408
03:25:36 DEBUG outbound/wireguard[warp]: peer(bmXO…fgyo) - sending keepalive packet
03:25:37 INFO router: updated default interface wlp2s0, index 3
03:25:37 DEBUG outbound/wireguard[warp]: routine: receive incoming receive - stopped
03:25:37 DEBUG outbound/wireguard[warp]: udp bind has been updated
03:25:37 DEBUG outbound/wireguard[warp]: routine: receive incoming receive - started
03:25:37 INFO outbound/vmess[proxy-1]: outbound packet connection to 162.159.192.1:2408

hellodword avatar Mar 20 '24 03:03 hellodword

On versions 1.8.10 - 1.8.14 and 1.9.0-rc.1 - 1.9.0-rc.22, the application UI stops responding to input after switching from Wi-Fi to mobile Internet if the configuration has active WireGuard outbounds without "gso": true. Android 14, arm64-v8a.

Dr4tez avatar Mar 23 '24 18:03 Dr4tez

Thanks for those tips, man; "gso": true really does work for me! Gosh, I've had this problem with sing-box for ages and always wondered if it was just me.

Dondrejohnson5 avatar Apr 12 '24 19:04 Dondrejohnson5