Tailscale is slow: `wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument`

Open · marek22k opened this issue 1 year ago · 33 comments

What is the issue?

Hello,

I have a problem. For a few days now, Tailscale has been extremely slow and the following message keeps appearing in the log:

Aug 05 23:40:47 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:40:47 ciel tailscaled[876]: write /dev/net/tun: invalid argument
Aug 05 23:40:47 ciel tailscaled[876]: write /dev/net/tun: invalid argument
Aug 05 23:40:47 ciel tailscaled[876]: write /dev/net/tun: invalid argument
Aug 05 23:40:47 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:40:47 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:40:47 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:40:47 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:40:47 ciel tailscaled[876]: [RATELIMIT] format("wg: Failed to write packets to TUN device: %v")
Aug 05 23:46:09 ciel tailscaled[876]: [RATELIMIT] format("wg: Failed to write packets to TUN device: %v") (30 dropped)
Aug 05 23:46:09 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:46:09 ciel tailscaled[876]: write /dev/net/tun: invalid argument
Aug 05 23:46:09 ciel tailscaled[876]: write /dev/net/tun: invalid argument
Aug 05 23:46:09 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:46:09 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:46:09 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:46:09 ciel tailscaled[876]: wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
Aug 05 23:46:09 ciel tailscaled[876]: [RATELIMIT] format("wg: Failed to write packets to TUN device: %v")

By "slow" I mean the following behavior (the iperf3 server is running on my Tailscale exit node):

$ iperf3 --client vpn-lu.uaru-vimba.ts.net -R
Connecting to host vpn-lu.uaru-vimba.ts.net, port 5201
Reverse mode, remote host vpn-lu.uaru-vimba.ts.net is sending
[  5] local 100.97.92.115 port 44816 connected to 100.110.144.38 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   2.00-3.00   sec   128 KBytes  1.05 Mbits/sec                  
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   4.00-5.00   sec   128 KBytes  1.05 Mbits/sec                  
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   7.00-8.00   sec   128 KBytes  1.05 Mbits/sec                  
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   9.00-10.00  sec   128 KBytes  1.05 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.93  sec   609 KBytes   456 Kbits/sec  282             sender
[  5]   0.00-10.00  sec   512 KBytes   419 Kbits/sec                  receiver

iperf Done.

and, for reference, the other direction works:

$ iperf3 --client vpn-lu.uaru-vimba.ts.net
Connecting to host vpn-lu.uaru-vimba.ts.net, port 5201
[  5] local 100.97.92.115 port 39542 connected to 100.110.144.38 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.50 MBytes  21.0 Mbits/sec    1   71.4 KBytes       
[  5]   1.00-2.00   sec  3.62 MBytes  30.4 Mbits/sec    0   99.3 KBytes       
[  5]   2.00-3.00   sec  3.88 MBytes  32.5 Mbits/sec    1   87.2 KBytes       
[  5]   3.00-4.00   sec  3.75 MBytes  31.5 Mbits/sec    1    102 KBytes       
[  5]   4.00-5.00   sec  3.88 MBytes  32.5 Mbits/sec    0    104 KBytes       
[  5]   5.00-6.00   sec  3.75 MBytes  31.5 Mbits/sec    1   93.2 KBytes       
[  5]   6.00-7.00   sec  4.00 MBytes  33.6 Mbits/sec    1   83.6 KBytes       
[  5]   7.00-8.00   sec  3.88 MBytes  32.5 Mbits/sec    0    111 KBytes       
[  5]   8.00-9.00   sec  3.62 MBytes  30.4 Mbits/sec    1   96.9 KBytes       
[  5]   9.00-10.00  sec  3.88 MBytes  32.5 Mbits/sec    1   84.8 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  36.8 MBytes  30.8 Mbits/sec    7             sender
[  5]   0.00-10.93  sec  36.5 MBytes  28.0 Mbits/sec                  receiver

iperf Done.

Steps to reproduce

No idea. For testing, I undid all the changes I had made to Tailscale (changing the systemd unit and disabling the support logs). It did not help.

OS

Arch Linux

OS version

$ uname -a
Linux ciel 6.6.44-2-lts #1 SMP PREEMPT_DYNAMIC Sat, 03 Aug 2024 18:10:33 +0000 x86_64 GNU/Linux

Tailscale version

$ tailscale version
1.70.0
  tailscale commit: 0e0a212418fbf8243cb3f06634367b61e81ea9db-dirty
  go version: go1.22.5

Bug report

$ tailscale bugreport
BUG-c75d50b30c42938e61b3d6dfd8683ea7f95df6373200ac4dafdee5700dbabca1-20240805234440Z-89d5c5a39dee5ae1

Additional information

Installed via pacman (tailscale package).

I also run a few other things on this computer, such as cjdns, Yggdrasil, libvirt and Docker (so tailscale0 is not the only TUN interface):

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: enp1s0f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether 74:5d:22:f4:8d:6b brd ff:ff:ff:ff:ff:ff
3: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7e:1b:d9:a1:a2:d7 brd ff:ff:ff:ff:ff:ff permaddr 04:7b:cb:c7:41:10
    inet 192.168.41.[redacted]/24 brd 192.168.41.255 scope global dynamic noprefixroute wlp2s0
       valid_lft 42776sec preferred_lft 42776sec
    inet6 2a02:3100:[redacted]/64 scope global dynamic noprefixroute 
       valid_lft 232740sec preferred_lft 146340sec
    inet6 fd5f:28a6:[redacted]/64 scope global noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::[redacted]/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1304 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none 
    inet6 fcf6:773c:[redacted]/8 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::8f97:eaaa:dd2:ee16/64 scope link stable-privacy proto kernel_ll 
       valid_lft forever preferred_lft forever
5: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 52:54:00:b2:fa:7f brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.1/24 brd 192.168.100.255 scope global virbr0
       valid_lft forever preferred_lft forever
    inet6 fd6c:ffa:eb7a:74e2::1/64 scope global tentative 
       valid_lft forever preferred_lft forever
6: tailscale0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none 
    inet 100.97.92.115/32 scope global tailscale0
       valid_lft forever preferred_lft forever
    inet6 fd7a:115c:a1e0::8e21:5c73/128 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::7986:2c09:c665:1b26/64 scope link stable-privacy proto kernel_ll 
       valid_lft forever preferred_lft forever
7: tun1: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 65535 qdisc fq_codel state UNKNOWN group default qlen 500
    link/none 
    inet6 200:[redacted]/7 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::b56c:e008:2405:5b50/64 scope link stable-privacy proto kernel_ll 
       valid_lft forever preferred_lft forever
9: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:70:f9:2d:b1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.128.1/24 brd 192.168.128.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 dd64:6f63:6b65:72::1/80 scope global tentative 
       valid_lft forever preferred_lft forever

strace (when doing iperf3):

[pid 15854] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\3@\0@\6\260\213dn\220&da\\s\24Q"..., 1290) = 1290
[pid 15858] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\4@\0@\6\260\212dn\220&da\\s\24Q"..., 1290) = 1290
[pid 15858] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\5@\0@\6\260\211dn\220&da\\s\24Q"..., 1290) = 1290
[pid 15858] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\6@\0@\6\260\210dn\220&da\\s\24Q"..., 1290) = 1290
[pid 15826] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\7@\0@\6\260\207dn\220&da\\s\24Q"..., 1290) = 1290
[pid 15864] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\10@\0@\6\260\206dn\220&da\\s\24Q"..., 1290) = 1290
[pid 15826] write(11, "\1\1(\0\330\4\24\0\20\0E\0\t\330\320\t@\0@\6\253\255dn\220&da\\s\24Q"..., 2530) = -1 EINVAL (Invalid argument)
[pid 15826] write(11, "\1\1(\0\330\4\24\0\20\0E\0\t\330\320\v@\0@\6\253\253dn\220&da\\s\24Q"..., 2530) = -1 EINVAL (Invalid argument)
[pid 15826] write(11, "\1\1(\0\330\4\24\0\20\0E\0\t\330\320\r@\0@\6\253\251dn\220&da\\s\24Q"..., 2530) = -1 EINVAL (Invalid argument)
[pid 15826] write(11, "\1\1(\0\330\4\24\0\20\0E\0\t\330\320\17@\0@\6\253\247dn\220&da\\s\24Q"..., 2530) = -1 EINVAL (Invalid argument)
[pid 15864] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\21@\0@\6\260}dn\220&da\\s\24Q"..., 1290) = 1290
[pid 15858] write(11, "\1\1(\0\330\4\24\0\20\0E\0\t\330\320\22@\0@\6\253\244dn\220&da\\s\24Q"..., 2530) = -1 EINVAL (Invalid argument)
[pid 15864] write(11, "\1\1(\0\330\4\24\0\20\0E\0\t\330\320\24@\0@\6\253\242dn\220&da\\s\24Q"..., 2530) = -1 EINVAL (Invalid argument)
[pid 15864] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\26@\0@\6\260xdn\220&da\\s\24Q"..., 1290) = 1290
[pid 15858] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\27@\0@\6\260wdn\220&da\\s\24Q"..., 1290) = 1290
[pid 15858] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\30@\0@\6\260vdn\220&da\\s\24Q"..., 1290) = 1290
[pid 15826] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\31@\0@\6\260udn\220&da\\s\24Q"..., 1290) = 1290
[pid 15854] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\32@\0@\6\260tdn\220&da\\s\24Q"..., 1290) = 1290
[pid 15854] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\33@\0@\6\260sdn\220&da\\s\24Q"..., 1290) = 1290

If needed, I can also send the output of `sudo strace -f -o strace.log tailscaled -verbose 1 > tailscale.log 2>&1` privately.

Another observation that might be interesting: I have a second laptop with pretty much the same setup (Arch Linux + Tailscale). I hadn't updated it for a few days or weeks, so kernel 6.6.35-2-lts was still installed, and Tailscale worked there. I have now updated it to kernel 6.6.44-2-lts, and since then the same problem has occurred there as well.

marek22k avatar Aug 05 '24 23:08 marek22k

Does this reproduce for you on linux-mainline?

cc @jwhited

raggi avatar Aug 06 '24 01:08 raggi

The problem also occurs under 6.10.3-arch1-1; whether it also occurs under linux-mainline I still have to check (assuming my computer manages to download and build the AUR package).

marek22k avatar Aug 06 '24 02:08 marek22k

[pid 15864] write(11, "\0\0\0\0\0\0\0\0\0\0E\0\5\0\320\10@\0@\6\260\206dn\220&da\\s\24Q"..., 1290) = 1290
[pid 15826] write(11, "\1\1(\0\330\4\24\0\20\0E\0\t\330\320\t@\0@\6\253\255dn\220&da\\s\24Q"..., 2530) = -1 EINVAL (Invalid argument)

The writes towards the TUN appear to return EINVAL when the length is greater than the MSS/MTU, i.e. TCP or UDP GRO is involved. Going by the fact that you're running TCP iperf3, I'm going to assume TCP.

The virtio_net_hdr values at the front look well-formed at first glance: needs_csum, GSO_TCPv4, hdr len 51 (this is a little unexpected), gso size 1060, csum start 32...
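
To make the header layout concrete: the 10 bytes in front of each of those TUN writes are a struct virtio_net_hdr. Below is a minimal Go sketch of decoding one, assuming the field layout from include/uapi/linux/virtio_net.h and little-endian u16 fields (host byte order on x86_64); the sample values are illustrative, not the ones quoted above.

package main

import (
	"encoding/binary"
	"fmt"
)

// virtioNetHdr mirrors struct virtio_net_hdr from the kernel's
// include/uapi/linux/virtio_net.h.
type virtioNetHdr struct {
	Flags      uint8  // e.g. 0x1 = VIRTIO_NET_HDR_F_NEEDS_CSUM
	GSOType    uint8  // e.g. 0x1 = VIRTIO_NET_HDR_GSO_TCPV4
	HdrLen     uint16 // length of the L3/L4 headers
	GSOSize    uint16 // size of each segment after (re)segmentation
	CsumStart  uint16 // where checksumming starts
	CsumOffset uint16 // offset of the checksum field, relative to CsumStart
}

// decodeVnetHdr decodes the 10-byte header; u16 fields are assumed to be
// little-endian, matching the host byte order on x86_64.
func decodeVnetHdr(b []byte) virtioNetHdr {
	return virtioNetHdr{
		Flags:      b[0],
		GSOType:    b[1],
		HdrLen:     binary.LittleEndian.Uint16(b[2:]),
		GSOSize:    binary.LittleEndian.Uint16(b[4:]),
		CsumStart:  binary.LittleEndian.Uint16(b[6:]),
		CsumOffset: binary.LittleEndian.Uint16(b[8:]),
	}
}

func main() {
	// Illustrative values only (not taken from the strace above): a
	// needs-csum, GSO_TCPv4 header for a 40-byte IPv4+TCP header and
	// 1240-byte segments.
	var b [10]byte
	b[0], b[1] = 0x01, 0x01
	binary.LittleEndian.PutUint16(b[2:], 40)   // hdr_len
	binary.LittleEndian.PutUint16(b[4:], 1240) // gso_size
	binary.LittleEndian.PutUint16(b[6:], 20)   // csum_start
	binary.LittleEndian.PutUint16(b[8:], 16)   // csum_offset
	fmt.Printf("%+v\n", decodeVnetHdr(b[:]))
}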

This is the first EINVAL report I'm aware of involving TCP GSO on the TUN. We've tested up to 6.8.5 and haven't seen this.

Can you provide a pcap of the first 128 bytes from each packet when you run iperf3? `sudo tcpdump -s 128 -w /tmp/tailscale0.pcap -nnni tailscale0 tcp`

Is there anything on the system that could be modifying the tailscale TUN config? Perhaps picking the wrong FD. If you stop the other software that creates a TUN, and restart tailscale, does the problem still occur?

jwhited avatar Aug 06 '24 03:08 jwhited

I also experienced this problem on Arch Linux with linux 6.10.3.arch1-1 and Tailscale version 1.70.0. Downgrading the kernel to 6.10.2-arch1-2 resolved the issue.

simon--taylor avatar Aug 06 '24 06:08 simon--taylor

The problem also occurs under 6.10.3-arch1-1; whether it also occurs under linux-mainline I still have to check (assuming my computer manages to download and build the AUR package).

Unfortunately I can't compile the AUR package - even GitHub Actions can't do it: https://github.com/marek22k/linux-mainline-builder/actions/runs/10259112100/job/28383031908

The writes towards the TUN appear to return EINVAL when the length is greater than the MSS/MTU, i.e. TCP or UDP GRO is involved. Going by the fact that you're running TCP iperf3, I'm going to assume TCP.

Yes. As far as I understand it, iperf3 uses TCP (in the default settings, which I use).

Can you provide a pcap of the first 128 bytes from each packet when you run iperf3? `sudo tcpdump -s 128 -w /tmp/tailscale0.pcap -nnni tailscale0 tcp`

tailscale0.pcap.txt (remove the .txt as GitHub doesn't support uploading .pcap files)

Is there anything on the system that could be modifying the tailscale TUN config? Perhaps picking the wrong FD. If you stop the other software that creates a TUN, and restart tailscale, does the problem still occur?

If I stop the other TUN programs, it still does not work. Apart from that, I have not (as far as I know) made any changes relevant to TUN adapters.

marek22k avatar Aug 06 '24 09:08 marek22k

I also experienced this problem on Arch Linux with linux 6.10.3.arch1-1 and Tailscale version 1.70.0. Downgrading the kernel to 6.10.2-arch1-2 resolved the issue.

Downgrading the kernel to 6.6.35-2-lts also solved the problem for me.

Edit: Doesn't work on 6.6.44-2-lts, 6.6.44-1-lts; works on 6.6.43-1-lts, 6.6.42-1-lts.

marek22k avatar Aug 06 '24 09:08 marek22k

I was able to partially reproduce this, specifically:

  • Using Arch LTS: `Linux archlinux 6.6.44-2-lts #1 SMP PREEMPT_DYNAMIC Sat, 03 Aug 2024 18:10:33 +0000 x86_64 GNU/Linux`
  • Observed iperf3 resulting in `wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument`
  • Observed in dmesg: `[ 96.578967] ens18: bad gso: type: 4, size: 1348`

This appears to be scoped to IPv6.

I was not, however, able to reproduce this persistently: in my test setup the issue certainly occurred, but only for a short time, after which throughput recovered and the error stopped being logged, on both the tailscaled side and the kernel side.

I was not able to reproduce on Linux archlinux 6.10.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 04 Aug 2024 05:11:32 +0000 x86_64 GNU/Linux.

We will need to dig into which patches have been ported into LTS. There has been some activity in virtio header handling that we are now concerned about, but upstream has already worked on improvements there; it's entirely possible LTS brought in a bug that needs to be addressed with further backports.

raggi avatar Aug 06 '24 18:08 raggi

There does not appear to be anything relevant in the recent history of https://gitlab.archlinux.org/archlinux/packaging/packages/linux-lts, or in its local patches.

raggi avatar Aug 06 '24 20:08 raggi

After some digging, so far it looks likely that the culprit is https://github.com/gregkh/linux/commit/e269d79c7d35aa3808b1f3c1737d63dab504ddc8 and the fix is https://github.com/gregkh/linux/commit/89add40066f9ed9abe5f7f886fe5789ff7e0c50e

Next up, verify this.

raggi avatar Aug 06 '24 20:08 raggi

We have a potential mitigation for this issue in our wireguard-go fork, which appears to avoid the problem in my lab environment. However, as the issue was not triggering consistently there, it would be great to receive further confirmation.

If you are able to do so, testing the patch linked just above this comment would be great; a command-based approach would be:

go install tailscale.com/cmd/tailscaled@ab9d3474d400590c53863f6d69f6e2bd98fe2d90
sudo install ~/go/bin/tailscaled `which tailscaled`
sudo systemctl restart tailscaled

raggi avatar Aug 06 '24 21:08 raggi

* Observed in dmesg: `[   96.578967] ens18: bad gso: type: 4, size: 1348`

Mhh, I haven't seen anything like that in dmesg.

If you are able to do so, testing the patch linked just above this comment would be great; a command-based approach would be:

go install tailscale.com/cmd/tailscaled@ab9d3474d400590c53863f6d69f6e2bd98fe2d90
sudo install ~/go/bin/tailscaled `which tailscaled`
sudo systemctl restart tailscaled

Since I just downgraded my LTS kernel and put it on hold, I tried the whole thing with the stable kernel, and the error still occurs:

$ sudo ./go/bin/tailscaled -version
1.71.0-ERR-BuildInfo
  track: unstable (dev); frequent updates and bugs are likely
  go version: go1.22.5
$ sudo ./go/bin/tailscaled
[...]
magicsock: endpoints changed: [redacted]
magicsock: derp-4 connected; connGen=1
health(warnable=no-derp-home): ok
health(warnable=no-derp-connection): ok
wgengine: Reconfig: configuring userspace WireGuard config (with 2/21 peers)
magicsock: disco: node [fZx/a] d:31cd083bb280f5ef now using 104.244.77.202:41641 mtu=1360 tx=ebc92294aa39
magicsock: disco: node [fZx/a] d:31cd083bb280f5ef now using [2605:6400:30:f7dd::1]:41641 mtu=1360 tx=58063645f90b
wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
write /dev/net/tun: invalid argument
write /dev/net/tun: invalid argument
wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
write /dev/net/tun: invalid argument
wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
[RATELIMIT] format("wg: Failed to write packets to TUN device: %v")
open-conn-track: timeout opening (TCP 100.97.92.115:49032 => 95.216.195.133:80); no associated peer node
open-conn-track: timeout opening (TCP 100.97.92.115:49026 => 95.216.195.133:80); no associated peer node
open-conn-track: timeout opening (TCP 100.97.92.115:49040 => 95.216.195.133:80); no associated peer node
open-conn-track: timeout opening (TCP 100.97.92.115:49056 => 95.216.195.133:80); no associated peer node
magicsock: disco: node [JvgfA] d:8debf5a264ea0f54 now using [2a03:4000:6:2784::]:41641 mtu=1360 tx=a0e08dfee01b
open-conn-track: timeout opening (TCP 100.97.92.115:49062 => 95.216.195.133:80); no associated peer node
[RATELIMIT] format("open-conn-track: timeout opening %v; no associated peer node")
[RATELIMIT] format("wg: Failed to write packets to TUN device: %v") (87 dropped)
wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
wg: Failed to write packets to TUN device: write /dev/net/tun: invalid argument
[RATELIMIT] format("wg: Failed to write packets to TUN device: %v")
^Ctailscaled got signal interrupt; shutting down
canceling captive portal context
control: client.Shutdown ...
control: authRoutine: exiting
control: updateRoutine: exiting
control: mapRoutine: exiting
control: Client.Shutdown done.
magicsock: closing connection to derp-4 (conn-close), age 13s
magicsock: 0 active derp conns
[RATELIMIT] format("monitor: %s: src=%v, dst=%v, gw=%v, outif=%v, table=%v") (1 dropped)
monitor: RTM_DELROUTE: src=, dst=fd00::/8, gw=, outif=10, table=52
monitor: RTM_DELROUTE: src=, dst=fe80::/64, gw=, outif=10, table=254
[RATELIMIT] format("monitor: %s: src=%v, dst=%v, gw=%v, outif=%v, table=%v")
monitor: ip rule deleted: {Family:2 DstLength:0 SrcLength:0 Tos:0 Table:254 Protocol:0 Scope:0 Type:1 Flags:0 Attributes:{Dst:<nil> Src:<nil> Gateway:<nil> OutIface:0 Priority:5210 Table:254 Mark:16711680 Pref:<nil> Expires:<nil> Metrics:<nil> Multipath:[]}}
[...]
$ uname -a
Linux ciel 6.10.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 04 Aug 2024 05:11:32 +0000 x86_64 GNU/Linux

For me, the error also always occurs (after a few seconds) rather than randomly.

Maybe I made a mistake and @simon--taylor can also test the patch?

marek22k avatar Aug 06 '24 22:08 marek22k

After some digging, so far it looks likely that the culprit is gregkh/linux@e269d79 and the fix is gregkh/linux@89add40

Next up, verify this.

Applying this patch to the kernel resolved the problem on my system.

simon--taylor avatar Aug 06 '24 22:08 simon--taylor

After some digging, so far it looks likely that the culprit is gregkh/linux@e269d79 and the fix is gregkh/linux@89add40 Next up, verify this.

Applying this patch to the kernel resolved the problem on my system.

Thanks @simon--taylor. Would you be able to test the tailscaled patch (instructions in https://github.com/tailscale/tailscale/issues/13041#issuecomment-2272209149) against the problematic kernel version/commit?

jwhited avatar Aug 06 '24 22:08 jwhited

I have also created a tailscale package with the fix applied, which you can install by doing the following:

sudo pacman -U https://pkgbuild.com/\~gromit/tailscale-1.70.0-1.1-x86_64.pkg.tar.zst

It is signed by my packager key, so everything is fine from the trust side if you're already using Arch Linux 😊

Edit: This is the output of the requested command below:

$ go version -m /usr/bin/tailscaled | grep wireguard-go
	dep	github.com/tailscale/wireguard-go	v0.0.0-20240806212730-80ee55ffe40c	h1:pH75gpj1SYEQdGK1gZ4VHwyZY/KOpG4btbol4jxTsoE=

christian-heusel avatar Aug 06 '24 22:08 christian-heusel

Mhh, I haven't seen anything like that in dmesg.

Since I just downgraded my LTS kernel and put it on hold, I tried the whole thing with the stable kernel, and the error still occurs:

$ uname -a
Linux ciel 6.10.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 04 Aug 2024 05:11:32 +0000 x86_64 GNU/Linux
  • we know the bug is also present in 6.10.3, so the appearance of the issue on that kernel version is not surprising
  • if go(1) grabbed the correct version there, we don't expect to see the EINVALs from the writes
  • we expect to see dmesg lines when EINVAL is returned on this code path

so we'd love to check on some more details here to be sure.

I'm going to check 6.10.3 in my lab shortly, but in the meantime could you just confirm the output of:

go version -m ~/go/bin/tailscaled | grep wireguard-go

As you saw in the output, our current build-info strategy in tailscaled -version is missing information when built just by go(1) (I have a plan to fix this, but not today).

raggi avatar Aug 06 '24 23:08 raggi

$ go version -m ~/go/bin/tailscaled | grep wireguard-go
	dep	github.com/tailscale/wireguard-go	v0.0.0-20240806212730-80ee55ffe40c	h1:pH75gpj1SYEQdGK1gZ4VHwyZY/KOpG4btbol4jxTsoE=

marek22k avatar Aug 06 '24 23:08 marek22k

Also, if you want to test the kernel side of things, here are the linux and linux-lts packages with https://github.com/gregkh/linux/commit/89add40066f9ed9abe5f7f886fe5789ff7e0c50e applied (which fixes the problem according to https://github.com/tailscale/tailscale/issues/13041#issuecomment-2272249663).

linux:

sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-6.10.3.arch1-2.1-x86_64.pkg.tar.zst

linux-lts:

sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-lts-6.6.44-3.1-x86_64.pkg.tar.zst

christian-heusel avatar Aug 06 '24 23:08 christian-heusel

OK, there's still some additional behavior that we need to reproduce and understand here; I am not yet able to completely reproduce the case that @marek22k has.

I was able to reproduce (once) similar behavior on 6.6.44 where initial EINVAL reports from tailscaled do not (promptly) produce a log message in dmesg. There may be a second bug in the series that we need to track down. There is some related discussion on the original upstream thread, but the thread went cold.

We'll have to resume tomorrow.

raggi avatar Aug 07 '24 00:08 raggi

Thanks @simon--taylor. Would you be able to test the tailscaled patch (instructions in #13041 (comment)) against the problematic kernel version/commit?

The tailscale patch didn't resolve the issue for me. I found that connecting to the Linux server from Windows over Tailscale using the VS Code Remote SSH extension triggered the error more reliably than other network activity I tested. Issuing the Developer: Reload Window command caused a large number of error messages in the tailscaled log and took much longer to complete than normal.

simon--taylor avatar Aug 07 '24 15:08 simon--taylor

Also, if you want to test the kernel side of things, here are the linux and linux-lts packages with gregkh/linux@89add40 applied (which fixes the problem according to #13041 (comment)).

linux:

sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-6.10.3.arch1-2.1-x86_64.pkg.tar.zst

linux-lts:

sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-lts-6.6.44-3.1-x86_64.pkg.tar.zst

They both fix the problem for me.

marek22k avatar Aug 07 '24 17:08 marek22k

https://github.com/jwhited/tun-einval-repro contains a simplified reproduction of the issue. It writes a GSO_TCPv4 packet to a TUN device with GSO=1240 and two equal-length segments.
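
For a feel of what such a reproduction involves, here is a rough Go sketch of the same idea (this is not the code from that repository): create a TUN device with IFF_VNET_HDR, enable TCP segmentation offload on it, and write a virtio_net_hdr-prefixed IPv4/TCP packet whose payload spans two equal-length segments. The interface name, addresses and ports below are made up, the checksums are left at zero for brevity (the write-path checks this targets should not depend on them, which is an assumption), and the interface still has to be brought up separately (e.g. with ip link) before the write.

package main

import (
	"encoding/binary"
	"log"
	"os"

	"golang.org/x/sys/unix"
)

// buildIPv4TCPPacket returns a bare-bones IPv4+TCP packet (20+20 byte
// headers) carrying `payload` zero bytes. Checksums are left at zero: a
// peer would discard such a packet, but that is irrelevant for exercising
// the kernel's vnet-header validation on write (assumption).
func buildIPv4TCPPacket(payload int) []byte {
	pkt := make([]byte, 40+payload)
	ip := pkt[:20]
	ip[0] = 0x45                                         // version 4, IHL 5
	binary.BigEndian.PutUint16(ip[2:], uint16(len(pkt))) // total length
	ip[8], ip[9] = 64, 6                                 // TTL, protocol = TCP
	copy(ip[12:16], []byte{10, 0, 0, 1})                 // made-up source address
	copy(ip[16:20], []byte{10, 0, 0, 2})                 // made-up destination address
	tcp := pkt[20:40]
	binary.BigEndian.PutUint16(tcp[0:], 40000) // made-up source port
	binary.BigEndian.PutUint16(tcp[2:], 5201)  // made-up destination port
	tcp[12] = 5 << 4                           // data offset = 5 words
	tcp[13] = 0x10                             // ACK
	return pkt
}

func main() {
	const gsoSize = 1240

	f, err := os.OpenFile("/dev/net/tun", os.O_RDWR, 0)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Create a TUN interface that expects a virtio_net_hdr in front of each packet.
	ifr, err := unix.NewIfreq("tun-einval0")
	if err != nil {
		log.Fatal(err)
	}
	ifr.SetUint16(unix.IFF_TUN | unix.IFF_NO_PI | unix.IFF_VNET_HDR)
	if err := unix.IoctlIfreq(int(f.Fd()), unix.TUNSETIFF, ifr); err != nil {
		log.Fatal(err)
	}
	// Advertise checksum and TCPv4 segmentation offload on the device.
	if err := unix.IoctlSetInt(int(f.Fd()), unix.TUNSETOFFLOAD, unix.TUN_F_CSUM|unix.TUN_F_TSO4); err != nil {
		log.Fatal(err)
	}

	// 10-byte virtio_net_hdr: needs_csum, GSO_TCPv4, hdr_len=40,
	// gso_size=1240, csum_start=20, csum_offset=16 (u16 fields little-endian).
	hdr := make([]byte, 10)
	hdr[0], hdr[1] = 0x01, 0x01
	binary.LittleEndian.PutUint16(hdr[2:], 40)
	binary.LittleEndian.PutUint16(hdr[4:], gsoSize)
	binary.LittleEndian.PutUint16(hdr[6:], 20)
	binary.LittleEndian.PutUint16(hdr[8:], 16)

	// Payload is exactly two gso_size segments.
	pkt := append(hdr, buildIPv4TCPPacket(2*gsoSize)...)
	if _, err := f.Write(pkt); err != nil {
		log.Printf("error writing packet to TUN: %v", err) // EINVAL on affected kernels
		return
	}
	log.Println("successfully wrote packet")
}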

6.10.2

jwhited@i5-12400-1:~$ uname -a && sudo ./tun-einval-repro
Linux i5-12400-1 6.10.2-061002-generic #202407271100 SMP PREEMPT_DYNAMIC Sat Jul 27 11:37:35 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
2024/08/07 18:10:24 successfully wrote packet

6.10.3

jwhited@i5-12400-2:~$ uname -a && sudo ./tun-einval-repro
Linux i5-12400-2 6.10.3-061003-generic #202408030740 SMP PREEMPT_DYNAMIC Sat Aug  3 07:53:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
2024/08/07 18:10:17 error writing packet to TUN: write /dev/net/tun: invalid argument

6.11-rc2

jwhited@i5-12400-1:~$ uname -a && sudo ./tun-einval-repro
Linux i5-12400-1 6.11.0-061100rc2-generic #202408042216 SMP PREEMPT_DYNAMIC Sun Aug  4 22:30:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
2024/08/07 18:23:48 successfully wrote packet
% git log --pretty=oneline v6.10.3...v6.10.2 | grep 'net: missing check virtio'
e9164903b8b303c34723177b02fe91e49e3c4cd7 net: missing check virtio
% git log --pretty=oneline v6.10.3...v6.10.2 | grep 'net: drop bad gso csum_start and offset in virtio_net_hdr'
%
% git log --pretty=oneline v6.11-rc2...v6.10.3 | grep 'net: drop bad gso csum_start and offset in virtio_net_hdr'
89add40066f9ed9abe5f7f886fe5789ff7e0c50e net: drop bad gso csum_start and offset in virtio_net_hdr
%

jwhited avatar Aug 07 '24 18:08 jwhited

I have now also poked the stable team to apply this to the stable trees: https://lore.kernel.org/all/[email protected]/

The issue, though, is that the fix does not apply cleanly to the 6.1 series, so if someone wants to provide a resolution for that, this would be cool; I can then take care of the submission if that's of any help:

Click to see merge conflict
diff --cc include/linux/virtio_net.h
index 29b19d0a324c,6c395a2600e8..000000000000
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@@ -155,9 -156,27 +144,42 @@@ retry
  		unsigned int nh_off = p_off;
  		struct skb_shared_info *shinfo = skb_shinfo(skb);
  
 -		switch (gso_type & ~SKB_GSO_TCP_ECN) {
 -		case SKB_GSO_UDP:
 -			/* UFO may not include transport header in gso_size. */
 +		/* UFO may not include transport header in gso_size. */
 +		if (gso_type & SKB_GSO_UDP)
  			nh_off -= thlen;
++<<<<<<< HEAD
++||||||| parent of 89add40066f9 (net: drop bad gso csum_start and offset in virtio_net_hdr)
++			break;
++		case SKB_GSO_UDP_L4:
++			if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
++				return -EINVAL;
++			if (skb->csum_offset != offsetof(struct udphdr, check))
++				return -EINVAL;
++			if (skb->len - p_off > gso_size * UDP_MAX_SEGMENTS)
++				return -EINVAL;
++			if (gso_type != SKB_GSO_UDP_L4)
++				return -EINVAL;
++			break;
++		}
++=======
+ 			break;
+ 		case SKB_GSO_UDP_L4:
+ 			if (!(hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM))
+ 				return -EINVAL;
+ 			if (skb->csum_offset != offsetof(struct udphdr, check))
+ 				return -EINVAL;
+ 			if (skb->len - p_off > gso_size * UDP_MAX_SEGMENTS)
+ 				return -EINVAL;
+ 			if (gso_type != SKB_GSO_UDP_L4)
+ 				return -EINVAL;
+ 			break;
+ 		case SKB_GSO_TCPV4:
+ 		case SKB_GSO_TCPV6:
+ 			if (skb->csum_offset != offsetof(struct tcphdr, check))
+ 				return -EINVAL;
+ 			break;
+ 		}
++>>>>>>> 89add40066f9 (net: drop bad gso csum_start and offset in virtio_net_hdr)
  
  		/* Kernel has a special handling for GSO_BY_FRAGS. */
  		if (gso_size == GSO_BY_FRAGS)
diff --cc net/ipv4/udp_offload.c
index 794ea24292f6,bc8a9da750fe..000000000000
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@@ -280,6 -278,20 +280,34 @@@ struct sk_buff *__udp_gso_segment(struc
  	if (gso_skb->len <= sizeof(*uh) + mss)
  		return ERR_PTR(-EINVAL);
  
++<<<<<<< HEAD
++||||||| parent of 89add40066f9 (net: drop bad gso csum_start and offset in virtio_net_hdr)
++	if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
++		/* Packet is from an untrusted source, reset gso_segs. */
++		skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
++							     mss);
++		return NULL;
++	}
++
++	if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST)
++		return __udp_gso_segment_list(gso_skb, features, is_ipv6);
++
++=======
+ 	if (unlikely(skb_checksum_start(gso_skb) !=
+ 		     skb_transport_header(gso_skb)))
+ 		return ERR_PTR(-EINVAL);
+ 
+ 	if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
+ 		/* Packet is from an untrusted source, reset gso_segs. */
+ 		skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
+ 							     mss);
+ 		return NULL;
+ 	}
+ 
+ 	if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST)
+ 		return __udp_gso_segment_list(gso_skb, features, is_ipv6);
+ 
++>>>>>>> 89add40066f9 (net: drop bad gso csum_start and offset in virtio_net_hdr)
  	skb_pull(gso_skb, sizeof(*uh));
  
  	/* clear destructor to avoid skb_segment assigning it to tail */

christian-heusel avatar Aug 07 '24 18:08 christian-heusel

Update: We don't believe there is a simple workaround available as the buggy kernel commit results in EINVAL for GSO where all segments are equal in length. We had previously thought it was just affecting cases with shorter-than-GSO tail segments, but the reproduction posted 2 comments above disproves that.

The impact is much wider than Tailscale, as this likely affects other users of virtio_net that utilize GSO, e.g. virtualization stacks such as KVM, QEMU, gVisor, etc.

Selectively disabling GRO towards the TUN by kernel version or some probe-for-error logic is not ideal, but may be necessary depending on how widespread distribution of https://github.com/torvalds/linux/commit/e269d79c7d35aa3808b1f3c1737d63dab504ddc8 is absent its fix in https://github.com/torvalds/linux/commit/89add40066f9ed9abe5f7f886fe5789ff7e0c50e.
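
As a purely illustrative sketch of what such probe-for-error logic could look like (this is not the actual wireguard-go or tailscaled code): attempt the coalesced GSO write first and, if the kernel rejects it with EINVAL, fall back to writing the pre-segmented packets individually.

package tunfallback

import (
	"errors"
	"os"

	"golang.org/x/sys/unix"
)

// writeToTUNWithFallback is a sketch only. gsoPacket is a coalesced,
// vnet-hdr-prefixed packet; segments holds the same data pre-split into
// MTU-sized packets, each prefixed with a zeroed virtio_net_hdr (both are
// assumed to be prepared by the caller).
func writeToTUNWithFallback(tun *os.File, gsoPacket []byte, segments [][]byte) error {
	_, err := tun.Write(gsoPacket)
	if err == nil {
		return nil
	}
	if !errors.Is(err, unix.EINVAL) {
		return err
	}
	// Affected kernel: give up on the coalesced write for this batch and
	// write each segment on its own. Slower, but the packets are not lost.
	for _, seg := range segments {
		if _, err := tun.Write(seg); err != nil {
			return err
		}
	}
	return nil
}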

jwhited avatar Aug 07 '24 23:08 jwhited

Did you folks do any testing for packet loss after this? I see a ton after applying this to 6.6.44 on an OpenWrt appliance. Heads up: the message is suppressed, but the packets are still gone.

curtdept avatar Aug 08 '24 18:08 curtdept

$ uname -a
Linux ciel 6.6.44-3.1-lts #1 SMP PREEMPT_DYNAMIC Tue, 06 Aug 2024 23:09:53 +0000 x86_64 GNU/Linux
$ sudo ping -f -c1000 vpn-lu.uaru-vimba.ts.net
PING vpn-lu.uaru-vimba.ts.net (100.110.144.38) 56(84) bytes of data.
           
--- vpn-lu.uaru-vimba.ts.net ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 18152ms
rtt min/avg/max/mdev = 19.863/37.090/185.793/36.109 ms, pipe 11, ipg/ewma 18.170/21.747 ms
$ sudo ping -f -c1000 meyrin.uaru-vimba.ts.net
PING meyrin.uaru-vimba.ts.net (100.69.197.8) 56(84) bytes of data.
    
--- meyrin.uaru-vimba.ts.net ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 2686ms
rtt min/avg/max/mdev = 1.781/2.715/55.687/2.309 ms, pipe 4, ipg/ewma 2.688/3.092 ms
$ sudo ping -f -c1000 aurora.uaru-vimba.ts.net
PING aurora.uaru-vimba.ts.net (100.104.102.72) 56(84) bytes of data.
                                           
--- aurora.uaru-vimba.ts.net ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 15643ms
rtt min/avg/max/mdev = 137.426/149.168/710.629/60.473 ms, pipe 43, ipg/ewma 15.659/139.399 ms
$ sudo ping -f -c1000 herzstein.uaru-vimba.ts.net
PING herzstein.uaru-vimba.ts.net (100.72.64.53) 56(84) bytes of data.
                                                                 
--- herzstein.uaru-vimba.ts.net ping statistics ---
1000 packets transmitted, 1000 received, 0% packet loss, time 17023ms
rtt min/avg/max/mdev = 256.965/285.424/1077.871/103.334 ms, pipe 65, ipg/ewma 17.039/264.947 ms

Mhh, I don't see any increased packet loss now (after installing the fixed kernel).

marek22k avatar Aug 08 '24 20:08 marek22k

Did you folks do any testing for packet loss after this? I see a ton after applying this to 6.6.44 on an OpenWrt appliance. Heads up: the message is suppressed, but the packets are still gone.

There are several patches under discussion here, and several kernel versions, so it's hard to determine what state you're testing.

There have been a lot of kernel-side patches affecting offloading behavior, and it's entirely possible some changes could trigger new code paths in drivers, so I wouldn't rule anything out; but first we need to be super clear about what the case is.

What kernel version are you running? What patches are applied? What tailscale version are you running? What patches are applied? What test are you running? What are the results before and after a single change of one of the above factors?

raggi avatar Aug 08 '24 21:08 raggi

The patch made it to the stable queues:

  • https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-6.10/net-drop-bad-gso-csum_start-and-offset-in-virtio_net_hdr.patch
  • https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-6.6/net-drop-bad-gso-csum_start-and-offset-in-virtio_net_hdr.patch

Note that this still misses the 6.1 series (see https://github.com/tailscale/tailscale/issues/13041#issuecomment-2274132169 for the relevant conflict)

christian-heusel avatar Aug 14 '24 10:08 christian-heusel

As a follow-up: after much effort, I was able to trace my issue down to an unrelated bad kernel patch. This bad-gso patch tests good, and it went into 6.6.46.

curtdept avatar Aug 14 '24 22:08 curtdept

It looks like 6.10.5 patches this?

https://lwn.net/Articles/985670/

Willem de Bruijn (1): net: drop bad gso csum_start and offset in virtio_net_hdr

Fedora Testing Update tree: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-cb22d42b54`

StraightfaceStudios avatar Aug 16 '24 19:08 StraightfaceStudios

Preliminary testing with 6.10.5 (Fedora 40) appears to resolve our subnet router issues.

StraightfaceStudios avatar Aug 16 '24 22:08 StraightfaceStudios