
QUIC/H3 doesn't work over tailscale due to MTU size

Open psaab opened this issue 4 years ago • 15 comments

It seems that QUIC fails to work over tailscale when using the exit node. I've installed QUIC Indicator in chrome (https://chrome.google.com/webstore/detail/quic-indicator/dghiicffobcmljfmlaedpmpmgpnjcabc) and I never see websites I know that offer QUIC/H3 using the protocol, but I do when I turn off exit node. Quick test is:

  • Make sure QUIC Indicator is installed
  • Turn on tailscale w/ exit node
  • Go to https://cloudflare-quic.com/
  • Set exit node to None
  • Go to https://cloudflare-quic.com/

psaab avatar Aug 11 '21 15:08 psaab

I can reproduce this (not using the Chrome extension, just seeing what https://cloudflare-quic.com/ says). With exit node enabled it is always HTTP/2. Without an exit node, after the first reload it is usually HTTP/3 though not always.

DentonGentry avatar Aug 12 '21 05:08 DentonGentry

Anybody try Firefox?

bradfitz avatar Aug 12 '21 05:08 bradfitz

I think this happens because of the MTU: https://groups.google.com/a/chromium.org/g/proto-quic/c/uKWLRh9JPCo?pli=1

QUIC handshake is 1350 bytes with no-fragment set. An MTU of less than 1350 bytes results in the QUIC handshake being dropped and falling back to HTTP/2.

DentonGentry avatar Aug 12 '21 05:08 DentonGentry

Using Firefox with no exit node, after a few refreshes I see it using HTTP/3.

Using Firefox with an exit node, the first several refreshes after enabling the exit node took a really long time but ultimately still said HTTP/3 (not sure if it even refreshed the page, though). After the first several very slow refreshes, subsequent refreshes were very quick but said HTTP/2.

Firefox may cache previous QUIC probe results? Maybe.

DentonGentry avatar Aug 12 '21 05:08 DentonGentry

Use of an exit node doesn't appear to be material to the issue, it is just the way to be able to use https://cloudflare-quic.com/ to test it.

DentonGentry avatar Aug 12 '21 06:08 DentonGentry

Following up about MTU: we've added a TS_DEBUG_MTU environment variable. If set before starting tailscaled, such as in /etc/default/tailscaled on Linux systems, the tailscale0 MTU will be set to this value. Setting TS_DEBUG_MTU to 1350 should allow QUIC/H3 to work.
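
To confirm that the setting took effect, the interface MTU can be read back from the OS. The following is a minimal sketch in Go using only the standard library; the helper name `interfaceMTU` is made up for illustration, and `tailscale0` is the Linux interface name mentioned above (other platforms use other names).

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// interfaceMTU (hypothetical helper) returns the MTU the OS reports
// for the named network interface.
func interfaceMTU(name string) (int, error) {
	ifi, err := net.InterfaceByName(name)
	if err != nil {
		return 0, fmt.Errorf("lookup %s: %w", name, err)
	}
	return ifi.MTU, nil
}

func main() {
	// Default to the Linux interface name; pass another name as the
	// first argument on other platforms.
	name := "tailscale0"
	if len(os.Args) > 1 {
		name = os.Args[1]
	}
	mtu, err := interfaceMTU(name)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s MTU: %d\n", name, mtu)
}
```

If the printed value is still the default rather than the value given in TS_DEBUG_MTU, the variable was probably not visible to tailscaled when it started.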

DentonGentry avatar Oct 15 '21 17:10 DentonGentry

https://github.com/tailscale/tailscale/pull/5915 may also have fixed a case of this for Windows users, where there's an overlap of QUIC & IPv6.

raggi avatar Oct 15 '22 21:10 raggi

I don't want to be pushy, but as nothing has happened here for over a year: Is there a permanent & "non-expert" solution planned for this problem? Like increasing the MTU with an update for all so QUIC/UDP can go through (if possible)?

Zocker1999NET avatar Dec 14 '22 22:12 Zocker1999NET

https://github.com/tailscale/tailscale/pull/5915 first appeared in 1.34, released last week. Did it make a difference?

MTU discovery is one of the factors in network throughput, likely to be following on after https://tailscale.com/blog/throughput-improvements/ published yesterday.

DentonGentry avatar Dec 14 '22 22:12 DentonGentry

It didn't for me, but probably because I'm a Linux user. I tried the TS_DEBUG_MTU variable from above, but it didn't work out of the box, and I haven't had the time to check what was causing the error.

The upcoming autodetection sounds like it could solve this problem. And sounds like it might improve the overall performance a lot :+1: . Thanks for the fast answer.

Zocker1999NET avatar Dec 14 '22 22:12 Zocker1999NET

Update (2023/03/30):

  • Chrome 111.0.5563.146 - HTTP/3
  • Firefox 111.0.1 - HTTP/2

raggi avatar Mar 31 '23 00:03 raggi

Update (2024-04-23):

  • Firefox (125.0): Never works, hangs on reload when HTTP3 was used previously
  • Chromium (124.0.6367.60): Works after reload
  • curl (8.7.1, with nghttp3): Works with --http3

Is this still a TS bug or a Firefox one?

Note that this bug also affects direct connections in your tailnet.

Atemu avatar Apr 24 '24 13:04 Atemu

Same issue here. IPv6 works over the exit node (as in I can ping public IPv6 addresses) but connections to the internet always seem to use IPv4.

ghost avatar May 14 '24 13:05 ghost

@Fossil01 your bug is unrelated to this one.

Atemu avatar May 14 '24 15:05 Atemu

@Atemu not according to https://github.com/tailscale/tailscale/issues/8245

ghost avatar May 14 '24 15:05 ghost

Hi guys, author of quic-go here 👋

I'm running into a similar issue. Let me give a little bit of context from the QUIC side:

  • QUIC defines a minimum packet size (= UDP payload size) of 1200 bytes (see RFC 9000, section 8.1). This is a hard requirement. Servers will drop handshake packets smaller than this value.
  • Many QUIC stacks start with a larger value. This is the reason people have been observing HTTP/3 connection failures. Large-scale measurements have shown that the vast majority of links that support 1200 bytes also support 1280 (and even 13xx). Sending a large packet is advantageous for clients, since it reduces the effects of the anti-amplification limit (see RFC 9000, section 8), allowing the server to send a larger certificate chain in the first roundtrip.
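
To make the anti-amplification point concrete: RFC 9000, section 8.1 limits a server to sending at most three times the bytes it has received from a not-yet-validated client address. A small Go sketch of that arithmetic follows; the function name is made up for illustration, and the three packet sizes are simply values mentioned in this thread.

```go
package main

import "fmt"

// amplificationBudget (hypothetical helper) returns how many bytes a QUIC
// server may send to an unvalidated client address after receiving
// bytesReceived bytes from it (RFC 9000, section 8.1: at most 3x).
func amplificationBudget(bytesReceived int) int {
	return 3 * bytesReceived
}

func main() {
	for _, size := range []int{1200, 1280, 1350} {
		fmt.Printf("client Initial of %d bytes: server may send up to %d bytes\n",
			size, amplificationBudget(size))
	}
}
```

Going from 1200 to 1350 bytes raises the server's first-flight budget from 3600 to 4050 bytes, which is where the extra room for a certificate chain comes from.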

Now you could argue that a QUIC stack should query the network interface for its MTU, but this might not be a straightforward thing to do in practice. Things get more complicated when we look at various tunneling protocols that have recently been standardized in the MASQUE working group:

  • RFC 9298 defines how to tunnel UDP packets over HTTP. In practice, the most common use case is the tunneling of UDP packets over HTTP/3 (i.e. HTTP over QUIC), using QUIC Unreliable Datagrams (RFC 9221, RFC 9297). This is already deployed on an internet-wide scale: This is how iCloud Private Relay works (with 2 layers of tunneling).
  • Of course, the most common type of UDP payload getting proxied is QUIC itself, and this is where we're slowly running into MTU problems: Each layer adds overhead, both for the QUIC header, and for QUIC and HTTP datagram framing. How many bytes depends on a variety of factors, including QUIC connection IDs, packet number lengths etc.
  • There are also other tunneling protocols, including RFC 9484, which defines a way to proxy IP packets over HTTP. Of course, this adds even more framing overhead.

Long story short: 1280 bytes is uncomfortably close to QUIC's minimum packet size limit when tunneling QUIC connections.
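
How close is "uncomfortably close"? A rough Go sketch, assuming the only fixed costs on the outer layer are the IP header (20 bytes for IPv4 without options, 40 for IPv6) and the 8-byte UDP header. Whatever remains above 1200 bytes is all that is left for the tunneling overhead of every layer combined; the actual per-layer overhead varies and is not modeled here.

```go
package main

import "fmt"

const (
	ipv4Header     = 20   // bytes, IPv4 header without options
	ipv6Header     = 40   // bytes, fixed IPv6 header
	udpHeader      = 8    // bytes
	quicMinPayload = 1200 // bytes, QUIC minimum for Initial datagrams
)

// headroom (hypothetical helper) returns how many bytes of tunneling
// overhead fit between QUIC's 1200-byte minimum and the largest UDP
// payload that an interface with the given MTU can carry.
func headroom(mtu int, ipv6 bool) int {
	ip := ipv4Header
	if ipv6 {
		ip = ipv6Header
	}
	return mtu - ip - udpHeader - quicMinPayload
}

func main() {
	for _, mtu := range []int{1280, 1350} {
		fmt.Printf("MTU %d: %d bytes spare over IPv4, %d over IPv6\n",
			mtu, headroom(mtu, false), headroom(mtu, true))
	}
}
```

Under these assumptions an MTU of 1280 leaves 52 bytes (IPv4) or 32 bytes (IPv6) for all encapsulation, while 1350 leaves 122 or 102.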

I'm probably missing a whole bunch of context here, but I'm not sure why Tailscale defines an MTU at all. Would it be possible to allow larger MTUs (1500, or close to it), and risk packet drops in the network, similar to how my local network interface doesn't prevent me from sending a 1470-byte UDP packet (there's just a high likelihood that it won't arrive at the destination)? Applications could then run PMTUD.

If that's not possible, I'm wondering if there's anything we can learn from the QUIC deployment experience of companies like Google and Meta, who use quite large MTUs, based on data that this is generally safe over the vast majority of internet links. It might be feasible to increase the MTU to somewhere around 1350 bytes. This would go a long way in enabling QUIC tunneling applications.

marten-seemann avatar Sep 18 '24 11:09 marten-seemann

I think I'm running into this on macOS. My MacBook running Sonoma 14.5, Tailscale App v1.68.1, cannot hear quic-go traffic to and from itself (on different ports of the same 100.x.x.x Tailscale IP address assigned to the MacBook). This is with no exit node in use.

And, perhaps not surprisingly, no quic-go traffic is arriving between my Mac and a Linux node on the Tailscale network. If traffic won't flow on the same machine, I cannot see how it can get off the machine.

The same quic-go based code on Linux with Tailscale v1.20.4 does work intra-host, but also does not talk inter-hosts. Not mac <-> linux, and not linux <-> linux (different linux hosts).

Are there options that can be set to try things?

Update: I was able to use bisection on the same mac <-> same mac communication over quic-go to find that by raising the MTU from 1280 to 1308, intra-mac communication becomes possible. With an MTU of 1307, no quic-go UDP communication can be started.

# here is how the MTU looked when I started:
utun4: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280

$ sudo ifconfig utun4 mtu 1307 ## still does not work

$ sudo ifconfig utun4 mtu 1308  ## makes quic-go work (where utun4 is the interface with the Tailscale 100.x.x.x address on macOS).

The same MTU of 1308 also allows the macOS <-> linux to communicate, but only if the MTU is raised on both sides.
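
One possible reading of the 1307/1308 boundary, sketched in Go: if the first packet is a 1280-byte UDP payload sent over IPv4 (an assumption that this thread does not confirm), then the 20-byte IPv4 header and the 8-byte UDP header bring the IP packet to exactly 1308 bytes. The helper name below is made up for illustration.

```go
package main

import "fmt"

const (
	ipv4Header = 20 // bytes, IPv4 header without options
	ipv6Header = 40 // bytes, fixed IPv6 header
	udpHeader  = 8  // bytes
)

// minMTU (hypothetical helper) returns the smallest interface MTU that
// can carry a UDP payload of the given size without fragmentation.
func minMTU(udpPayload int, ipv6 bool) int {
	if ipv6 {
		return udpPayload + udpHeader + ipv6Header
	}
	return udpPayload + udpHeader + ipv4Header
}

func main() {
	// Assumption: the initial QUIC packet is a 1280-byte UDP payload.
	fmt.Println(minMTU(1280, false)) // 1308
	fmt.Println(minMTU(1280, true))  // 1328
}
```

That is 28 bytes above the 1280 default, which matches the difference found by bisection. It is only header arithmetic and says nothing about other reasons a packet might be dropped.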

I'm not a network guru. Can someone explain the downsides of such a solution / why hasn't Tailscale raised the default MTU? This is a 28 byte difference.

Update: I was able to get quic-go to work over Tailscale IPv4 addresses by setting quic.Config.InitialPacketSize = 1200; the minimum. After doing some reading, I assume that Tailscale defaults to an MTU of 1280 because for (some? all?) IPv6 networks, that is all they handle. The 1200 initial packet size didn't work for IPv6 on my mac or linux systems, but the IPv4 suffices for me.

glycerine avatar Oct 11 '24 01:10 glycerine

This is still a problem right now, preventing e.g. Syncthing's QUIC transport (1.27.x, between two run-of-the-mill Linux hosts) from working over Tailscale (1.76.0, go1.23.2).

intelfx avatar Oct 11 '24 13:10 intelfx

@intelfx did you try my suggestion? i.e. at https://github.com/syncthing/syncthing/blob/main/lib/connections/quic_misc.go#L28 add the InitialPacketSize = 1200 option, for example as in https://github.com/glycerine/rpc25519/blob/master/quic_client.go#L82

glycerine avatar Oct 11 '24 16:10 glycerine

@glycerine I didn't feel like compiling syncthing for 3 arches that I have on hand, so I set TS_DEBUG_MTU=1350 instead :-)

intelfx avatar Oct 11 '24 16:10 intelfx

Thanks for the comments @marten-seemann !

Related issues https://github.com/tailscale/tailscale/issues/311 and https://github.com/tailscale/tailscale/issues/8102.

Yes, we need to implement PMTU behaviors, both on the inside layer and the outer layer. On the inside we need to produce ICMP at the appropriate times where the OS will not do so (many TUNs will only do so in one direction), and additionally to implement our own knowledge of, and hunt for, the appropriate peer MTU, as well as associated optimizations such as populating the platform PMTU cache appropriately to avoid hunts in environments with wildly different MTUs (e.g. AWS EC2 nodes that often have a 9001 interface MTU, but only a 1500-byte MTU to the internet).
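
As a rough illustration of what a "hunt" for a peer MTU can look like (not Tailscale's implementation), here is a Go sketch of a binary search over packet sizes. The `probe` callback is hypothetical: it stands in for "send a packet of this size and see whether it is acknowledged", and the search assumes every size up to the path MTU passes and every larger size fails.

```go
package main

import "fmt"

// largestPassing (hypothetical helper) returns the largest size in
// [lo, hi] for which probe succeeds, assuming probe is monotonic.
// The second result is false if even lo fails.
func largestPassing(lo, hi int, probe func(size int) bool) (int, bool) {
	if !probe(lo) {
		return 0, false
	}
	// Invariant: probe(lo) succeeded and the answer lies in [lo, hi].
	for lo < hi {
		mid := lo + (hi-lo+1)/2 // round up so the range always shrinks
		if probe(mid) {
			lo = mid
		} else {
			hi = mid - 1
		}
	}
	return lo, true
}

func main() {
	// Simulated path: a 9001-byte interface MTU locally, but only
	// 1500 bytes end to end (the EC2 example above).
	const pathMTU = 1500
	probe := func(size int) bool { return size <= pathMTU }

	mtu, ok := largestPassing(1280, 9001, probe)
	fmt.Println(mtu, ok) // 1500 true
}
```

A real probe is lossy and slow, so a hunt in practice has to tolerate false negatives and cache results per peer, which is part of why this is more work than the sketch suggests.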

The path isn't short: similar to what QUIC is experiencing, ensuring that all hops / transforms on a particular end-to-end path communicate appropriate signals, or gracefully handle the cases when they do not, is a broad set of challenges.

I'd love to take a suggestion such as https://github.com/tailscale/tailscale/issues/2633#issuecomment-2358219952 and just raise the default MTU to 1350 or so, but a number of the existing issues still arise immediately making it less trivial than it might appear. Without manifesting ICMP appropriately in both directions, on all platforms, a change to the default MTU would create incompatibilities between client versions. This is all doable, I'm just providing some insight as to why this is a much larger change than it might initially appear, particularly as some users can get by if they set TS_DEBUG_MTU=1350 consistently on every one of their nodes. The problem comes with the one they miss, or the shared node they add to their network, etc.

I'd like to ask @marten-seemann as it isn't immediately clear to me. You discussed wrapped tunnels a number of times, but is there a problem today with a single standalone quic-go stream? Would a "plain" quic-go server and, say, curl successfully communicate over Tailscale, or not? If not, what specifically is the root cause there?

raggi avatar Oct 31 '24 00:10 raggi

Hi @raggi, Marten will have a much broader perspective, but my personal experience is: I can get quic-go to work over Tailscale IPv4 by setting quic.Config.InitialPacketSize=1200. The quic-go keepalives are not reliable (might or might not be related to Tailscale; https://github.com/quic-go/quic-go/issues/4710) but otherwise it does work just fine (and I leave the MTU at the default 1280). You can use srv -q -s 100.x.x.x and cli -q -s 100.x.x.x from my https://github.com/glycerine/rpc25519/tree/master/cmd to test (see the readme there to install/make the necessary certs).

glycerine avatar Oct 31 '24 01:10 glycerine