quinn
quinn-udp: EINVAL may happen even if GSO was previously working
I don't have many details on this problem yet, but we are seeing some very strange behaviour that I think is related to very high loads (> 2 Gbps) on the UDP socket.
In particular, we are hitting the following:
2025-04-14T05:10:04.725Z INFO quinn_udp::imp: `libc::sendmsg` failed with Invalid argument (os error 22); halting segmentation offload
However, halting segmentation offload doesn't appear to fix it. Instead, the UDP socket seems to be "dead": no transmissions of any kind go through:
2025-04-14T04:50:40.798Z DEBUG firezone_tunnel::sockets: Failed to send datagram: Failed to send datagram-batch with segment_size 88 to XXXX: Invalid argument (os error 22)
2025-04-14T04:50:41.616Z DEBUG firezone_tunnel::sockets: Failed to send datagram: Failed to send datagram-batch with segment_size 1312 to XXXX: Invalid argument (os error 22)
2025-04-14T04:50:41.798Z DEBUG firezone_tunnel::sockets: Failed to send datagram: Failed to send datagram-batch with segment_size 88 to XXXX: Invalid argument (os error 22)
2025-04-14T04:50:43.300Z DEBUG firezone_tunnel::sockets: Failed to send datagram: Failed to send datagram-batch with segment_size 88 to XXXX: Invalid argument (os error 22)
2025-04-14T04:50:44.801Z DEBUG firezone_tunnel::sockets: Failed to send datagram: Failed to send datagram-batch with segment_size 88 to XXXX: Invalid argument (os error 22)
2025-04-14T04:50:44.944Z DEBUG firezone_tunnel::sockets: Failed to send datagram: Failed to send datagram-batch with segment_size 1312 to XXXX: Invalid argument (os error 22)
2025-04-14T04:50:46.301Z DEBUG firezone_tunnel::sockets: Failed to send datagram: Failed to send datagram-batch with segment_size 88 to XXXX: Invalid argument (os error 22)
2025-04-14T04:50:47.802Z DEBUG firezone_tunnel::sockets: Failed to send datagram: Failed to send datagram-batch with segment_size 88 to XXXX: Invalid argument (os error 22)
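For context, the fallback announced by the INFO line above can be modelled roughly as below. This is a hypothetical sketch based only on the log message, not quinn-udp's actual code: `GsoState` and `on_send_error` are made-up names, treating EIO the same as EINVAL is an assumption, and the errno values 22 (EINVAL) and 5 (EIO) are the Linux ones. The point it illustrates is that only batches built after the limit drops to 1 avoid GSO. Batches that were already built with the old limit can still fail once, but EINVAL on every later send (as in the logs) would point at something besides GSO.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};

// Linux errno values (assumption: Linux target). Hard-coded to avoid a `libc` dependency.
const EIO: i32 = 5;
const EINVAL: i32 = 22;

/// Hypothetical model of the per-socket GSO limit; not quinn-udp's real type.
pub struct GsoState {
    max_gso_segments: AtomicUsize,
}

impl GsoState {
    pub fn new(max_gso_segments: usize) -> Self {
        Self {
            max_gso_segments: AtomicUsize::new(max_gso_segments),
        }
    }

    /// Limit the caller should respect when building the *next* batch.
    pub fn max_gso_segments(&self) -> usize {
        self.max_gso_segments.load(Ordering::Relaxed)
    }

    /// Handle a failed `sendmsg`. If the error looks GSO-related (EINVAL or EIO),
    /// drop the limit to 1 so that later batches carry a single datagram each.
    /// Returns `true` only on the call that actually turned GSO off.
    pub fn on_send_error(&self, err: &io::Error) -> bool {
        match err.raw_os_error() {
            Some(EINVAL) | Some(EIO) => self.max_gso_segments.swap(1, Ordering::Relaxed) > 1,
            _ => false,
        }
    }
}
```

Using `swap` means concurrent senders that hit the error at the same time log the "halting" message only once.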
I already take `max_gso_segments` into account (see the code: https://github.com/firezone/firezone/blob/b3746b330f27fd5a64f2ac47e3c9c12061390abb/rust/socket-factory/src/lib.rs#L263-L336), so I don't think that is the issue.
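Roughly, respecting `max_gso_segments` means never handing the kernel more than that many segments in one `sendmsg`. The following is a simplified sketch of that idea, not the linked Firezone code: `gso_batches` and `segment_count` are hypothetical names, and it assumes the outgoing data is one contiguous buffer in which every datagram is `segment_size` bytes except possibly the last.

```rust
/// Split a buffer of back-to-back datagrams (each `segment_size` bytes except
/// possibly the last) into GSO batches of at most `max_segments` segments.
/// Every batch except the final one is an exact multiple of `segment_size`,
/// so only the very last datagram of the whole buffer can be short.
pub fn gso_batches(buf: &[u8], segment_size: usize, max_segments: usize) -> Vec<&[u8]> {
    assert!(segment_size > 0, "segment_size must be non-zero");
    assert!(max_segments > 0, "max_segments must be non-zero");
    buf.chunks(segment_size * max_segments).collect()
}

/// Number of segments a batch of `batch_len` bytes is cut into (ceiling division).
pub fn segment_count(batch_len: usize, segment_size: usize) -> usize {
    (batch_len + segment_size - 1) / segment_size
}
```

For example, 1050 bytes with `segment_size = 100` and `max_segments = 4` yields batches of 400, 400 and 250 bytes, the last holding three segments (100, 100, 50).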
Has anyone experienced this behaviour before? I started reading through the Linux kernel code, but EINVAL is returned in many places, none of which log anything, so it is quite difficult to trace what might be wrong.