singbox 发送了错误的 maxDatagramFrameSize,导致 hysteria2 服务端错误的进行分包
操作系统
Windows
系统版本
10.0.22631 Build 22631
安装类型
sing-box 原始命令行程序
如果您使用图形客户端程序,请提供该程序版本。
No response
版本
sing-box version 1.9.0-rc.4
Environment: go1.22.1 windows/amd64
Tags: with_gvisor,with_quic,with_dhcp,with_wireguard,with_ech,with_utls,with_reality_server,with_acme,with_clash_api
Revision: dce6053c3f37e2e578de4af106f3db0edf24e73d
CGO: disabled
描述
使用 singbox 连接到 hysteria2 outbound 时,singbox 内部无法正确对 udp 进行分包,udp 包大小超过 quic datagram 容量,导致数据丢失。
此错误可能由
https://github.com/SagerNet/sing-quic/blob/6be1f3c03ae882d544677f1cc0f43297e4afa6b1/hysteria2/packet.go#L239-L253
导致
重现方式
按照以下配置搭建服务以后
curl --http3-only -vvv https://quic.nginx.org/test/
输出
* Host quic.nginx.org:443 was resolved.
* IPv6: (none)
* IPv4: 35.214.218.230
* Trying 35.214.218.230:443...
* QUIC cipher selection: TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_CCM_SHA256
* CAfile: curl-ca-bundle.crt
* CApath: none
* ngtcp2_conn_handle_expiry returned error: ERR_HANDSHAKE_TIMEOUT
* Failed to connect to quic.nginx.org port 443 after 10045 ms: Failed sending data to the peer
* Closing connection
curl: (55) ngtcp2_conn_handle_expiry returned error: ERR_HANDSHAKE_TIMEOUT
sing-box client 配置文件
{
"dns": {
"servers": [
{
"tag": "google",
"detour": "select",
"address": "tls://8.8.8.8"
},
{
"tag": "local",
"address": "114.114.114.114",
"detour": "direct"
}
],
"rules": [
{
"outbound": "any",
"server": "local"
},
],
"independent_cache": true,
"final": "google",
"strategy": "ipv4_only"
},
"inbounds": [
{
"auto_route": true,
"interface_name": "singbox",
"inet4_address": "172.19.0.1/30",
"type": "tun",
"stack": "gvisor",
"endpoint_independent_nat": true,
"mtu": 1400,
"strict_route": true,
"sniff": true,
"sniff_timeout": "500ms",
"udp_disable_domain_unmapping": true
}
],
"outbounds": [
// any hysteria2 outbound
],
"route": {
"auto_detect_interface": true,
"final": "select"
}
}
hysteria 服务端配置文件
listen: :443
tls:
cert: /etc/hysteria/cert.crt
key: /etc/hysteria/private.key
quic:
initStreamReceiveWindow: 16777216
maxStreamReceiveWindow: 16777216
initConnReceiveWindow: 33554432
maxConnReceiveWindow: 33554432
auth:
type: password
password:
masquerade:
type: proxy
proxy:
url: https://example.com
rewriteHost: true
日志
DEBUG[0184] [764637597 0ms] inbound/tun[0]: connection closed: upload: DATAGRAM frame too large (maximum: 1200 bytes) | download: io: read/write on closed pipe
DEBUG[0187] [1280231848 10.2s] inbound/tun[0]: connection closed: io: read/write on closed pipe | upstream: context canceled
DEBUG[0194] [2325380955 16.55s] inbound/tun[0]: connection closed: download: use of closed network connection
支持我们
- [ ] 我已经 赞助
完整性要求
- [X] 我保证阅读了文档,了解所有我编写的配置文件项的含义,而不是大量堆砌看似有用的选项或默认值。
- [X] 我保证提供了可以在本地重现该问题的服务器、客户端配置文件与流程,而不是一个脱敏的复杂客户端配置文件。
- [X] 我保证提供了可用于重现我报告的错误的最简配置,而不是依赖远程服务器、TUN、图形界面客户端或者其他闭源软件。
- [X] 我保证提供了完整的配置文件与日志,而不是出于对自身智力的自信而仅提供了部分认为有用的部分。
有两个主要错误
第一个是
https://github.com/SagerNet/sing-quic/blob/619122b4c233d46dd1e2c3262df303e8a9adf6d1/hysteria2/packet.go#L87
fragUDPMessage 的 maxPacketSize 只传入过 udpMtu 或者 quic-go 抛出的 max frame size,因此比较时,应该减去 headerSize
第二个是
https://github.com/SagerNet/sing-quic/blob/619122b4c233d46dd1e2c3262df303e8a9adf6d1/hysteria2/packet.go#L210
tooLargeErr.PeerMaxDatagramFrameSize 不能正确反映可用的空间,正确的值应该是 *DatagramFrame.MaxDataLen 返回的值。在当前版本下,可以直接 -3
应用了以上两个修复后,确认hysteria2下的udp工作正常
关于第二点,hysteria 官方实现选择在fork中暴露内部的可用 buf 大小
https://github.com/apernet/quic-go/blob/fef64abc6b78d476489718256f19e3d871f0707f/connection.go#L2368-L2373
https://github.com/SagerNet/sing-quic/blob/6be1f3c03ae882d544677f1cc0f43297e4afa6b1/hysteria2/packet.go#L364-L370
如果有两个分段数相同,会导致缓存污染
https://github.com/SagerNet/sing-quic/blob/6be1f3c03ae882d544677f1cc0f43297e4afa6b1/hysteria2/packet.go#L364-L370
如果有两个分段数相同,会导致缓存污染
udpDefragger is unique by sessionID
关于第二点,hysteria 官方实现选择在fork中暴露内部的可用 buf 大小
https://github.com/apernet/quic-go/blob/fef64abc6b78d476489718256f19e3d871f0707f/connection.go#L2368-L2373
This seems to come from quic-go rather than hysteria, which is a fixed value.
关于第二点,hysteria 官方实现选择在fork中暴露内部的可用 buf 大小 https://github.com/apernet/quic-go/blob/fef64abc6b78d476489718256f19e3d871f0707f/connection.go#L2368-L2373
This seems to come from quic-go rather than hysteria, which is a fixed value.
it did came from quic-go, thus need a modification in the fork, or just minus 3 for the current version
something still confused me in the hysteria2 outbound is dtls hanshake cannot be finished. Client Hello -> Hello Verify -> Client Hello, then the Server Hello always be dropped. It didn't happened in the official hysteria implementation.
it did came from quic-go, thus need a modification in the fork
hysteria fixes the maximum mtu of quic-go to 1200, so this judgment is needed, while sing-box and clash.meta cannot
After apply the patch in SagerNet/sing-quic#8, I confirmed the udp works corretly if the packet length less than 1390. Still needs the further work on this.
server.py
import socket
import argparse
def main(server_ip, server_port):
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server_socket:
server_socket.bind((server_ip, server_port))
print(f"Server is listening on {server_ip}:{server_port}")
while True:
data, client_address = server_socket.recvfrom(10000)
print(f"Received data from {client_address}: {len(data)}")
server_socket.sendto(data, client_address)
print(f"Sent data back to {client_address}: {len(data)}")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="UDP server with command line arguments")
parser.add_argument("--server_ip", default="0.0.0.0", help="Server IP address")
parser.add_argument("--port", default=45793, type=int, help="Server port number")
args = parser.parse_args()
main(args.server_ip, args.port)
client.py
import os
import socket
server = None
port = None
def send_and_receive(n):
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect((server, port))
s.settimeout(1)
data = os.urandom(n)
s.sendall(data)
rdata = s.recv(n)
assert data == rdata
return data
if __name__ == "__main__":
for i in range(1190, 3000):
print("sending and receiving", i, "bytes")
send_and_receive(i)
print(i, "bytes received")
SagerNet/sing-quic#8 is wrong and not related to the real bug
SagerNet/sing-quic#9 fixes this. But the same bug may also happened to hysteria1 and tuic. I have no idea about those protocol. Please also update them. @nekohasekai @dyhkwong plz review
And it seems that mtu discovery never worked properly
@nekohasekai 此错误在 1.9.0-rc11 中被修复,但是在 rc12 中再次出现 force push sing-quic 时,丢弃了 https://github.com/SagerNet/sing-quic/pull/9/commits/6a7930915e8150bbe732d85fe9cb8a1cd174017a 中的补丁。 设置此参数可以保证服务端工作正常,尤其是与其他实现的服务端互操作时,仅仅设置 udpMtu 只能保证客户端行为
@nekohasekai I would like to know how you plan to fix this issue? Are there any planned patches? Please let me know if you have any comments on my fix.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days
此问题在最新版本仍然存在
https://github.com/SagerNet/sing-quic/blob/fe8564f7e0b2b0d184aea0368fbdba8a83781b89/hysteria2/packet.go#L138
udpMTU 已经是 1200 - 3
fragUDPMessage 的 maxPacketSize 只传入过 udpMtu 或者 quic-go 抛出的 max frame size,因此比较时,应该减去 headerSize
https://github.com/SagerNet/sing-quic/blob/fe8564f7e0b2b0d184aea0368fbdba8a83781b89/hysteria2/packet.go#L87
tooLargeErr.PeerMaxDatagramFrameSize不能正确反映可用的空间,正确的值应该是*DatagramFrame.MaxDataLen返回的值。在当前版本下,可以直接 -3
https://github.com/SagerNet/sing-quic/blob/fe8564f7e0b2b0d184aea0368fbdba8a83781b89/hysteria2/packet.go#L206
以及:https://github.com/SagerNet/sing-box/issues/1650#issuecomment-2041857226
此问题在最新版本仍然存在
所以现在的问题是什么呢?
问题出在 https://github.com/SagerNet/quic-go/blob/cbbf722db9215d5ebf2f53ed43c804e71a71346c/connection.go#L316-L320, 如果不设置 config.MaxDatagramFrameSize, 会导致连接的 MaxDatagramFrameSize 被默认设置为16384,在这种情况下,对端如果发送了 加上hysteria header以后的包大小 大于 路径MTU 的包,会导致直接被丢弃。 如果服务端也是sing-box的话是没有这个问题的,因为sing-box在发送之前写死了检查大小的逻辑,但是如果和官方实现通信的话这里就bug了。
MaxDatagramFrameSize 在原本的 quic-go 中是不存在的,是被一系列patch加上的。 hysteria官方实现没有设置这个值是因为 wire.MaxDatagramSize 被改成了 1200,所以不会有问题。但是 sing-box 回滚了这个patch,因此需要手动设置这个值来通知对端正确的大小。 https://github.com/SagerNet/sing-quic/pull/10 我之前提的 PR 被拒了
在这种情况下,对端如果发送了 加上hysteria header以后的包大小 大于 路径MTU 的包,会导致直接被丢弃。
请重新组织语言描述你的问题,hy2 自己限制 1200,不可能发送大于此的包。
如果你只知道存在问题与掩盖问题的方法,请不要添油加醋。
1200是默认的设置,这个数值会在运行时被改变,我在hysteria的官方实现打log确认的。应该是和路径发现有关。 我确实没有弄清楚完整的整个逻辑链,但是这里肯定有个bug是确定的
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days
update