sing-box icon indicating copy to clipboard operation
sing-box copied to clipboard

singbox 发送了错误的 maxDatagramFrameSize,导致 hysteria2 服务端错误的进行分包

Open Zxneric opened this issue 1 year ago • 24 comments

操作系统

Windows

系统版本

10.0.22631 Build 22631

安装类型

sing-box 原始命令行程序

如果您使用图形客户端程序,请提供该程序版本。

No response

版本

sing-box version 1.9.0-rc.4

Environment: go1.22.1 windows/amd64
Tags: with_gvisor,with_quic,with_dhcp,with_wireguard,with_ech,with_utls,with_reality_server,with_acme,with_clash_api
Revision: dce6053c3f37e2e578de4af106f3db0edf24e73d
CGO: disabled

描述

使用 singbox 连接到 hysteria2 outbound 时,singbox 内部无法正确对 udp 进行分包,udp 包大小超过 quic datagram 容量,导致数据丢失。

此错误可能由

https://github.com/SagerNet/sing-quic/blob/6be1f3c03ae882d544677f1cc0f43297e4afa6b1/hysteria2/packet.go#L239-L253

导致

重现方式

按照以下配置搭建服务以后

curl --http3-only -vvv https://quic.nginx.org/test/

输出

* Host quic.nginx.org:443 was resolved.
* IPv6: (none)
* IPv4: 35.214.218.230
*   Trying 35.214.218.230:443...
* QUIC cipher selection: TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_CCM_SHA256
*  CAfile: curl-ca-bundle.crt
*  CApath: none
* ngtcp2_conn_handle_expiry returned error: ERR_HANDSHAKE_TIMEOUT
* Failed to connect to quic.nginx.org port 443 after 10045 ms: Failed sending data to the peer
* Closing connection
curl: (55) ngtcp2_conn_handle_expiry returned error: ERR_HANDSHAKE_TIMEOUT

sing-box client 配置文件

{
  "dns": {
    "servers": [
      {
        "tag": "google",
        "detour": "select",
        "address": "tls://8.8.8.8"
      },
      {
        "tag": "local",
        "address": "114.114.114.114",
        "detour": "direct"
      }
    ],
    "rules": [
      {
        "outbound": "any",
        "server": "local"
      },
    ],
    "independent_cache": true,
    "final": "google",
    "strategy": "ipv4_only"
  },
  "inbounds": [
    {
      "auto_route": true,
      "interface_name": "singbox",
      "inet4_address": "172.19.0.1/30",
      "type": "tun",
      "stack": "gvisor",
      "endpoint_independent_nat": true,
      "mtu": 1400,
      "strict_route": true,
      "sniff": true,
      "sniff_timeout": "500ms",
      "udp_disable_domain_unmapping": true
    }
  ],
  "outbounds": [
    // any hysteria2 outbound
  ],
  "route": {
    "auto_detect_interface": true,
    "final": "select"
  }
}

hysteria 服务端配置文件

listen: :443

tls:
  cert: /etc/hysteria/cert.crt
  key: /etc/hysteria/private.key

quic:
  initStreamReceiveWindow: 16777216
  maxStreamReceiveWindow: 16777216
  initConnReceiveWindow: 33554432
  maxConnReceiveWindow: 33554432

auth:
  type: password
  password: 

masquerade:
  type: proxy
  proxy:
    url: https://example.com
    rewriteHost: true

日志

DEBUG[0184] [764637597 0ms] inbound/tun[0]: connection closed: upload: DATAGRAM frame too large (maximum: 1200 bytes) | download: io: read/write on closed pipe
DEBUG[0187] [1280231848 10.2s] inbound/tun[0]: connection closed: io: read/write on closed pipe | upstream: context canceled
DEBUG[0194] [2325380955 16.55s] inbound/tun[0]: connection closed: download: use of closed network connection

支持我们

完整性要求

  • [X] 我保证阅读了文档,了解所有我编写的配置文件项的含义,而不是大量堆砌看似有用的选项或默认值。
  • [X] 我保证提供了可以在本地重现该问题的服务器、客户端配置文件与流程,而不是一个脱敏的复杂客户端配置文件。
  • [X] 我保证提供了可用于重现我报告的错误的最简配置,而不是依赖远程服务器、TUN、图形界面客户端或者其他闭源软件。
  • [X] 我保证提供了完整的配置文件与日志,而不是出于对自身智力的自信而仅提供了部分认为有用的部分。

Zxneric avatar Apr 07 '24 09:04 Zxneric

有两个主要错误

第一个是

https://github.com/SagerNet/sing-quic/blob/619122b4c233d46dd1e2c3262df303e8a9adf6d1/hysteria2/packet.go#L87

fragUDPMessage 的 maxPacketSize 只传入过 udpMtu 或者 quic-go 抛出的 max frame size,因此比较时,应该减去 headerSize

第二个是

https://github.com/SagerNet/sing-quic/blob/619122b4c233d46dd1e2c3262df303e8a9adf6d1/hysteria2/packet.go#L210

tooLargeErr.PeerMaxDatagramFrameSize 不能正确反映可用的空间,正确的值应该是 *DatagramFrame.MaxDataLen 返回的值。在当前版本下,可以直接 -3

Zxneric avatar Apr 07 '24 11:04 Zxneric

应用了以上两个修复后,确认hysteria2下的udp工作正常

Zxneric avatar Apr 07 '24 12:04 Zxneric

关于第二点,hysteria 官方实现选择在fork中暴露内部的可用 buf 大小

https://github.com/apernet/quic-go/blob/fef64abc6b78d476489718256f19e3d871f0707f/connection.go#L2368-L2373

Zxneric avatar Apr 07 '24 17:04 Zxneric

https://github.com/SagerNet/sing-quic/blob/6be1f3c03ae882d544677f1cc0f43297e4afa6b1/hysteria2/packet.go#L364-L370

如果有两个分段数相同,会导致缓存污染

Zxneric avatar Apr 07 '24 17:04 Zxneric

https://github.com/SagerNet/sing-quic/blob/6be1f3c03ae882d544677f1cc0f43297e4afa6b1/hysteria2/packet.go#L364-L370

如果有两个分段数相同,会导致缓存污染

udpDefragger is unique by sessionID

nekohasekai avatar Apr 08 '24 02:04 nekohasekai

关于第二点,hysteria 官方实现选择在fork中暴露内部的可用 buf 大小

https://github.com/apernet/quic-go/blob/fef64abc6b78d476489718256f19e3d871f0707f/connection.go#L2368-L2373

This seems to come from quic-go rather than hysteria, which is a fixed value.

nekohasekai avatar Apr 08 '24 02:04 nekohasekai

关于第二点,hysteria 官方实现选择在fork中暴露内部的可用 buf 大小 https://github.com/apernet/quic-go/blob/fef64abc6b78d476489718256f19e3d871f0707f/connection.go#L2368-L2373

This seems to come from quic-go rather than hysteria, which is a fixed value.

it did came from quic-go, thus need a modification in the fork, or just minus 3 for the current version

Zxneric avatar Apr 08 '24 04:04 Zxneric

something still confused me in the hysteria2 outbound is dtls hanshake cannot be finished. Client Hello -> Hello Verify -> Client Hello, then the Server Hello always be dropped. It didn't happened in the official hysteria implementation.

Zxneric avatar Apr 08 '24 04:04 Zxneric

it did came from quic-go, thus need a modification in the fork

hysteria fixes the maximum mtu of quic-go to 1200, so this judgment is needed, while sing-box and clash.meta cannot

nekohasekai avatar Apr 08 '24 04:04 nekohasekai

After apply the patch in SagerNet/sing-quic#8, I confirmed the udp works corretly if the packet length less than 1390. Still needs the further work on this.

server.py

import socket
import argparse

def main(server_ip, server_port):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server_socket:
        server_socket.bind((server_ip, server_port))
        print(f"Server is listening on {server_ip}:{server_port}")

        while True:
            data, client_address = server_socket.recvfrom(10000)
            print(f"Received data from {client_address}: {len(data)}")

            server_socket.sendto(data, client_address)
            print(f"Sent data back to {client_address}: {len(data)}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="UDP server with command line arguments")
    parser.add_argument("--server_ip", default="0.0.0.0", help="Server IP address")
    parser.add_argument("--port", default=45793, type=int, help="Server port number")
    args = parser.parse_args()

    main(args.server_ip, args.port)

client.py

import os
import socket


server = None
port = None

def send_and_receive(n):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.connect((server, port))
    s.settimeout(1)
    
    data = os.urandom(n)

    s.sendall(data)

    rdata = s.recv(n)

    assert data == rdata
    return data


if __name__ == "__main__":
    for i in range(1190, 3000):
        print("sending and receiving", i, "bytes")
        send_and_receive(i)
        print(i, "bytes received")

Zxneric avatar Apr 11 '24 20:04 Zxneric

SagerNet/sing-quic#8 is wrong and not related to the real bug

Zxneric avatar Apr 12 '24 16:04 Zxneric

SagerNet/sing-quic#9 fixes this. But the same bug may also happened to hysteria1 and tuic. I have no idea about those protocol. Please also update them. @nekohasekai @dyhkwong plz review

Zxneric avatar Apr 14 '24 19:04 Zxneric

And it seems that mtu discovery never worked properly

Zxneric avatar Apr 14 '24 19:04 Zxneric

@nekohasekai 此错误在 1.9.0-rc11 中被修复,但是在 rc12 中再次出现 force push sing-quic 时,丢弃了 https://github.com/SagerNet/sing-quic/pull/9/commits/6a7930915e8150bbe732d85fe9cb8a1cd174017a 中的补丁。 设置此参数可以保证服务端工作正常,尤其是与其他实现的服务端互操作时,仅仅设置 udpMtu 只能保证客户端行为

Zxneric avatar Apr 26 '24 17:04 Zxneric

@nekohasekai I would like to know how you plan to fix this issue? Are there any planned patches? Please let me know if you have any comments on my fix.

Zxneric avatar May 04 '24 00:05 Zxneric

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar Jul 03 '24 01:07 github-actions[bot]

此问题在最新版本仍然存在

Zxneric avatar Jul 03 '24 05:07 Zxneric

https://github.com/SagerNet/sing-quic/blob/fe8564f7e0b2b0d184aea0368fbdba8a83781b89/hysteria2/packet.go#L138

udpMTU 已经是 1200 - 3

fragUDPMessage 的 maxPacketSize 只传入过 udpMtu 或者 quic-go 抛出的 max frame size,因此比较时,应该减去 headerSize

https://github.com/SagerNet/sing-quic/blob/fe8564f7e0b2b0d184aea0368fbdba8a83781b89/hysteria2/packet.go#L87

tooLargeErr.PeerMaxDatagramFrameSize 不能正确反映可用的空间,正确的值应该是 *DatagramFrame.MaxDataLen 返回的值。在当前版本下,可以直接 -3

https://github.com/SagerNet/sing-quic/blob/fe8564f7e0b2b0d184aea0368fbdba8a83781b89/hysteria2/packet.go#L206

以及:https://github.com/SagerNet/sing-box/issues/1650#issuecomment-2041857226

此问题在最新版本仍然存在

所以现在的问题是什么呢?

dyhkwong avatar Jul 03 '24 11:07 dyhkwong

问题出在 https://github.com/SagerNet/quic-go/blob/cbbf722db9215d5ebf2f53ed43c804e71a71346c/connection.go#L316-L320, 如果不设置 config.MaxDatagramFrameSize, 会导致连接的 MaxDatagramFrameSize 被默认设置为16384,在这种情况下,对端如果发送了 加上hysteria header以后的包大小 大于 路径MTU 的包,会导致直接被丢弃。 如果服务端也是sing-box的话是没有这个问题的,因为sing-box在发送之前写死了检查大小的逻辑,但是如果和官方实现通信的话这里就bug了。

Zxneric avatar Jul 03 '24 11:07 Zxneric

MaxDatagramFrameSize 在原本的 quic-go 中是不存在的,是被一系列patch加上的。 hysteria官方实现没有设置这个值是因为 wire.MaxDatagramSize 被改成了 1200,所以不会有问题。但是 sing-box 回滚了这个patch,因此需要手动设置这个值来通知对端正确的大小。 https://github.com/SagerNet/sing-quic/pull/10 我之前提的 PR 被拒了

Zxneric avatar Jul 03 '24 11:07 Zxneric

在这种情况下,对端如果发送了 加上hysteria header以后的包大小 大于 路径MTU 的包,会导致直接被丢弃。

请重新组织语言描述你的问题,hy2 自己限制 1200,不可能发送大于此的包。

如果你只知道存在问题与掩盖问题的方法,请不要添油加醋。

nekohasekai avatar Jul 03 '24 13:07 nekohasekai

1200是默认的设置,这个数值会在运行时被改变,我在hysteria的官方实现打log确认的。应该是和路径发现有关。 我确实没有弄清楚完整的整个逻辑链,但是这里肯定有个bug是确定的

Zxneric avatar Jul 03 '24 14:07 Zxneric

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days

github-actions[bot] avatar Sep 02 '24 01:09 github-actions[bot]

update

Zxneric avatar Sep 02 '24 04:09 Zxneric