dpvs
With defence_tcp_drop disabled, part of the vip's TCP traffic reaches the KNI interface
Scenario
- Two-arm fullnat mode.
- An ordinary HTTP service. When clients misbehave they generate a large volume of SYN-flood-like packets, some of which are passed straight through to the KNI interface, at roughly 300k pps. This interrupts BGP and makes health checks flap. From the code, traffic destined to vip:vport should not reach the KNI interface at all (one way to check the packet rate seen on the KNI side is sketched right after this list).
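A quick way to confirm the rate arriving on the KNI side is to watch the kernel netdevice's counters. The device name used below (dpdk0.kni) is an assumption; substitute whatever name your setup creates for the KNI interface.
# assumes the KNI netdevice is named dpdk0.kni -- adjust to your environment
sar -n DEV 1 | grep dpdk0.kni      # per-second rx/tx packet rates (sysstat)
ip -s link show dev dpdk0.kni      # cumulative interface counters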
How to reproduce
- The key is disabling defence_tcp_drop in dpvs.conf (a hedged sketch of the relevant config fragment is given after this section). Reproducible on both 1.6.1 and 1.8.4. The option has been disabled by default since 1.7, and this cluster runs 1.8.4.
- synproxy is enabled on the vip.
- The client sends packets:
# -c packet count, -S set the SYN flag, -p destination port, -i inter-packet interval
hping3 -V -c 10000 -S -p 80 -i u100000 vip
- Observed behavior (ignore the packet timestamps; the two captures were taken at different times and do not correspond):
# packets sent by the client
11:18:19.768999 IP (tos 0x0, ttl 64, id 37355, offset 0, flags [none], proto TCP (6), length 40)
client.2254 > vip.80: Flags [S], cksum 0x2582 (correct), seq 554031943, win 512, length 0
11:18:19.769021 IP (tos 0x0, ttl 62, id 37355, offset 0, flags [none], proto TCP (6), length 40)
vip.80 > client.2254: Flags [S.], cksum 0xe406 (correct), seq 3530883126, ack 554031944, win 512, length 0
11:18:19.769030 IP (tos 0x0, ttl 64, id 39383, offset 0, flags [DF], proto TCP (6), length 40)
client.2254 > vip.80: Flags [R], cksum 0xb8c0 (correct), seq 554031944, win 0, length 0
11:18:19.869014 IP (tos 0x0, ttl 64, id 18676, offset 0, flags [none], proto TCP (6), length 40)
client.2255 > vip.80: Flags [S], cksum 0x3043 (correct), seq 2130215353, win 512, length 0
11:18:19.869033 IP (tos 0x0, ttl 62, id 18676, offset 0, flags [none], proto TCP (6), length 40)
vip.80 > client.2255: Flags [S.], cksum 0x8e76 (correct), seq 673053624, ack 2130215354, win 512, length 0
11:18:19.869042 IP (tos 0x0, ttl 64, id 39384, offset 0, flags [DF], proto TCP (6), length 40)
client.2255 > vip.80: Flags [R], cksum 0xb45a (correct), seq 2130215354, win 0, length 0
11:18:19.969021 IP (tos 0x0, ttl 64, id 52270, offset 0, flags [none], proto TCP (6), length 40)
client.2256 > vip.80: Flags [S], cksum 0x555c (correct), seq 943043500, win 512, length 0
11:18:19.969040 IP (tos 0x0, ttl 62, id 52270, offset 0, flags [none], proto TCP (6), length 40)
vip.80 > client.2256: Flags [S.], cksum 0xbc86 (correct), seq 2676517644, ack 943043501, win 512, length 0
11:18:19.969045 IP (tos 0x0, ttl 64, id 39385, offset 0, flags [DF], proto TCP (6), length 40)
client.2256 > vip.80: Flags [R], cksum 0xc929 (correct), seq 943043501, win 0, length 0
11:18:20.069035 IP (tos 0x0, ttl 64, id 17297, offset 0, flags [none], proto TCP (6), length 40)
client.2257 > vip.80: Flags [S], cksum 0x3513 (correct), seq 483835533, win 512, length 0
11:18:20.069054 IP (tos 0x0, ttl 62, id 17297, offset 0, flags [none], proto TCP (6), length 40)
vip.80 > client.2257: Flags [S.], cksum 0xd814 (correct), seq 969786806, ack 483835534, win 512, length 0
11:18:20.069059 IP (tos 0x0, ttl 64, id 39386, offset 0, flags [DF], proto TCP (6), length 40)
client.2257 > vip.80: Flags [R], cksum 0xd9a6 (correct), seq 483835534, win 0, length 0
# capture on the KNI of the server's external-facing port
10:36:48.854348 IP (tos 0x0, ttl 62, id 21851, offset 0, flags [DF], proto TCP (6), length 40)
c_ip.20000 > vip.80: Flags [R], cksum 0xfdf1 (correct), seq 695027805, win 0, length 0
10:36:48.896353 IP (tos 0x0, ttl 62, id 21852, offset 0, flags [DF], proto TCP (6), length 40)
c_ip.20000 > vip.80: Flags [R], cksum 0x2f3c (correct), seq 1195112772, win 0, length 0
10:36:48.954346 IP (tos 0x0, ttl 62, id 21853, offset 0, flags [DF], proto TCP (6), length 40)
c_ip.20000 > vip.80: Flags [R], cksum 0xd931 (correct), seq 1623995838, win 0, length 0
10:36:48.996348 IP (tos 0x0, ttl 62, id 21854, offset 0, flags [DF], proto TCP (6), length 40)
c_ip.20000 > vip.80: Flags [R], cksum 0x6614 (correct), seq 1559669937, win 0, length 0
- KNI interface traffic; apart from this, the only traffic on the KNI is BGP.
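For reference, here is a hedged sketch of the dpvs.conf fragment the first reproduce step refers to. The section nesting and the "!" comment style are assumptions based on the sample configs; verify against the dpvs.conf.sample shipped with your DPVS version. Only the keyword itself, defence_tcp_drop, is taken from this thread.
! sketch only -- check dpvs.conf.sample for the exact nesting in your version
ipvs_defs {
    tcp {
        defence_tcp_drop    ! keyword present = enabled (drop vip traffic on
                            ! non-service ports); comment it out to disable,
                            ! which is the default since 1.7 and reproduces
                            ! the KNI leak described above
    }
}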
When defence_tcp_drop is enabled, packets whose destination IP is the vip but whose destination port is not the vport are dropped outright; with defence_tcp_drop disabled, such packets are forwarded to the KNI. If you are running in an untrusted environment, enabling this option is recommended.
From your capture screenshot, the packets arriving on the KNI are all RST packets from the client, which looks like abnormal or attack traffic. I could not reproduce the problem with the steps you gave above; KNI interface traffic was only around 10kpps.
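To make that rule explicit, here is a minimal sketch in C with hypothetical names; it is not the actual DPVS code path, it only restates the behavior described above for TCP packets whose destination IP is a vip.
/* Hypothetical sketch, not DPVS source: restates the defence_tcp_drop rule. */
#include <stdbool.h>
#include <stdio.h>

enum verdict { VERDICT_DROP, VERDICT_TO_KNI, VERDICT_OTHER };

struct tcp_dst {           /* stand-in for the real packet/tuple structures */
    bool is_vip;           /* destination IP matches a configured vip       */
    bool is_vport;         /* destination port matches a configured service */
};

static enum verdict vip_nonservice_verdict(struct tcp_dst d, bool defence_tcp_drop)
{
    if (d.is_vip && !d.is_vport)
        /* defence_tcp_drop on: drop in the fast path;
         * off: pass the packet up to the kernel via the KNI device. */
        return defence_tcp_drop ? VERDICT_DROP : VERDICT_TO_KNI;

    /* vip:vport service traffic and non-vip traffic (e.g. BGP to local
     * addresses) take their normal paths; this rule does not apply. */
    return VERDICT_OTHER;
}

int main(void)
{
    struct tcp_dst d = { .is_vip = true, .is_vport = false };
    printf("defence_tcp_drop off -> %s\n",
           vip_nonservice_verdict(d, false) == VERDICT_TO_KNI ? "to KNI" : "other");
    return 0;
}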
Disabling defence_tcp_drop by default since 1.7 does not seem appropriate; I suggest changing the default back to enabled.
10kpps is expected for that command; adjust the -i parameter to change the send rate.
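For reference, hping3's -i uN is the wait between packets in microseconds, so -i u100000 sends slowly from a single client; a shorter interval, or --flood (send as fast as possible without waiting for replies), raises the rate. A possible higher-rate variant of the command above (the values here are arbitrary examples, not the ones used in these tests):
# shorter inter-packet interval (10 microseconds) for a much higher rate
hping3 -V -c 1000000 -S -p 80 -i u10 vip
# or flood mode: send as fast as possible, do not wait for replies
hping3 -S -p 80 --flood vip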
In my simulated environment I pushed about 400kpps and only saw kni_send2kern_loop errors; the "no memory" errors from the original incident did not appear. Below are some error logs from that incident:
2021-02-28T19:43:14+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:14+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:14+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:16+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:16+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:16+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:17+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:19+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_conn_new: no memory
2021-02-28T20:06:38+08:00 lvs warning dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_conn_new: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs warning dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_conn_new: no memory
2021-02-28T20:06:38+08:00 lvs warning dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_conn_new: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs warning dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
In this scenario, was SYNPROXY_ACK_MBUFPOOL exhausted because the RS did not respond to the SYN packets?
With defence_tcp_drop disabled, when the backend service is unhealthy, dropping packets and leaving SYNs unanswered during connection setup so that retransmissions pile up, the ack_mbufpool fills up easily and dpvs can no longer process traffic. For the test shown above, the ack_mbufpool default of 1,000,000 was deliberately lowered to 20,000, and traffic was generated from a single client with wrk.
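For context, a hedged example of that kind of single-client wrk run; the thread, connection, and duration values below are arbitrary placeholders, not the ones used in the test.
# illustrative only -- thread/connection/duration values are placeholders
wrk -t 8 -c 2000 -d 60s http://vip/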