With defence_tcp_drop disabled, part of the VIP's TCP traffic reaches the KNI interface

Open nomyself opened this issue 3 years ago • 3 comments

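Scenario

  • Two-arm FullNAT mode
  • An ordinary HTTP service. When clients misbehave, they generate a large volume of SYN-flood-like packets, and some of these packets are passed straight through to the KNI interface at roughly 300 kpps. This caused BGP sessions to drop and health checks to fail intermittently. From reading the code, traffic destined for vip:vport should never reach the KNI interface.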

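Steps to reproduce

  • The key is disabling defence_tcp_drop in dpvs.conf. Reproducible on both 1.6.1 and 1.8.4. The option has been disabled by default since 1.7; this cluster runs 1.8.4.
  • Enable synproxy on the VIP
  • Send packets from the client
# -c packet count, -S set the SYN flag, -p destination port, -i interval between packets (uN = N microseconds)
hping3 -V -c 10000  -S -p 80 -i u100000 vip
  • Observed behavior (ignore the timestamps; the two captures were not taken at the same time):
# Packets captured on the client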
11:18:19.768999 IP (tos 0x0, ttl 64, id 37355, offset 0, flags [none], proto TCP (6), length 40)
    client.2254 > vip.80: Flags [S], cksum 0x2582 (correct), seq 554031943, win 512, length 0
11:18:19.769021 IP (tos 0x0, ttl 62, id 37355, offset 0, flags [none], proto TCP (6), length 40)
    vip.80 > client.2254: Flags [S.], cksum 0xe406 (correct), seq 3530883126, ack 554031944, win 512, length 0
11:18:19.769030 IP (tos 0x0, ttl 64, id 39383, offset 0, flags [DF], proto TCP (6), length 40)
    client.2254 > vip.80: Flags [R], cksum 0xb8c0 (correct), seq 554031944, win 0, length 0
11:18:19.869014 IP (tos 0x0, ttl 64, id 18676, offset 0, flags [none], proto TCP (6), length 40)
    client.2255 > vip.80: Flags [S], cksum 0x3043 (correct), seq 2130215353, win 512, length 0
11:18:19.869033 IP (tos 0x0, ttl 62, id 18676, offset 0, flags [none], proto TCP (6), length 40)
    vip.80 > client.2255: Flags [S.], cksum 0x8e76 (correct), seq 673053624, ack 2130215354, win 512, length 0
11:18:19.869042 IP (tos 0x0, ttl 64, id 39384, offset 0, flags [DF], proto TCP (6), length 40)
    client.2255 > vip.80: Flags [R], cksum 0xb45a (correct), seq 2130215354, win 0, length 0
11:18:19.969021 IP (tos 0x0, ttl 64, id 52270, offset 0, flags [none], proto TCP (6), length 40)
    client.2256 > vip.80: Flags [S], cksum 0x555c (correct), seq 943043500, win 512, length 0
11:18:19.969040 IP (tos 0x0, ttl 62, id 52270, offset 0, flags [none], proto TCP (6), length 40)
    vip.80 > client.2256: Flags [S.], cksum 0xbc86 (correct), seq 2676517644, ack 943043501, win 512, length 0
11:18:19.969045 IP (tos 0x0, ttl 64, id 39385, offset 0, flags [DF], proto TCP (6), length 40)
    client.2256 > vip.80: Flags [R], cksum 0xc929 (correct), seq 943043501, win 0, length 0
11:18:20.069035 IP (tos 0x0, ttl 64, id 17297, offset 0, flags [none], proto TCP (6), length 40)
    client.2257 > vip.80: Flags [S], cksum 0x3513 (correct), seq 483835533, win 512, length 0
11:18:20.069054 IP (tos 0x0, ttl 62, id 17297, offset 0, flags [none], proto TCP (6), length 40)
    vip.80 > client.2257: Flags [S.], cksum 0xd814 (correct), seq 969786806, ack 483835534, win 512, length 0
11:18:20.069059 IP (tos 0x0, ttl 64, id 39386, offset 0, flags [DF], proto TCP (6), length 40)
    client.2257 > vip.80: Flags [R], cksum 0xd9a6 (correct), seq 483835534, win 0, length 0

# Capture on the KNI interface of the server's external (WAN) port
10:36:48.854348 IP (tos 0x0, ttl 62, id 21851, offset 0, flags [DF], proto TCP (6), length 40)
    c_ip.20000 > vip.80: Flags [R], cksum 0xfdf1 (correct), seq 695027805, win 0, length 0
10:36:48.896353 IP (tos 0x0, ttl 62, id 21852, offset 0, flags [DF], proto TCP (6), length 40)
    c_ip.20000 > vip.80: Flags [R], cksum 0x2f3c (correct), seq 1195112772, win 0, length 0
10:36:48.954346 IP (tos 0x0, ttl 62, id 21853, offset 0, flags [DF], proto TCP (6), length 40)
    c_ip.20000 > vip.80: Flags [R], cksum 0xd931 (correct), seq 1623995838, win 0, length 0
10:36:48.996348 IP (tos 0x0, ttl 62, id 21854, offset 0, flags [DF], proto TCP (6), length 40)
    c_ip.20000 > vip.80: Flags [R], cksum 0x6614 (correct), seq 1559669937, win 0, length 0
  • KNI interface traffic (the only other traffic on the KNI is BGP): image

nomyself, Mar 09 '21 03:03

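When defence_tcp_drop is enabled, packets whose destination IP is a VIP but whose destination port is not a vport are dropped immediately. When defence_tcp_drop is disabled, such packets are forwarded to the KNI. In an untrusted environment, we recommend enabling this option.

Judging from the capture you posted, every packet the KNI received is a RST from the client, which looks somewhat like an abnormal attack. I could not reproduce the problem with the steps you gave above; KNI interface traffic was only about 10 kpps.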
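The rule described in this comment can be written down as a small decision function. This is only an illustrative Python model of the stated behavior, not DPVS code: `Service`, `classify_tcp`, and the returned labels are hypothetical names, and the final branch (packets not addressed to any VIP go to the KNI) is an assumption made to keep the sketch complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """A configured virtual service (hypothetical type for this sketch)."""
    vip: str
    vport: int


def classify_tcp(dst_ip, dst_port, services, defence_tcp_drop):
    """Return 'ipvs', 'drop', or 'kni' for a TCP packet.

    Models only the rule stated in the comment above.
    """
    if any(s.vip == dst_ip and s.vport == dst_port for s in services):
        # Destination matches a configured vip:vport.
        return "ipvs"
    if any(s.vip == dst_ip for s in services):
        # Destination is a VIP, but the port is not a configured vport.
        return "drop" if defence_tcp_drop else "kni"
    # Assumption: anything not addressed to a VIP is handed to the kernel.
    return "kni"
```

Under this model a packet to vip:80 is never classified as `kni` when port 80 is a configured vport, which is why the RST packets seen on the KNI in the capture are unexpected.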

ywc689, Mar 10 '21 09:03

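Having defence_tcp_drop disabled by default since 1.7 is not appropriate; I suggest changing the default to enabled. Seeing 10 kpps is expected; adjust the -i parameter to change the send rate.

In my test environment I pushed 400 kpps and only hit the kni_send2kern_loop error; the "no memory" errors from the original incident did not appear. Below are some of the error logs from that incident: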
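For picking a value for -i, the arithmetic between hping3's microsecond interval (`-i uN`) and the nominal packet rate is simple. A minimal Python sketch follows; the helper names are hypothetical, and the numbers are nominal upper bounds for a single hping3 process, since the rate actually achieved depends on the sending host.

```python
def pps_from_interval_us(interval_us: int) -> float:
    """Nominal packets per second for `hping3 -i u<interval_us>`."""
    return 1_000_000 / interval_us


def interval_us_for_pps(target_pps: float) -> int:
    """Microsecond interval to pass to -i for a nominal target rate."""
    return max(1, round(1_000_000 / target_pps))


# The reproduction command uses -i u100000, i.e. one packet every 100 ms,
# which matches the 100 ms spacing of the SYNs in the client capture.
print(pps_from_interval_us(100_000))
# Interval that nominally corresponds to 10 kpps from one sender.
print(interval_us_for_pps(10_000))
```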

2021-02-28T19:43:14+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:14+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:14+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:16+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:16+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:16+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:17+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:18+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T19:43:19+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_synack_rcv: got ack_mbuf NULL pointer: ack-saved = 0
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_conn_new: no memory
2021-02-28T20:06:38+08:00 lvs warning dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_conn_new: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs warning dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_conn_new: no memory
2021-02-28T20:06:38+08:00 lvs warning dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_conn_new: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs warning dpvs[132777]: IPVS: dp_vs_synproxy_ack_rcv: ip_vs_schedule failed
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_synproxy_filter_ack: no memory
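
To see which errors dominate an excerpt like the one above, the lines can be tallied by level, function, and message. A minimal Python sketch, assuming the syslog layout shown above (timestamp, host, level, `dpvs[pid]:`, `IPVS:`, function name, message); `LINE_RE` and `tally` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# 2021-02-28T20:06:38+08:00 lvs err dpvs[132777]: IPVS: dp_vs_conn_new: no memory
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<level>\w+) dpvs\[\d+\]: IPVS: (?P<func>\w+): (?P<msg>.*)$"
)


def tally(lines):
    """Count log lines by (level, function, message); skip non-matching lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m["level"], m["func"], m["msg"])] += 1
    return counts
```

For example, `tally(open("dpvs.log")).most_common(5)` would list the five most frequent entries, where `dpvs.log` is a placeholder file name.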

Could this be a case where the RS did not respond to SYN packets, so SYNPROXY_ACK_MBUFPOOL was exhausted?

nomyself, Mar 11 '21 02:03

image

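With defence_tcp_drop disabled, when the backend service misbehaves (packet loss, retransmissions, SYNs going unanswered during connection setup), the ack_mbufpool fills up easily, leaving dpvs unable to process traffic. For the test shown in the image above, I deliberately lowered ack_mbufpool from its default of 1,000,000 to 20,000 and generated traffic from a single client with wrk.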
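
The exhaustion described here can be illustrated with a toy model of a fixed-size buffer pool. This is a sketch under stated assumptions, not DPVS code: `simulate` and all of its parameters are hypothetical, time is counted in abstract ticks, and every buffer is assumed to be held for the same number of ticks (short when the RS answers promptly, long when it does not).

```python
from collections import deque


def simulate(pool_size, arrivals_per_tick, hold_ticks, ticks):
    """Return the number of failed allocations from a fixed-size pool."""
    free = pool_size
    releases = deque()  # (tick at which buffers come back, how many)
    failures = 0
    for t in range(ticks):
        # Return buffers whose hold time has expired.
        while releases and releases[0][0] <= t:
            free += releases.popleft()[1]
        granted = min(free, arrivals_per_tick)
        failures += arrivals_per_tick - granted  # would log "no memory"
        free -= granted
        if granted:
            releases.append((t + hold_ticks, granted))
    return failures


# Responsive backend: buffers come back after one tick, the pool never empties.
print(simulate(pool_size=20_000, arrivals_per_tick=1_000, hold_ticks=1, ticks=100))
# Unresponsive backend: buffers are held for 60 ticks and the pool runs dry.
print(simulate(pool_size=20_000, arrivals_per_tick=1_000, hold_ticks=60, ticks=100))
```

In this model the pool keeps up only while arrivals per tick multiplied by the hold time stays below the pool size, which is consistent with a smaller pool or a slower backend making the failure easier to trigger.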

nomyself avatar Mar 11 '21 05:03 nomyself