
/8 prefix FIB routes test, Gatekeeper caught SIGTERM

ShawnLeung87 opened this issue · 27 comments

I simulated /8 prefix FIB routes (224 prefixes in total) and then launched a SYN+ACK attack with random source addresses ("--rand-source"). After more than 40 minutes, the GK server caught SIGTERM. Gatekeeper version: 1.0.0.

ShawnLeung87 avatar Jul 27 '22 08:07 ShawnLeung87

Does doubling the number of these routes make it easier to reproduce the problem? Does the problem happen with half of these routes? What are the kinds of these routes? Are they all of the type GK_FWD_GRANTOR?

There is not enough context to formulate hypotheses that could eventually figure the problem out, so it would help if you described your testbed. Doesn't Gatekeeper's log have any clue?

Repeating your experiment with Gatekeeper running under gdb would likely produce valuable context. Once the SIGTERM is triggered, gdb will give you some information about the failure and the opportunity to enter the command bt. The output of the command bt likely will include a clue of what's happening.
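As a concrete sketch (the binary path and launch arguments below are placeholders; use however you normally start Gatekeeper):

    # Start Gatekeeper under gdb; adjust the path and arguments to your deployment.
    sudo gdb --args /path/to/gatekeeper <your usual Gatekeeper arguments>
    (gdb) handle SIGTERM stop print   # make sure gdb stops when SIGTERM is delivered
    (gdb) run
    # ... reproduce the attack and wait for the failure ...
    (gdb) bt                          # backtrace of the thread that received the signal
    (gdb) info threads                # list all lcore threads
    (gdb) thread apply all bt         # backtraces of every thread, in case the culprit is elsewhere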

AltraMayor avatar Jul 27 '22 13:07 AltraMayor

The GK log doesn't have any clues; we just see that the GK server restarted.

  1. Simulate the FIB route prefixes imported by the ISP; the prefixes are class A (/8) addresses, 224 in total.
  2. The GK_FWD_GRANTOR forwarding entry is 30.30.30.0/24.
  3. DDoS attack: random source IPs attack the destination address 30.30.30.30/32 with TCP flags SYN+ACK (-SA mode); the source IP changes after every 64 packets sent.
  4. The attack tool used for the test is hping3.

ShawnLeung87 avatar Jul 28 '22 02:07 ShawnLeung87

Is it because the mask is too large when the lcore computes the flow table? After about 43 minutes of the simulated attack, the GK server restarts.

ShawnLeung87 avatar Jul 28 '22 02:07 ShawnLeung87

The GK flow table is set very large (flow_ht_size = 250000000). Is it because the class A addresses will exceed the total number of entries in the flow table?

ShawnLeung87 avatar Jul 28 '22 02:07 ShawnLeung87

The GK flow table is set very large (flow_ht_size = 250000000). Is it because the class A addresses will exceed the total number of entries in the flow table?

How much RAM does the Gatekeeper server have? How much of the RAM is allocated to huge pages?
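For reference, the current RAM and huge page reservation can be read directly from /proc/meminfo with standard commands:

    # Total RAM and huge page allocation on the Gatekeeper server
    grep MemTotal /proc/meminfo
    grep -i huge /proc/meminfo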

AltraMayor avatar Jul 28 '22 10:07 AltraMayor

Is it because the mask is too large when the lcore computes the flow table? After about 43 minutes of the simulated attack, the GK server restarts.

Post your parameters in `lua/net.lua`.

AltraMayor avatar Jul 28 '22 10:07 AltraMayor

The GK log doesn't have any clues; we just see that the GK server restarted.

  1. Simulate the FIB route prefixes imported by the ISP; the prefixes are class A (/8) addresses, 224 in total.
  2. The GK_FWD_GRANTOR forwarding entry is 30.30.30.0/24.
  3. DDoS attack: random source IPs attack the destination address 30.30.30.30/32 with TCP flags SYN+ACK (-SA mode); the source IP changes after every 64 packets sent.
  4. The attack tool used for the test is hping3.

This setup is fine. Please run Gatekeeper under gdb as instructed before.

AltraMayor avatar Jul 28 '22 10:07 AltraMayor

One minute after the following log appears, the GK server restarts. We do not perform any operations on the flows on the GK server; we just follow the DDoS attack scenario of this issue:

  1. Simulate the FIB route prefixes imported by the ISP; the prefixes are class A (/8) addresses, 224 in total.
  2. The GK_FWD_GRANTOR forwarding entry is 30.30.30.0/24.
  3. DDoS attack: random source IPs attack the destination address 30.30.30.30/32 with TCP flags SYN+ACK (-SA mode); the source IP changes after every 64 packets sent.
  4. The attack tool used for the test is hping3.

[2022-08-16 10:16:58] GATEKEEPER: Flow (src: 181.62.222.216, dst: 30.30.30.216) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x1baf20f8, expire_at: 0xa91f223f21f, last_packet_seen_at: 0xa8f493f3574, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 168.218.231.110, dst: 30.30.30.219) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 168.218.231.110, dst: 30.30.30.219) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xc9f79aac, expire_at: 0xa91f226930b, last_packet_seen_at: 0xa8f4941d662, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 117.208.248.214, dst: 30.30.30.78) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 117.208.248.214, dst: 30.30.30.78) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x3abf01f8, expire_at: 0xa91f2271d35, last_packet_seen_at: 0xa8f4942608a, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 73.181.6.93, dst: 30.30.30.161) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 73.181.6.93, dst: 30.30.30.161) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xead19c74, expire_at: 0xa91f22b6947, last_packet_seen_at: 0xa8f4946ac9c, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 209.207.91.168, dst: 30.30.30.125) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 209.207.91.168, dst: 30.30.30.125) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x25eb2f64, expire_at: 0xa91f22c4583, last_packet_seen_at: 0xa8f494788da, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 108.161.214.124, dst: 30.30.30.26) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 108.161.214.124, dst: 30.30.30.26) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x64925f14, expire_at: 0xa91f22cd2ef, last_packet_seen_at: 0xa8f49481644, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 18.92.110.36, dst: 30.30.30.215) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 18.92.110.36, dst: 30.30.30.215) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x55a90d6c, expire_at: 0xa91f22e59bf, last_packet_seen_at: 0xa8f49499d16, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 25.204.128.30, dst: 30.30.30.106) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 25.204.128.30, dst: 30.30.30.106) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x634eb000, expire_at: 0xa91f22f981d, last_packet_seen_at: 0xa8f494adb74, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 80.24.27.39, dst: 30.30.30.108) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... 
[2022-08-16 10:16:58] GATEKEEPER: Flow (src: 80.24.27.39, dst: 30.30.30.108) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x882acbf4, expire_at: 0xa91f23021f7, last_packet_seen_at: 0xa8f494b654c, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 61.219.12.131, dst: 30.30.30.229) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 61.219.12.131, dst: 30.30.30.229) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x84dca0f0, expire_at: 0xa91f230abc1, last_packet_seen_at: 0xa8f494bef12, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 25.108.248.27, dst: 30.30.30.214) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 25.108.248.27, dst: 30.30.30.214) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x3eafe80c, expire_at: 0xa91f2313c79, last_packet_seen_at: 0xa8f494c7fd0, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 23.92.208.105, dst: 30.30.30.11) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 23.92.208.105, dst: 30.30.30.11) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xd2ec7f44, expire_at: 0xa91f231d357, last_packet_seen_at: 0xa8f494d176e, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 84.251.228.93, dst: 30.30.30.143) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 84.251.228.93, dst: 30.30.30.143) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xb698c09c, expire_at: 0xa91f232b949, last_packet_seen_at: 0xa8f494dfc9e, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 27.155.110.170, dst: 30.30.30.168) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 27.155.110.170, dst: 30.30.30.168) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x970c6fe0, expire_at: 0xa91f2338ea7, last_packet_seen_at: 0xa8f494ed1fe, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 130.43.11.199, dst: 30.30.30.68) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 130.43.11.199, dst: 30.30.30.68) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xeb167e88, expire_at: 0xa91f233d731, last_packet_seen_at: 0xa8f494f1a8a, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 161.43.224.184, dst: 30.30.30.91) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 161.43.224.184, dst: 30.30.30.91) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xdc35a3bc, expire_at: 0xa91f234ea8d, last_packet_seen_at: 0xa8f49502de0, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 171.17.171.197, dst: 30.30.30.73) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... 
[2022-08-16 10:16:58] GATEKEEPER: Flow (src: 171.17.171.197, dst: 30.30.30.73) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x76ea13f4, expire_at: 0xa91f23651a9, last_packet_seen_at: 0xa8f495194fe, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 85.189.81.75, dst: 30.30.30.73) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 85.189.81.75, dst: 30.30.30.73) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xf0604f5c, expire_at: 0xa91f2392725, last_packet_seen_at: 0xa8f49546a7a, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 122.232.37.189, dst: 30.30.30.36) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 122.232.37.189, dst: 30.30.30.36) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x4e32e230, expire_at: 0xa91f239687d, last_packet_seen_at: 0xa8f4954abd2, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 220.39.2.199, dst: 30.30.30.249) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 220.39.2.199, dst: 30.30.30.249) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xe8d6ae8c, expire_at: 0xa91f23c4e91, last_packet_seen_at: 0xa8f495791ec, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 136.114.45.219, dst: 30.30.30.212) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 136.114.45.219, dst: 30.30.30.212) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x6ddaac38, expire_at: 0xa91f23db01f, last_packet_seen_at: 0xa8f4958f378, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 205.97.131.92, dst: 30.30.30.182) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 205.97.131.92, dst: 30.30.30.182) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x53ce5b3c, expire_at: 0xa91f23e42df, last_packet_seen_at: 0xa8f49598636, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 198.44.62.87, dst: 30.30.30.117) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 198.44.62.87, dst: 30.30.30.117) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xab10b2ac, expire_at: 0xa91f2426199, last_packet_seen_at: 0xa8f495da4f2, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 10.77.240.209, dst: 30.30.30.217) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 10.77.240.209, dst: 30.30.30.217) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x4feca0c4, expire_at: 0xa91f2438f2d, last_packet_seen_at: 0xa8f495ed284, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 141.23.180.80, dst: 30.30.30.136) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... 
[2022-08-16 10:16:58] GATEKEEPER: Flow (src: 141.23.180.80, dst: 30.30.30.136) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xa2dd4434, expire_at: 0xa91f2455757, last_packet_seen_at: 0xa8f49609aae, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 34.61.237.170, dst: 30.30.30.94) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 34.61.237.170, dst: 30.30.30.94) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xec044b9c, expire_at: 0xa91f2459ec5, last_packet_seen_at: 0xa8f4960e21a, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 16.75.37.91, dst: 30.30.30.131) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 16.75.37.91, dst: 30.30.30.131) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xeb7bd028, expire_at: 0xa91f245ee37, last_packet_seen_at: 0xa8f4961318c, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 37.85.199.189, dst: 30.30.30.34) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 37.85.199.189, dst: 30.30.30.34) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xf29badcc, expire_at: 0xa91f2462c1b, last_packet_seen_at: 0xa8f4961704e, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 44.124.53.29, dst: 30.30.30.75) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 44.124.53.29, dst: 30.30.30.75) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x590c6cd0, expire_at: 0xa91f2467497, last_packet_seen_at: 0xa8f4961b7ea, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 184.110.54.45, dst: 30.30.30.0) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 184.110.54.45, dst: 30.30.30.0) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xf70f638c, expire_at: 0xa91f246fe53, last_packet_seen_at: 0xa8f496241a6, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 214.30.86.147, dst: 30.30.30.203) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 214.30.86.147, dst: 30.30.30.203) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x121e4090, expire_at: 0xa91f24756f3, last_packet_seen_at: 0xa8f49629a48, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 90.180.180.3, dst: 30.30.30.67) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 90.180.180.3, dst: 30.30.30.67) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x37265db4, expire_at: 0xa91f247f197, last_packet_seen_at: 0xa8f496334ee, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 220.6.67.139, dst: 30.30.30.73) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... 
[2022-08-16 10:16:58] GATEKEEPER: Flow (src: 220.6.67.139, dst: 30.30.30.73) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xae4ec750, expire_at: 0xa91f24b1699, last_packet_seen_at: 0xa8f496659f0, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 170.110.56.86, dst: 30.30.30.125) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 170.110.56.86, dst: 30.30.30.125) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x8967d1f8, expire_at: 0xa91f24b5eb5, last_packet_seen_at: 0xa8f4966a206, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 128.17.104.31, dst: 30.30.30.164) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 128.17.104.31, dst: 30.30.30.164) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x9781a624, expire_at: 0xa91f24cdfab, last_packet_seen_at: 0xa8f49682300, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 118.10.27.10, dst: 30.30.30.136) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 118.10.27.10, dst: 30.30.30.136) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x983464d0, expire_at: 0xa91f24ee2e9, last_packet_seen_at: 0xa8f496a2640, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 113.229.80.195, dst: 30.30.30.113) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 113.229.80.195, dst: 30.30.30.113) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0xa0a0d678, expire_at: 0xa91f25027fd, last_packet_seen_at: 0xa8f496b6b56, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 197.255.64.124, dst: 30.30.30.167) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 197.255.64.124, dst: 30.30.30.167) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x45f3d77c, expire_at: 0xa91f25069db, last_packet_seen_at: 0xa8f496bad2e, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 83.97.229.143, dst: 30.30.30.155) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 83.97.229.143, dst: 30.30.30.155) at lcore 4: [state: GK_REQUEST (0), flow_hash_value: 0x490c5508, expire_at: 0xa91f251d9d5, last_packet_seen_at: 0xa8f496d1d2c, last_priority: 38, allowance: 6, grantor_ip: 10.0.2.11] [2022-08-16 10:16:58] GATEKEEPER: Flow (src: 94.199.199.11, dst: 30.30.30.65) at lcore 4: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow...

ShawnLeung87 avatar Aug 16 '22 02:08 ShawnLeung87

The master branch downloaded on May 28 has the function gk_del_flow_entry_at_pos(), and that is the version I installed on the GK server. The stable 1.0.0 release does not have gk_del_flow_entry_at_pos(). The exception is probably caused by the code merged after the release. I will redeploy the stable 1.0.0 release for testing.

ShawnLeung87 avatar Aug 16 '22 02:08 ShawnLeung87

Using stable version 1.0.0 with the same DDoS attack scenario, the following error occurs, but the GK server does not restart: [2022-08-16 16:15:05] GATEKEEPER: flow: The GK block failed to delete a key from hash table at gk_del_flow_entry_from_hash: No such file or directory for the flow with IP source address 58.77.92.151, and destination address 30.30.30.132

ShawnLeung87 avatar Aug 16 '22 08:08 ShawnLeung87

Hi @ShawnLeung87,

While the new information is useful, please answer my previous requests for information, so I can formulate hypotheses to go about this issue.

Given that Gatekeeper runs in user space, no spontaneous reboot should happen. Since the server is rebooting by itself, a hardware failure may be occurring here. Does your Gatekeeper server have ECC RAM? Could you look for general failures in the kernel log file /var/log/dmesg? The open source MemTest86 can help you test the RAM of your server.

How are you adding date and time to the log entries?

You say that the address 30.30.30.30/32 is the only target of the attack, but the log entries report other destinations: 30.30.30.113, 30.30.30.216, 30.30.30.219, etc. Can you explain this inconsistency in your report?

All logged flows you reported are at state GK_REQUEST. While this may not be related to the reboot, it may imply that your Gatekeeper does not have access to your Grantor server. Can you ping the destination without the attack going on, and log the corresponding flow? If this flow is still GK_REQUEST, your Gatekeeper server is not reaching the Grantor server, or your Grantor server is not reaching back to your Gatekeeper server.

It's recommended to use Gatekeeper 1.0.0 with the patch gk: heal corrupted flow tables #531 in production.

AltraMayor avatar Aug 16 '22 17:08 AltraMayor

1. When the GK server is abnormal and reloads, there is no hardware exception information in /var/log/dmesg.
2. Using the stable 1.0.0 Gatekeeper without gk_del_flow_entry_at_pos, there is no reload, but there is a flow table corruption log: "[2022-08-16 16:15:05] GATEKEEPER: flow: The GK block failed to delete a key from hash table at gk_del_flow_entry_from_hash: No such file or directory for the flow with IP source address 58.77.92.151, and destination address 30.30.30.132".
3. I simulate a DDoS (SYN+ACK) attack on a class C segment of IPs; the source addresses are random and 30.30.30.x is a random destination address. In this scenario the GK flow table exception is most likely to appear.
4. I haven't applied the patch "gk: heal corrupted flow tables #531" yet; I will test again after updating to see whether the flow abnormality persists.

ShawnLeung87 avatar Aug 17 '22 02:08 ShawnLeung87

If a reload occurs in GK, KNI is re-initialized (during this process the GK and GT servers cannot be reached). After the reload completes, BIRD is interrupted and is not restarted automatically. This exception is very serious. According to the explanation in #531, when too many flow-corruption logs appear (while these logs occur, GK and GT can still be accessed normally), GK will eventually hang, and Gatekeeper needs to be restarted to recover.

ShawnLeung87 avatar Aug 17 '22 02:08 ShawnLeung87

1. When the GK server is abnormal and reloads, there is no hardware exception information in /var/log/dmesg.

Check the output of the command dmesg after the server has logged entries related to failing to remove flow entries. The latest output of the command dmesg may not have been saved in /var/log/dmesg.
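For example, with standard Linux commands (independent of Gatekeeper):

    # Follow the kernel log live, with human-readable timestamps, while reproducing the attack
    dmesg -T -w
    # Or review the most recent kernel messages after the failure
    dmesg -T | tail -n 200
    # On systemd-based systems, the persistent kernel log is also available
    journalctl -k --since "1 hour ago"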

Does your Gatekeeper server have ECC RAM? Please don't skip my questions. Skipping my questions slows down the diagnosis of your problem.

Let's eliminate the possibility of a RAM fault here: test the RAM of your server with MemTest86.

2. Using the stable 1.0.0 Gatekeeper without gk_del_flow_entry_at_pos, there is no reload, but there is a flow table corruption log: "[2022-08-16 16:15:05] GATEKEEPER: flow: The GK block failed to delete a key from hash table at gk_del_flow_entry_from_hash: No such file or directory for the flow with IP source address 58.77.92.151, and destination address 30.30.30.132".

While the log entries are different, they refer to the same problem: the flow table is being corrupted. The Gatekeeper without gk_del_flow_entry_at_pos() detects the problem but doesn't try to fix it, whereas the version with gk_del_flow_entry_at_pos() tries to fix the flow table. The flow table could be corrupted due to a bug or a hardware failure.

You say here that "there is no reload", but, in another post, you wrote "GK server restart". Is your server rebooting, that is, the Linux kernel is restarting? Or is only the Gatekeeper daemon being reloaded?
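One way to tell the two apart with standard commands (the systemd unit name below is only an assumption, if Gatekeeper is run as a service at all):

    # If the kernel uptime is large, the machine itself did not reboot
    uptime
    last reboot | head
    # If Gatekeeper runs under systemd, this shows when the daemon was last (re)started
    systemctl status gatekeeper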

3. I simulate a DDoS (SYN+ACK) attack on a class C segment of IPs; the source addresses are random and 30.30.30.x is a random destination address. In this scenario the GK flow table exception is most likely to appear.

This clarifies the log entries you posted before.

Since randomizing the source and destination addresses of the attack packets helps to trigger the problem, it suggests that the problem is related to an overpopulated flow table. Test varying the size of the flow table. What happens with half the size you have now? What about if you double the size you have now?

The size of your current flow table is "flow_ht_size = 250000000", correct? How much RAM does your Gatekeeper server have? How much of the RAM is allocated to huge pages?
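As a rough sanity check of those numbers, here is a back-of-the-envelope sketch; the per-entry size and lcore count below are assumptions for illustration, not the actual size of Gatekeeper's flow entry structure or your deployment:

    # Hypothetical estimate: flow_ht_size entries per GK lcore, ~128 bytes assumed per entry
    FLOW_HT_SIZE=250000000
    ENTRY_BYTES=128
    GK_LCORES=4
    echo "$((FLOW_HT_SIZE * ENTRY_BYTES * GK_LCORES / 1024 / 1024 / 1024)) GiB total (rough estimate)"

Comparing such an estimate with the RAM reserved for huge pages shows whether the configured flow table size is even feasible.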

If a reload occurs in GK, KNI is re-initialized (during this process the GK and GT servers cannot be reached). After the reload completes, BIRD is interrupted and is not restarted automatically. This exception is very serious. According to the explanation in #531, when too many flow-corruption logs appear (while these logs occur, GK and GT can still be accessed normally), GK will eventually hang, and Gatekeeper needs to be restarted to recover.

Thank you for this information. Let's ignore bird for now since this is a side-effect of the problem we are investigating here.

AltraMayor avatar Aug 17 '22 12:08 AltraMayor

1. Server configuration: CPU: Intel(R) Xeon(R) Platinum 8280M CPU @ 2.60GHz, 28 cores, 2 NUMA nodes. RAM: DDR4 16 GB x 24 (384 GB), ECC, Samsung M393A2G40DB0-CPB.
2. The restart mentioned here is not a restart of the server hardware, but a restart of the Gatekeeper application itself. When the Gatekeeper flow table exception error occurs, /var/log/dmesg shows no hardware exception message.
3. GK-related flow table configuration:

    local flow_ht_size = 250000000
    local flow_table_scan_iter = 0
    local scan_del_thresh = flow_ht_size * 0.1

    local max_num_ipv4_rules = 1600000
    local num_ipv4_tbl8s = 1000
    local max_num_ipv6_rules = 200000
    local num_ipv6_tbl8s = 65536
    local max_num_ipv6_neighbors = 65536

ShawnLeung87 avatar Aug 18 '22 02:08 ShawnLeung87

Five minutes before the flow exception occurred, this error was caught with the command dmesg -T -w: [Thu Aug 18 11:10:22 2022] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750. And when the flow exception occurs, pinging the GT server from the GK server shows packet loss. (screenshot)

ShawnLeung87 avatar Aug 18 '22 03:08 ShawnLeung87

The flow is abnormal, and the following errors occur before the GK server is automatically re-initialized: [2022-08-18 11:41:41] GATEKEEPER: flow: gk_del_flow_entry_with_key(): the flow entry does not correspond to the flow key at postion 128997909; logging entry and releasing both flow keys...; while trying to show flow data, an unknown flow type 0 was found [2022-08-18 11:41:41] GATEKEEPER: flow: NOT in use [state: GK_REQUEST (0), flow_hash_value: 0x0, expire_at: 0x0, last_packet_seen_at: 0x0, last_priority: 0, allowance: 0, grantor_ip: NULL FIB entry]; while trying to show flow data, an unknown flow type 0 was found PANIC in rss_ip_flow_hf(): Unexpected protocol: 0

ShawnLeung87 avatar Aug 18 '22 03:08 ShawnLeung87

SOL: local req_bw_rate = 0.5. With this configuration, when the GK request traffic is larger than this rate, there is serious packet loss and the association between the flow and the BPF cannot be obtained. Is this a reason for the abnormal flow table?

ShawnLeung87 avatar Aug 18 '22 04:08 ShawnLeung87

Hi @ShawnLeung87,

I think I've found the root cause of this issue. Could you test pull request #596? In case you need it, here is how you can pull a pull request. Another way to get that code is to download the diff, and apply it with the command patch -p1 < 596.diff against your local copy of the new branch lts_v1.0.
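For instance, assuming a local clone of the Gatekeeper repository with the GitHub remote named origin, either of the following should work:

    # Option 1: fetch the pull request head into a local branch and check it out
    git fetch origin pull/596/head:pr-596
    git checkout pr-596

    # Option 2: apply the diff on top of the lts_v1.0 branch
    git checkout lts_v1.0
    curl -L -o 596.diff https://github.com/AltraMayor/gatekeeper/pull/596.diff
    patch -p1 < 596.diff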

If everything works fine with pull request #596 during your tests, I will merge this pull request and release version v1.0.1 since this fix is important for production.

AltraMayor avatar Aug 18 '22 21:08 AltraMayor

I'll be patching gatekeeper on Monday next week to continue testing the effect

ShawnLeung87 avatar Aug 20 '22 10:08 ShawnLeung87

64-byte small-packet DDoS attack test report. Test preconditions:

1. DDoS SYN+ACK attack simulating random-source 64-byte packets.
2. Simulate 224 class A addresses in the routing FIB table.
3. hping3 -n -S -A -d 24 -p 80 --flood --rand-source --rand-dest -I eno2 30.30.30.x
4. GK server configuration: CPU: Intel(R) Xeon(R) Platinum 8280M CPU @ 2.60GHz, 28 cores, 2 NUMA nodes. RAM: DDR4 16 GB x 24 (384 GB), ECC, Samsung M393A2G40DB0-CPB.
5. Hugepages: default_hugepagesz=1G hugepagesz=1G hugepages=360
6. Gatekeeper with patch #596

Scenario 1

1. 4 GK lcores
2. gk.lua:

    local mailbox_max_entries_exp = 14
    local mailbox_mem_cache_size = 0
    local mailbox_burst_size = 320
    local log_ratelimit_interval_ms = 5000
    local log_ratelimit_burst = 100
    local max_pkt_burst_front = 320
    local max_pkt_burst_back = 320

    local flow_ht_size = 250000000
    local flow_table_scan_iter = 0
    local scan_del_thresh = flow_ht_size * 0.1

    local max_num_ipv4_rules = 1600000
    local num_ipv4_tbl8s = 1000
    local max_num_ipv6_rules = 65536
    local num_ipv6_tbl8s = 4096
    local max_num_ipv6_neighbors = 65536

    local basic_measurement_logging_ms = 60 * 1000 -- (1 minute)

3. main_config.lua:

    local n_gk_lcores = 2 * staticlib.count_numa_nodes(numa_table)
    local n_sol_lcores_per_socket = 2

4. sol.lua:

    local req_bw_rate = 0.05

5. Zabbix port traffic diagram: the test started at 10:54. After 11:38, the gatekeeper re-initialized and an abnormal log appeared. (screenshot)

6. Error exception: the gatekeeper re-initializes, and the following error log appears:

[2022-08-22 11:38:25] GATEKEEPER: Flow (src: 59.109.120.5, dst: 30.30.30.193) at lcore 7: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow... [2022-08-22 11:38:26] GATEKEEPER: flow: gk_del_flow_entry_with_key(): the flow entry does not correspond to the flow key at postion 92610535; logging entry and releasing both flow keys...; while trying to show flow data, an unknown flow type 0 was found

Scenario 2

1. 16 GK lcores
2. gk.lua:

    local mailbox_max_entries_exp = 14
    local mailbox_mem_cache_size = 0
    local mailbox_burst_size = 320
    local log_ratelimit_interval_ms = 5000
    local log_ratelimit_burst = 100
    local max_pkt_burst_front = 320
    local max_pkt_burst_back = 320

    local flow_ht_size = 62500000
    local flow_table_scan_iter = 0
    local scan_del_thresh = flow_ht_size * 0.1

    local max_num_ipv4_rules = 400000
    local num_ipv4_tbl8s = 256
    local max_num_ipv6_rules = 65536
    local num_ipv6_tbl8s = 4096
    local max_num_ipv6_neighbors = 65536

    local basic_measurement_logging_ms = 60 * 1000 -- (1 minute)

    local front_icmp_msgs_per_sec = 1000
    local front_icmp_msgs_burst = 50
    local back_icmp_msgs_per_sec = 1000
    local back_icmp_msgs_burst = 50

3. main_config.lua:

    local n_gk_lcores = 8 * staticlib.count_numa_nodes(numa_table)
    local n_sol_lcores_per_socket = 2

4. sol.lua:

    local req_bw_rate = 0.05

5. Zabbix port traffic diagram: the test started at 12:00 and has run for 5 hours; the gatekeeper has not re-initialized and no abnormal log has appeared. It is still running. (screenshot)

6. Error exception: only the following logs appear; this log level is NOTICE in gatekeeper. [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800 [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800 [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800 [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800 [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800

[2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800 [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800 [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800 [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800 [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800

Summary of test results: does this test indicate that, in a 64-byte packet attack scenario, the flow_ht_size of each lcore cannot be too large?

ShawnLeung87 avatar Aug 22 '22 08:08 ShawnLeung87

The following log entries from scenario 1 show that there was an issue while applying the code of pull request #596:

[2022-08-22 11:38:25] GATEKEEPER: Flow (src: 59.109.120.5, dst: 30.30.30.193) at lcore 7: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow...
[2022-08-22 11:38:26] GATEKEEPER: flow: gk_del_flow_entry_with_key(): the flow entry does not correspond to the flow key at postion 92610535; logging entry and releasing both flow keys...; while trying to show flow data, an unknown flow type 0 was found

As you check in file lib/flow.c of pull request #596, there should be no log entries terminating with "; while trying to show flow data, an unknown flow type %hu was found" since they are replaced with a call to the new function print_invalid_flow_err_msg() that includes more information. Perhaps, the pull request was applied correctly, but the older binary was somehow used during the tests.
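A quick way to verify both the source tree and the binary that is actually running might be the following sketch; the binary path is a placeholder for wherever your build installs gatekeeper:

    # With pull request #596 applied, this message should no longer appear in the source
    grep -n "unknown flow type" lib/flow.c
    # ...nor in the binary that is being executed
    strings /path/to/gatekeeper | grep "unknown flow type"

If either command still finds the string, the tests were run against an unpatched build.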

AltraMayor avatar Aug 22 '22 13:08 AltraMayor

Side notes:

  1. The GK parameters local max_pkt_burst_front = 320 and local max_pkt_burst_back = 320 used in both scenarios may add latency in production. I advise testing the need for this change. If your tests do not confirm significant gains, I'd return to the default values.

  2. The parameter main_config.lua:local n_sol_lcores_per_socket = 2 is likely just wasting resources in scenario 1.

  3. While log entries like [2022-08-22 15:38:40] GATEKEEPER LLS: front interface should not be seeing a packet with EtherType 0x0800 are just notices and not related to the problem of this thread, later, you may want to investigate what is triggering it in your environment.

AltraMayor avatar Aug 22 '22 13:08 AltraMayor

The following log entries from scenario 1 show that there was an issue while applying the code of pull request #596:

[2022-08-22 11:38:25] GATEKEEPER: Flow (src: 59.109.120.5, dst: 30.30.30.193) at lcore 7: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow...
[2022-08-22 11:38:26] GATEKEEPER: flow: gk_del_flow_entry_with_key(): the flow entry does not correspond to the flow key at postion 92610535; logging entry and releasing both flow keys...; while trying to show flow data, an unknown flow type 0 was found

As you check in file lib/flow.c of pull request #596, there should be no log entries terminating with "; while trying to show flow data, an unknown flow type %hu was found" since they are replaced with a call to the new function print_invalid_flow_err_msg() that includes more information. Perhaps, the pull request was applied correctly, but the older binary was somehow used during the tests.

When I recompile, I get the same warning message. (screenshot)

ShawnLeung87 avatar Aug 23 '22 02:08 ShawnLeung87

The following log entries from scenario 1 show that there was an issue while applying the code of pull request #596:

[2022-08-22 11:38:25] GATEKEEPER: Flow (src: 59.109.120.5, dst: 30.30.30.193) at lcore 7: gk_del_flow_entry_at_pos(): flow was not indexed; logging and dropping flow...
[2022-08-22 11:38:26] GATEKEEPER: flow: gk_del_flow_entry_with_key(): the flow entry does not correspond to the flow key at postion 92610535; logging entry and releasing both flow keys...; while trying to show flow data, an unknown flow type 0 was found

As you check in file lib/flow.c of pull request #596, there should be no log entries terminating with "; while trying to show flow data, an unknown flow type %hu was found" since they are replaced with a call to the new function print_invalid_flow_err_msg() that includes more information. Perhaps, the pull request was applied correctly, but the older binary was somehow used during the tests.

When I recompile, I get the same warning message. (screenshot)

This was caused by a problem with pulling the code in diff mode. I manually pulled the code again and it works now. I'll test it in the 4-lcore GK scenario and see whether it works.

ShawnLeung87 avatar Aug 23 '22 02:08 ShawnLeung87

The improvement of the code that heals the flow table has been forward ported to v1.1 and merged (see pull request #601).

Moreover, I've investigated the root cause of the corruption of the flow table, and it seems to be a bug in the custom DPDK used in Gatekeeper v1.0, although I have not been able to pinpoint exactly where this bug is. Testing v1.1 in an environment similar to the one you used, there was no corruption of the flow table.

It'd be helpful if you could test v1.1 in your test environment. Notice that you can reinstall only your Gatekeeper server with version 1.1 and keep your Grantor servers at version 1.0; this should simplify your test.

AltraMayor avatar Sep 22 '22 20:09 AltraMayor

I have updated GK to the latest version 1.1 and kept GT at version 1.0 as requested above, but now data cannot be requested normally: the server port responds to telnet and ping, but there is a problem with data transmission.

1. For example, with remote SSH, tcpdump shows data not being received by the server. (Screenshot 2022-10-15 5:56 PM)
2. Flow BPF status: [2022-10-15 18:44:19] GK/6: Flow (src: 20.20.20.20, dst: 30.30.30.30) at index 1: [state: GK_BPF (3), flow_hash_value: 0xd91eebd5, expire_at: 0x279fe7c5d632, program_index=0, cookie=03c68f3eed260000, 0004000000000000, 16fa0f0000000000, cc8734ab7b270000, 6801d4cf01000000, 0000000000000000, 0000000000000000, 0000000000000000, grantor_ip: 10.0.2.11]
3. If FWD_GRANTOR is removed, it can be accessed normally.

ShawnLeung87 avatar Oct 15 '22 10:10 ShawnLeung87