
GR fails with a large number of routes

Open wangdan1323 opened this issue 1 year ago • 6 comments

Kernel: Linux 4.19.0-12-2-amd64
FRR version: stable/8.5

GR fails with a large number of routes.

The reason is bpacket_queue_is_full, which causes some routes to be sent to the GR helper after End-of-RIB. Why is subgroup-pkt-queue-max set to 40 by default while only being configurable up to 100? What is the impact of raising the default to the maximum value of 100? Could the default be allowed to exceed 100?
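A minimal Python sketch of the condition being described (names modeled on bpacket_queue_is_full; this is an illustration, not FRR's actual C implementation): once the per-subgroup packet queue reaches subgroup-pkt-queue-max, no further UPDATE packets can be enqueued, so the remaining routes must wait for a later write cycle.

```python
from collections import deque

class BpacketQueue:
    """Toy model of a bounded per-subgroup packet queue."""

    def __init__(self, max_packets: int = 40):  # subgroup-pkt-queue-max default
        self.q = deque()
        self.max_packets = max_packets

    def is_full(self) -> bool:
        # Analogous to the bpacket_queue_is_full check mentioned above.
        return len(self.q) >= self.max_packets

    def enqueue(self, pkt) -> bool:
        """Refuse new packets once the cap is hit; callers must retry
        after the writer drains the queue."""
        if self.is_full():
            return False
        self.q.append(pkt)
        return True
```

With the default cap of 40, the 41st packet generated in one burst is refused, which is how route transmission can spill past the End-of-RIB marker during graceful restart.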

wangdan1323 avatar Jun 01 '24 08:06 wangdan1323

We need more details on the situation in which you hit this limit. Can you share the configuration, topology scope, and maybe some logs? Also, please share show event cpu output from when this limit is reached.

ton31337 avatar Jun 03 '24 19:06 ton31337

bgp92----------bgp62-----------bgp51

bgp92 advertises 100k routes to bgp62. When bgp62 restarts, it receives the 100k routes back from bgp92 and then sends them on to its neighbors:

DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.124/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.139/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.125/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.140/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.126/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.127/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.141/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.128/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.129/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.142/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.130/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.143/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE len 4096 numpfx 806

806 prefixes per packet * subgroup-pkt-queue-max (default = 40) = 32240 prefixes. When the route count is 100k, this limit is reached.
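Checking the arithmetic from the log above: each 4096-byte UPDATE carried 806 prefixes, and one full default queue holds 40 such packets, so roughly 32k prefixes fit in a single queue's worth of packets, well short of 100k.

```python
# Figures taken from the debug log above (len 4096, numpfx 806)
# and the default subgroup-pkt-queue-max of 40.
prefixes_per_packet = 806
queue_max = 40

capacity = prefixes_per_packet * queue_max
print(capacity)            # 32240
print(100_000 > capacity)  # True: 100k routes exceed one queue's worth
```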

wangdan1323 avatar Jun 04 '24 06:06 wangdan1323

Can you also share show ip bgp update-groups output when doing a restart?

ton31337 avatar Jun 04 '24 06:06 ton31337

Here it is with 37k routes:

show ip bgp update-groups

Update-group 3:
  Created: Tue Jun 4 15:57:48 2024
  Outgoing route map: wd1
  MRAI value (seconds): 0

  Update-subgroup 3:
    Created: Tue Jun 4 15:57:48 2024
    Join events: 1
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 2
    Coalesce Time: 1350
    Version: 76000
    Packet queue length: 0
    Total packets enqueued: 0
    Packet queue high watermark: 0
    Adj-out list count: 0
    Advertise list: empty
    Flags:
    Peers:
      - 210.6.1.92

Update-group 4:
  Created: Tue Jun 4 15:57:48 2024
  MRAI value (seconds): 0

  Update-subgroup 4:
    Created: Tue Jun 4 15:57:48 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 1
    Merge checks triggered: 0
    Coalesce Time: 1350
    Version: 76000
    Packet queue length: 0
    Total packets enqueued: 138
    Packet queue high watermark: 48
    Adj-out list count: 38000
    Advertise list: empty
    Flags:
    Peers:
      - 210.2.1.179
      - 210.4.1.51

wangdan1323 avatar Jun 04 '24 08:06 wangdan1323

Packet queue length is 0, so the limit doesn't seem to be triggered in the case you described at the beginning. By the way, does anything change if you raise it from 40 to 100?

ton31337 avatar Jun 04 '24 08:06 ton31337

1) bgp default subgroup-pkt-queue-max 90

While updating:

Update-group 5:
  Created: Tue Jun 4 16:54:32 2024
  MRAI value (seconds): 0

  Update-subgroup 9:
    Created: Tue Jun 4 16:54:32 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 0
    Coalesce Time: 1350
    Version: 156000
    Packet queue length: 39
    Total packets enqueued: 41
    Packet queue high watermark: 39
    Adj-out list count: 38000
    Advertise list: not empty
    Flags:
    Peers:
      - 210.4.1.51
      - 210.2.1.179

End of update:

Update-group 5:
  Created: Tue Jun 4 16:54:32 2024
  MRAI value (seconds): 0

  Update-subgroup 9:
    Created: Tue Jun 4 16:54:32 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 0
    Coalesce Time: 1350
    Version: 156000
    Packet queue length: 0
    Total packets enqueued: 48
    Packet queue high watermark: 39
    Adj-out list count: 38000
    Advertise list: empty
    Flags:
    Peers:
      - 210.4.1.51
      - 210.2.1.179

2) bgp default subgroup-pkt-queue-max 30

While updating:

Update-group 9:
  Created: Tue Jun 4 16:59:00 2024
  MRAI value (seconds): 0

  Update-subgroup 13:
    Created: Tue Jun 4 16:59:00 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 1
    Coalesce Time: 1350
    Version: 270000
    Packet queue length: 0
    Total packets enqueued: 2
    Packet queue high watermark: 2
    Adj-out list count: 38000
    Advertise list: not empty
    Flags:
    Peers:
      - 210.4.1.51
      - 210.2.1.179

End of update:

Update-group 9:
  Created: Tue Jun 4 16:59:00 2024
  MRAI value (seconds): 0

  Update-subgroup 13:
    Created: Tue Jun 4 16:59:00 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 1
    Coalesce Time: 1350
    Version: 270000
    Packet queue length: 0
    Total packets enqueued: 48
    Packet queue high watermark: 29
    Adj-out list count: 38000
    Advertise list: empty
    Flags:
    Peers:
      - 210.4.1.51
      - 210.2.1.179
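One way to read the two runs above (my interpretation, not from FRR docs): the "Packet queue high watermark" shows how close the subgroup came to its configured cap, since the writer drains packets as they are enqueued the watermark tops out just below the limit when the cap is actually the bottleneck. A small Python check using the values from the outputs:

```python
# Watermark and limit values taken from the two runs above.
runs = {
    "queue-max 90": {"limit": 90, "high_watermark": 39},
    "queue-max 30": {"limit": 30, "high_watermark": 29},
}

for name, r in runs.items():
    # Treat watermark >= limit - 1 as "the cap was the bottleneck".
    hit = r["high_watermark"] >= r["limit"] - 1
    print(f"{name}: watermark {r['high_watermark']} -> cap reached: {hit}")
```

On this reading, with queue-max 90 the queue never filled (watermark 39), while with queue-max 30 it was pinned at the cap (watermark 29), consistent with the cap, not route volume, governing when packets are deferred.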

wangdan1323 avatar Jun 04 '24 09:06 wangdan1323

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

github-actions[bot] avatar Dec 02 '24 02:12 github-actions[bot]

This issue will be automatically closed in the specified period unless there is further activity.

frrbot[bot] avatar Dec 02 '24 02:12 frrbot[bot]