frr
GR fails under large route
Kernel: Linux 4.19.0-12-2-amd64
FRR version: stable/8.5

Graceful restart (GR) fails when the number of routes is large.
The cause is bpacket_queue_is_full: because of it, some routes are sent to the GR helper only after End-of-RIB. Why does subgroup-pkt-queue-max default to 40, and why is it configurable? What would be the impact of changing the default to the maximum value of 100? Can the value be set above 100?
We need more details on the situation in which you hit this limit. Can you share the configuration, scope, and maybe some logs? Also, please share the show event cpu output from when this limit is reached.
bgp92----------bgp62-----------bgp51

bgp92 advertises 100k routes to bgp62. When bgp on 62 restarts, 62 receives the 100k routes from 92 and then sends them to its neighbors.

DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.124/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.139/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.125/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.140/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.126/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.127/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.141/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.128/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.129/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.142/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.130/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE 100.0.141.143/32 IPv4 unicast
DEBUG bgp#bgpd[62]: u1:s1 send UPDATE len 4096 numpfx 806

806 * subgroup-pkt-queue-max (default = 40) = 32240, so when the number of routes is 100k, this limit is reached.
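The arithmetic above can be checked with a short sketch. It assumes every UPDATE carries the same 806 prefixes seen in the debug line (true only when all routes share one attribute set), and the helper names are hypothetical, not FRR functions.

```python
import math

# From the debug line "send UPDATE len 4096 numpfx 806".
# Assumption: every UPDATE packs the same number of prefixes.
PREFIXES_PER_UPDATE = 806


def queue_capacity_prefixes(prefixes_per_update: int, queue_max: int) -> int:
    """Upper bound on prefixes held by a full subgroup packet queue."""
    return prefixes_per_update * queue_max


def updates_needed(total_prefixes: int, prefixes_per_update: int) -> int:
    """Number of UPDATE packets needed to carry total_prefixes."""
    return math.ceil(total_prefixes / prefixes_per_update)


total_routes = 100_000
for queue_max in (40, 100):
    capacity = queue_capacity_prefixes(PREFIXES_PER_UPDATE, queue_max)
    needed = updates_needed(total_routes, PREFIXES_PER_UPDATE)
    fits = needed <= queue_max
    print(f"queue-max={queue_max}: capacity={capacity} prefixes, "
          f"packets needed={needed}, fits in one queue fill={fits}")
```

Under these assumptions, 100k routes need 125 packets, so neither the default of 40 nor the maximum of 100 holds the whole table in a single queue fill.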
Can you also share the show ip bgp update-groups output while doing a restart?
Here is the output with 37k routes:

show ip bgp update-groups
Update-group 3:
  Created: Tue Jun 4 15:57:48 2024
  Outgoing route map: wd1
  MRAI value (seconds): 0

  Update-subgroup 3:
    Created: Tue Jun 4 15:57:48 2024
    Join events: 1
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 2
    Coalesce Time: 1350
    Version: 76000
    Packet queue length: 0
    Total packets enqueued: 0
    Packet queue high watermark: 0
    Adj-out list count: 0
    Advertise list: empty
    Flags:
    Peers:
      - 210.6.1.92

Update-group 4:
  Created: Tue Jun 4 15:57:48 2024
  MRAI value (seconds): 0

  Update-subgroup 4:
    Created: Tue Jun 4 15:57:48 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 1
    Merge checks triggered: 0
    Coalesce Time: 1350
    Version: 76000
    Packet queue length: 0
    Total packets enqueued: 138
    Packet queue high watermark: 48
    Adj-out list count: 38000
    Advertise list: empty
    Flags:
    Peers:
      - 210.2.1.179
      - 210.4.1.51
Packet queue length is 0. The limit doesn't seem to be triggered in the case you described at the beginning. By the way, does anything change if you go from 40 to 100?
1. bgp default subgroup-pkt-queue-max 90

While updating:

Update-group 5:
  Created: Tue Jun 4 16:54:32 2024
  MRAI value (seconds): 0

  Update-subgroup 9:
    Created: Tue Jun 4 16:54:32 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 0
    Coalesce Time: 1350
    Version: 156000
    Packet queue length: 39
    Total packets enqueued: 41
    Packet queue high watermark: 39
    Adj-out list count: 38000
    Advertise list: not empty
    Flags:
    Peers:
      - 210.4.1.51
      - 210.2.1.179

End of update:

Update-group 5:
  Created: Tue Jun 4 16:54:32 2024
  MRAI value (seconds): 0

  Update-subgroup 9:
    Created: Tue Jun 4 16:54:32 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 0
    Coalesce Time: 1350
    Version: 156000
    Packet queue length: 0
    Total packets enqueued: 48
    Packet queue high watermark: 39
    Adj-out list count: 38000
    Advertise list: empty
    Flags:
    Peers:
      - 210.4.1.51
      - 210.2.1.179
2. bgp default subgroup-pkt-queue-max 30

While updating:

Update-group 9:
  Created: Tue Jun 4 16:59:00 2024
  MRAI value (seconds): 0

  Update-subgroup 13:
    Created: Tue Jun 4 16:59:00 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 1
    Coalesce Time: 1350
    Version: 270000
    Packet queue length: 0
    Total packets enqueued: 2
    Packet queue high watermark: 2
    Adj-out list count: 38000
    Advertise list: not empty
    Flags:
    Peers:
      - 210.4.1.51
      - 210.2.1.179

End of update:

Update-group 9:
  Created: Tue Jun 4 16:59:00 2024
  MRAI value (seconds): 0

  Update-subgroup 13:
    Created: Tue Jun 4 16:59:00 2024
    Join events: 2
    Prune events: 0
    Merge events: 0
    Split events: 0
    Update group switch events: 0
    Peer refreshes combined: 0
    Merge checks triggered: 1
    Coalesce Time: 1350
    Version: 270000
    Packet queue length: 0
    Total packets enqueued: 48
    Packet queue high watermark: 29
    Adj-out list count: 38000
    Advertise list: empty
    Flags:
    Peers:
      - 210.4.1.51
      - 210.2.1.179
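To compare runs like the two above without reading the output by eye, the queue counters can be pulled out of the show ip bgp update-groups text. The following is a minimal sketch, not an FRR tool: parse_subgroups and SAMPLE are hypothetical names, the sample is an abbreviated copy of the "End of update" output for subgroup 13, and the 806 prefixes per packet figure is taken from the earlier debug log on the assumption that every UPDATE is packed the same way.

```python
import math
import re

# Counters of interest, spelled as they appear in the output above.
FIELDS = (
    "Packet queue length",
    "Total packets enqueued",
    "Packet queue high watermark",
    "Adj-out list count",
)

# Abbreviated copy of the "End of update" output for subgroup 13.
SAMPLE = (
    "Update-group 9: Created: Tue Jun 4 16:59:00 2024 MRAI value (seconds): 0 "
    "Update-subgroup 13: Created: Tue Jun 4 16:59:00 2024 Join events: 2 "
    "Coalesce Time: 1350 Version: 270000 Packet queue length: 0 "
    "Total packets enqueued: 48 Packet queue high watermark: 29 "
    "Adj-out list count: 38000 Advertise list: empty Flags: "
    "Peers: - 210.4.1.51 - 210.2.1.179"
)


def parse_subgroups(text: str) -> dict[int, dict[str, int]]:
    """Map each update-subgroup id to the counters found in its section."""
    # re.split with a capture group yields [preamble, id1, body1, id2, body2, ...]
    parts = re.split(r"Update-subgroup (\d+):", text)
    result: dict[int, dict[str, int]] = {}
    for sub_id, body in zip(parts[1::2], parts[2::2]):
        counters: dict[str, int] = {}
        for field in FIELDS:
            match = re.search(re.escape(field) + r":\s*(\d+)", body)
            if match:
                counters[field] = int(match.group(1))
        result[int(sub_id)] = counters
    return result


stats = parse_subgroups(SAMPLE)
sub = stats[13]

# Assumption: 806 prefixes per UPDATE, as in "len 4096 numpfx 806".
expected_packets = math.ceil(sub["Adj-out list count"] / 806)
print(sub)
print("expected packets:", expected_packets)
```

Because the fields are matched by label rather than by line position, the same function works on the output whether it is pasted on one line or one field per line. With 38000 routes in Adj-out, the estimate is 48 packets, which agrees with the Total packets enqueued value at the end of both runs.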
This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.