Issue with BGP peering not coming up occasionally
- [x] Did you check if this is a duplicate issue?
- [ ] Did you test it on the latest FRRouting/frr master branch?
To Reproduce
The issue occurs randomly and is not easily reproducible.
Expected behavior
We have IPv4 and IPv6 BGP peering between two machines: one is a Linux-based VM running FRR, the other is a physical router. Both have per-VRF BGP router instances, each with two IPv4 and two IPv6 peers. In most cases peering works fine and all peers reach the Established state. In rare cases, however, after restarting FRR the BGP peering does not come up and the peers stay in the Active or Connect state. We increased logging to capture neighbor events but did not see anything useful there either. We have verified that all interfaces are up, the peers are reachable, and the configuration is correct.
As a recovery step, restarting FRR again brings the BGP peering up successfully.
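Since the failure is rare and only shows up after a restart, a small check run after each restart could help catch it. The following is a minimal sketch, not something from our setup: it assumes `vtysh` is on the path and that `show bgp vrf vrf1 summary json` returns one object per address family with a `peers` map whose entries carry a `state` field. The function names are hypothetical, and the JSON layout should be verified against the FRR version in use.

```python
import json
import subprocess


def non_established_peers(summary: dict) -> dict:
    """Return {address_family: {peer: state}} for peers not in Established.

    Assumed layout (verify on your FRR version):
    {"ipv4Unicast": {"peers": {"<ip>": {"state": "..."}}}, ...}
    """
    stuck = {}
    for afi, data in summary.items():
        peers = data.get("peers", {}) if isinstance(data, dict) else {}
        for peer, info in peers.items():
            state = info.get("state", "unknown")
            if state != "Established":
                stuck.setdefault(afi, {})[peer] = state
    return stuck


def fetch_summary(vrf: str = "vrf1") -> dict:
    """Run vtysh and parse its JSON output (needs FRR installed locally)."""
    result = subprocess.run(
        ["vtysh", "-c", f"show bgp vrf {vrf} summary json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Example use on the FRR host, after waiting for the sessions to settle:
#   stuck = non_established_peers(fetch_summary("vrf1"))
#   if stuck:
#       print("peers not established:", stuck)
```

Keeping the parsing separate from the `vtysh` call makes it possible to test the check against a saved JSON capture from a failed restart.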
Below is a snapshot of the configuration:
zebra.conf
interface BondHostIf.1784 vrf vrf1
ip address 192.168.40.16/24
ip address fdfd:fca4:3ba0:1784:192:168:40:16/64
!
bgpd.conf
router bgp 65000 vrf vrf1
bgp router-id 192.168.40.16
neighbor 192.168.40.45 remote-as 65000
neighbor 192.168.40.46 remote-as 65000
neighbor fdfd:fca4:3ba0:1784:192:168:40:45 remote-as 65000
neighbor fdfd:fca4:3ba0:1784:192:168:40:46 remote-as 65000
no bgp network import-check
!
address-family ipv4 unicast
redistribute connected
network 100.64.0.0/12
neighbor 192.168.40.45 activate
neighbor 192.168.40.45 prefix-list vrf1_DENY_IN_V4 in
neighbor 192.168.40.45 prefix-list vrf1_ALLOW_OUT_V4 out
neighbor 192.168.40.46 activate
neighbor 192.168.40.46 prefix-list vrf1_DENY_IN_V4 in
neighbor 192.168.40.46 prefix-list vrf1_ALLOW_OUT_V4 out
exit-address-family
!
address-family ipv6 unicast
redistribute connected
network 2001:5b0:9800::/44
neighbor fdfd:fca4:3ba0:1784:192:168:40:45 activate
neighbor fdfd:fca4:3ba0:1784:192:168:40:45 prefix-list vrf1_DENY_IN_V6 in
neighbor fdfd:fca4:3ba0:1784:192:168:40:45 prefix-list vrf1_ALLOW_OUT_V6 out
neighbor fdfd:fca4:3ba0:1784:192:168:40:46 activate
neighbor fdfd:fca4:3ba0:1784:192:168:40:46 prefix-list vrf1_DENY_IN_V6 in
neighbor fdfd:fca4:3ba0:1784:192:168:40:46 prefix-list vrf1_ALLOW_OUT_V6 out
exit-address-family
!
ip prefix-list vrf1_DENY_IN_V4 seq 5 deny any
ip prefix-list vrf1_ALLOW_OUT_V4 seq 5 permit 100.64.0.0/12
!
ipv6 prefix-list vrf1_DENY_IN_V6 seq 5 deny any
ipv6 prefix-list vrf1_ALLOW_OUT_V6 seq 5 permit 2001:5b0:9800::/44
!
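For readers less familiar with prefix-lists, the filters above can be modelled roughly as follows. This is a simplified sketch under stated assumptions: entries are evaluated in order, the first match wins, an entry without `ge`/`le` matches only the exact prefix, and anything unmatched is denied. The function name is hypothetical, and `ge`/`le` ranges are deliberately not handled. These filters act on advertised and received routes, not on session establishment.

```python
import ipaddress


def prefix_list_permits(entries, prefix):
    """Evaluate (action, match) entries in order; the first match decides.

    Simplified model: "any" matches everything, any other entry matches
    only the exact same prefix. Unmatched prefixes are denied.
    """
    net = ipaddress.ip_network(prefix)
    for action, match in entries:
        if match == "any" or ipaddress.ip_network(match) == net:
            return action == "permit"
    return False


# The lists from the configuration above, expressed as (action, match) pairs.
vrf1_deny_in_v4 = [("deny", "any")]
vrf1_allow_out_v4 = [("permit", "100.64.0.0/12")]

print(prefix_list_permits(vrf1_allow_out_v4, "100.64.0.0/12"))  # exact prefix
print(prefix_list_permits(vrf1_allow_out_v4, "100.64.1.0/24"))  # more specific
print(prefix_list_permits(vrf1_deny_in_v4, "10.0.0.0/8"))       # inbound
```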
Versions
- OS Version: AlmaLinux release 8.6 (Sky Tiger)
- Kernel: 4.18.0-372.32.1.el8_6.x86_64
- FRR Version: FRRouting 8.1
Additional context
Please try one of the latest versions; 8.1 is too old to be supported. Let us know if the issue persists.
In the failure state, what does `show bgp ipv4 uni summ fail` return?
Below is the output of the command:
j3sixvmstm01# show bgp vrf vrf1 ipv4 unicast summary failed
BGP router identifier 192.168.42.15, local AS number 65000 vrf-id 69
BGP table version 2
RIB entries 3, using 552 bytes of memory
Peers 4, using 2892 KiB of memory

Neighbor                            EstdCnt  DropCnt  ResetTime  Reason
192.168.42.43                       1        1        00:51:00   Notification sent (Hold Timer Expired)
192.168.42.44                       2        2        00:50:58   Notification sent (Hold Timer Expired)
fdfd:fca4:3bc0:1786:192:168:42:43   2        2        00:51:06   Notification sent (Hold Timer Expired)
fdfd:fca4:3bc0:1786:192:168:42:44   3        3        00:51:04   Notification sent (Hold Timer Expired)

Displayed neighbors 4
Total number of neighbors 4
Thanks and Regards, Chinmaya Agarwal.
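To compare failure reasons across several occurrences, the `summary failed` output can be parsed into records. The sketch below is only an illustration: it assumes the column order shown above (Neighbor, EstdCnt, DropCnt, ResetTime, Reason) and whitespace-separated fields, and the function name is hypothetical.

```python
import re
from collections import Counter

# One data row: neighbor, two integer counters, reset time, free-text reason.
ROW = re.compile(
    r"^(?P<neighbor>\S+)\s+(?P<estd>\d+)\s+(?P<drop>\d+)\s+"
    r"(?P<reset>\S+)\s+(?P<reason>.+)$"
)


def parse_failed_summary(text):
    """Extract per-neighbor rows; header and footer lines are skipped
    because they do not have two integer columns after the first field."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append({
                "neighbor": match["neighbor"],
                "estd_cnt": int(match["estd"]),
                "drop_cnt": int(match["drop"]),
                "reset_time": match["reset"],
                "reason": match["reason"].strip(),
            })
    return rows


def reasons(rows):
    """Count how many neighbors share each failure reason."""
    return Counter(row["reason"] for row in rows)
```

Feeding it the output above should yield four rows that all share the same reason, which points at keepalives not arriving within the hold time rather than at a per-peer configuration difference.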
Before we try to figure out the real issue here, can you try a newer version, at least 8.5?