BFDD : BFD session stays until the last static route over nexthop

Open varuntumbe opened this issue 6 months ago • 4 comments

What is the problem ?

When more than one static route with BFD uses the same nexthop, removing any one of those static routes takes the session down and clears it, even though the remaining routes still depend on it.

What is the Root Cause ?

  1. When we create a static route over a nexthop with BFD, the bfdd_dest_register routine is called. For the first route, we create a new pcn object ( pcn_new ), increment the refcount, and notify the client via ptm_bfd_notify.

For each subsequent static route added over the same nexthop, pcn_lookup returns the existing pcn, and the refcount increment is skipped.

So the refcount stays at 1 no matter how many static routes are added over the nexthop.

  2. When we remove one of the static routes, bfdd_dest_deregister is called, which does pcn_free ( where we decrement the refcount ) and deletes the session ( _ptm_bfd_session_del ), even if other static routes still use the nexthop.
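The two steps above can be modelled with a small toy sketch. This is not the bfdd code: `buggy_session`, `buggy_register`, and `buggy_deregister` are hypothetical names, and the logic only mirrors the behaviour as described in this report (the first registration sets the refcount, later ones skip it, and any removal deletes the session).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one BFD session shared by several static
 * routes over the same nexthop. Not the real bfdd data structure. */
struct buggy_session {
	bool exists;   /* session is present */
	bool has_pcn;  /* a notification entry was already created */
	int refcount;  /* references taken on the session */
};

/* Registration as described above: only the first route takes a
 * reference; later routes find the existing entry and return early. */
static void buggy_register(struct buggy_session *s)
{
	s->exists = true;
	if (s->has_pcn)
		return; /* lookup hit: the refcount increment is skipped */
	s->has_pcn = true;
	s->refcount = 1;
}

/* Deregistration as described above: drop the single reference and
 * delete the session, regardless of how many routes still use it. */
static void buggy_deregister(struct buggy_session *s)
{
	assert(s->exists && s->refcount > 0);
	s->refcount--;
	s->has_pcn = false;
	s->exists = false;
}
```

In this model, registering three routes leaves `refcount` at 1, and a single `buggy_deregister` call removes the session while two routes still need it.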

What is the fix ?

The fix has 2 parts:

  1. Increment the refcount early ( before pcn_lookup ) for every static route addition. This makes sure the refcount is updated for every static route added.

  2. When we start removing the static routes one by one, we decrement the refcount ( in bfdd_dest_deregister ) and call pcn_free only when the refcount reaches 0 ( so the session is deleted only when no static routes point to the nexthop ).
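As a rough illustration of the counting scheme proposed in the two parts above, here is a minimal sketch. It is a toy model and not the actual patch: `counted_session`, `counted_register`, and `counted_deregister` are hypothetical names, and the sketch ignores everything else bfdd does around registration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a BFD session shared by several static
 * routes over the same nexthop. Not the real bfdd data structure. */
struct counted_session {
	bool exists;   /* session is present */
	int refcount;  /* one reference per static route using it */
};

/* Part 1: every registration takes a reference, whether or not an
 * entry for this client already exists. */
static void counted_register(struct counted_session *s)
{
	s->exists = true;
	s->refcount++;
}

/* Part 2: every deregistration drops one reference; the session is
 * deleted only when the last reference goes away.
 * Returns true if the session was deleted by this call. */
static bool counted_deregister(struct counted_session *s)
{
	assert(s->exists && s->refcount > 0);
	if (--s->refcount > 0)
		return false; /* other routes still use the nexthop */
	s->exists = false;
	return true;
}
```

With two routes registered, the first `counted_deregister` call returns false and leaves the session in place; only the second call deletes it.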

Closes https://github.com/FRRouting/frr/issues/19014

varuntumbe avatar Jun 15 '25 10:06 varuntumbe

cmd : show bfd peers json

Output from the router:

[ { "multihop":false, "peer":"2001:db8:1::1", "local":"2001:db8:1::2", "vrf":"default", "interface":"r2-eth0", "id":2411218638, "remote-id":38497228, "passive-mode":false, "log-session-changes":false, "status":"up", "uptime":4, "diagnostic":"ok", "remote-diagnostic":"ok", "type":"dynamic", "receive-interval":600, "transmit-interval":600, "echo-receive-interval":50, "echo-transmit-interval":0, "detect-multiplier":3, "remote-receive-interval":600, "remote-transmit-interval":600, "remote-echo-receive-interval":50, "remote-detect-multiplier":3, "rtt-min":0, "rtt-avg":0, "rtt-max":0 }, { "multihop":false, "peer":"2001:db8:2::1", "vrf":"default", "id":3088557958, "remote-id":2847372379, "passive-mode":false, "log-session-changes":false, "status":"up", "uptime":3, "diagnostic":"ok", "remote-diagnostic":"ok", "type":"dynamic", "receive-interval":2000, "transmit-interval":2000, "echo-receive-interval":50, "echo-transmit-interval":0, "detect-multiplier":3, "remote-receive-interval":2000, "remote-transmit-interval":2000, "remote-echo-receive-interval":50, "remote-detect-multiplier":3, "rtt-min":0, "rtt-avg":0, "rtt-max":0 } ]

Expected output from the testcase file:

[ { "detect-multiplier": 3, "diagnostic": "ok", "echo-receive-interval": 50, "echo-transmit-interval": 0, "id": "", "interface": "r2-eth0", "local": "2001:db8:1::2", "multihop": false, "passive-mode": false, "peer": "2001:db8:1::1", "receive-interval": 600, "remote-detect-multiplier": 3, "remote-diagnostic": "ok", "remote-echo-receive-interval": 50, "remote-id": "", "remote-receive-interval": 600, "remote-transmit-interval": 600, "status": "up", "uptime": "", "transmit-interval": 600 }, { "detect-multiplier": 3, "diagnostic": "ok", "echo-receive-interval": 50, "echo-transmit-interval": 0, "id": "", "interface": "r2-eth1", "local": "2001:db8:2::2", "multihop": false, "passive-mode": false, "peer": "2001:db8:2::1", "receive-interval": 2000, "remote-detect-multiplier": 3, "remote-diagnostic": "ok", "remote-echo-receive-interval": 50, "remote-id": "", "remote-receive-interval": 2000, "remote-transmit-interval": 2000, "status": "up", "uptime": "", "transmit-interval": 2000 } ]

One of the topotests ( bfd_topo3 ) is failing consistently: the expected output does not match the output received from the router.

When I checked, it fails because it cannot find the "local" and/or "interface" fields ( which are in the expected file but not in the router-generated output ).

I don't see how my changes caused this failure. Any idea, @ton31337 ?

Thanks, Varun

varuntumbe avatar Jun 17 '25 17:06 varuntumbe

Hello,

Could you let me know when this fix can be merged?

Thanks.

thierrydevriendt avatar Nov 17 '25 14:11 thierrydevriendt

@varuntumbe are you still working on this?

ton31337 avatar Nov 21 '25 13:11 ton31337

@varuntumbe are you still working on this?

Hi @ton31337 ,

This solution is fundamentally wrong ( as I found out when one of the BFD topotests failed ). It needs a rework. I tried reworking it a couple of months back but could not arrive at a proper solution. I am not working on this now. Apologies.

thanks

varuntumbe avatar Nov 23 '25 16:11 varuntumbe