bgpd: add support for RFC 4684 - Route Target (RT) Constraint for L3VPN and EVPN
Add support for RFC 4684 - Route Target (RT) Constraint for L3VPN and EVPN
Route Target (RT) Constraint, as defined in RFC 4684, is a mechanism that allows routers to receive only the L3VPN and EVPN prefixes they are interested in. This optimization reduces unnecessary BGP updates and conserves routing resources.
The mechanism relies on Route Target subscriptions, which are advertised as RTC prefixes with AFI IPv4 and SAFI RTC (value 132).
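To make the RTC prefix format concrete, here is a minimal sketch of how a full-length RTC NLRI could be packed. It assumes the layout described later in this thread (4 bytes of Origin AS followed by 8 bytes of Route-Target), a leading prefix-length octet of 96 bits, and a 2-octet-AS Route Target extended community. The function names are illustrative and are not taken from bgpd.

```python
import struct

AFI_IPV4 = 1
SAFI_RTC = 132


def encode_rt_as2(asn: int, value: int) -> bytes:
    """Encode a 2-octet-AS Route Target (type 0x00, subtype 0x02)."""
    return struct.pack("!BBHI", 0x00, 0x02, asn, value)


def encode_rtc_nlri(origin_as: int, route_target: bytes) -> bytes:
    """Encode a full-length RTC NLRI: length in bits, Origin AS, Route-Target."""
    if len(route_target) != 8:
        raise ValueError("a Route-Target is 8 octets")
    # 96 bits = 32 bits of Origin AS + 64 bits of Route-Target
    return struct.pack("!BI", 96, origin_as) + route_target


# RT:65000:100 originated by AS 65001
nlri = encode_rtc_nlri(65001, encode_rt_as2(65000, 100))
print(nlri.hex())
```

Shorter prefix lengths (down to 0 for a default RTC route) are left out of this sketch.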
Continuation of @Sohn123's work in https://github.com/FRRouting/frr/pull/13476
umm - is it April 1 already? just kidding - but really, you can't expect anyone to review 100K lines and 4000 files?
one test may really be a pull request by itself. rest of the code is very reasonable.
There are a lot of JSON files for the tests, and JSON output is quite verbose. I can remove some tests, but I am not sure that is a good idea.
The code is reviewable commit by commit.
Some of the questions are addressed in the commit log descriptions, but I am providing a summarized version of the information here.
I spoke with Alexander on Slack before development to understand what he had already completed and what was still pending. Additionally, I consulted with him regarding the design, such as for the update-group issue.
This feature needs some design documentation and discussion.
I’m open to discussion. I’m providing answers here, but we can arrange a dedicated meeting with Alexander if needed. A design document will follow afterward.
I've also looked at Alexander's earlier work, but I came to very different conclusions about it, and I think we need to be able to understand your choices and the impact of those choices. It would be good to have information about:
- what data structures did you choose, and why? what are the impacts of those choices - I think of the way you've used 'struct list', and the impact you've had on lib/prefix
There is no added struct list in lib/prefix. Alexander introduced a new RTC type of prefix, and I added a hook to display the RTC prefixes.
In my code version, struct list is used for the RTC prefix-list. Alexander's version relied on a modified version of lib/plist. However, using the existing library posed an issue: an RTC prefix consists of 4 bytes for the Origin AS and 8 bytes for the Route-Target, but only the Route-Target is used for matching. This meant that having RT:65000:100 from Origin AS 65001 and 65002 would create two separate entries in lib/plist, whereas only one was actually needed.
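The deduplication described above can be sketched as a filter keyed by Route-Target only, with the Origin AS kept as bookkeeping. This is an illustration of the idea in Python, not the `struct list` code from the PR. The class and method names are hypothetical.

```python
from collections import defaultdict


class RtcPrefixList:
    """Hypothetical per-peer RTC filter keyed by Route-Target only."""

    def __init__(self):
        # route_target -> set of Origin AS numbers that announced it
        self._entries = defaultdict(set)

    def add(self, origin_as, route_target):
        self._entries[route_target].add(origin_as)

    def remove(self, origin_as, route_target):
        origins = self._entries.get(route_target)
        if origins is None:
            return
        origins.discard(origin_as)
        if not origins:
            # Last origin gone: the Route-Target no longer matches.
            del self._entries[route_target]

    def matches(self, route_targets):
        """True if any Route-Target carried by a VPN route is subscribed."""
        return any(rt in self._entries for rt in route_targets)

    def __len__(self):
        return len(self._entries)


plist = RtcPrefixList()
plist.add(65001, "65000:100")
plist.add(65002, "65000:100")
print(len(plist))  # one matching entry, two origins behind it
```

With this shape, RT:65000:100 from Origin AS 65001 and 65002 yields a single matching entry, and the entry disappears only when both origins have withdrawn it.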
- why have you chosen to change the low-level cli code in this way? is there an alternative that would be simpler?
Are you talking about https://github.com/FRRouting/frr/pull/18286/commits/9970450bf394e24dec9335c144a33a076caef2c9 ?
I needed this for modifying the current CLI:
- https://github.com/FRRouting/frr/pull/18286/commits/70a66478245540fe171ef9c84f615db643938c10
- https://github.com/FRRouting/frr/pull/18286/commits/91306e009fb834efbfcb85f65a852b9f3a4d1b78#diff-916a077bdc3f40b019663d23796308dd9ba7648f0a082bb975428b305c54b699R15543
Without https://github.com/FRRouting/frr/pull/18286/commits/9970450bf394e24dec9335c144a33a076caef2c9, the CLI would not have been very clean.
- what does filtering processing look like - how expensive, what scale considerations, for example.
I’m working on a second pull-request to optimize scaling, as the current one is already quite large.
For now, the key question is: Does the RTC feature degrade the performance of a non-RTC setup?
My answer is that the only additional processing in a non-RTC setup is the following check:
https://github.com/FRRouting/frr/pull/18286/commits/31687d7a70499aeeeba8884eb7e28719c894c043#diff-6efcfff51b727fc68bf18cc4eb764ab9886587e607da789f71dedf67408aed24R576
This check verifies whether any peer in the update-group has RTC enabled.
It should not cause a significant performance degradation.
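As a rough illustration of why that check is cheap, the sketch below models it as a single scan over the peers of an update-group. The `Peer` and `UpdateGroup` types are hypothetical stand-ins, not the bgpd structures, and the real check may well be cached rather than recomputed.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    rtc_enabled: bool = False


@dataclass
class UpdateGroup:
    peers: list = field(default_factory=list)

    def any_rtc_peer(self) -> bool:
        """True if at least one peer in the group negotiated RTC."""
        return any(p.rtc_enabled for p in self.peers)


plain_group = UpdateGroup([Peer("r1"), Peer("r2")])
mixed_group = UpdateGroup([Peer("r1"), Peer("r3", rtc_enabled=True)])

# Non-RTC setups take the early exit and skip all RTC filtering.
print(plain_group.any_rtc_peer(), mixed_group.any_rtc_peer())
```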
- how are you handling FRR's unique (I think) "wildcard" RT behavior?
Do you have an example of "wildcard" RT usage?
- what happens when peers negotiate SAFI_RTC, what are the steps?
Once the session is established, routers begin exchanging UPDATE messages, and each router processes received updates immediately. RTC UPDATEs are prioritized over all other announcements since they are dequeued first.
If a router already has some (E)VPN prefixes in its table when it negotiates RTC with a peer, it begins processing announcements for those prefixes. However, since the RTC prefix-list for that peer is initially empty, WITHDRAW messages will be sent for all of them. The initial step will be optimized in another pull-request.
The RFC recommends implementing a delay before announcing prefixes to an RTC peer, but this will be addressed in a future pull request.
If a BGP speaker chooses to delay the advertisement of BGP VPN route updates until it receives this End-of-RIB marker, it MUST limit that delay to an upper bound. By default, a 60 second value should be used.
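The bounded delay quoted from the RFC could look roughly like the sketch below. It is not part of this pull request, since the delay is deferred to a later one. The function name and parameters are assumptions made for illustration.

```python
RTC_EOR_MAX_DELAY = 60.0  # seconds, the default upper bound from the RFC text


def may_advertise_vpn_routes(now, established_at, rtc_eor_received,
                             max_delay=RTC_EOR_MAX_DELAY):
    """Decide whether (E)VPN routes may be sent to an RTC peer.

    Advertisement waits for the RTC End-of-RIB marker, but never longer
    than max_delay seconds after the session was established.
    """
    if rtc_eor_received:
        return True
    return (now - established_at) >= max_delay


# 10 seconds into the session, no RTC End-of-RIB yet: keep waiting.
print(may_advertise_vpn_routes(now=10.0, established_at=0.0,
                               rtc_eor_received=False))
```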
what happens if only one peer offers it?
The behavior is documented in the last documentation commit. https://github.com/FRRouting/frr/pull/18286/commits/ace4f22507818e008ef09ade5e504b8c5757a3d5#diff-ed32ce05409388b6bd60ab77267c92e083a8d9f36458a41b830c3d7332147cc8R5824
what happens during the initial exchange of updates?
- what changes are there to update groups and why?
There are no changes to the update-groups.
In Alexander’s implementation, update-groups were disabled for RTC peers—each RTC peer had its own update-group and therefore its own Adj-RIB-out. The RTC filtering was applied before building the Adj-RIB-out.
Since disabling update-groups for RTC was not acceptable, I have modified the approach:
- Announcements are now filtered after building the Adj-RIB-out, inside bpacket_reformat_for_peer().
- RTC peers can now share an update-group but will not receive the same announcements, as filtering is applied individually per peer.
- show bgp neighbor PEER advertised does not display filtered prefixes.
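The per-peer filtering described in the list above can be pictured as follows. This is a simplified Python model, not the logic of bpacket_reformat_for_peer() itself. The Adj-RIB-out is reduced to a list of (prefix, Route-Targets) pairs, and `format_for_peer` is a hypothetical name.

```python
def format_for_peer(adj_rib_out, peer_rtc_targets):
    """Return the prefixes one peer receives from a shared Adj-RIB-out.

    adj_rib_out: list of (prefix, set of Route-Targets) shared by the group.
    peer_rtc_targets: set of Route-Targets the peer subscribed to,
                      or None if the peer did not negotiate RTC.
    """
    if peer_rtc_targets is None:
        # Non-RTC peer: nothing is filtered.
        return [prefix for prefix, _ in adj_rib_out]
    return [prefix for prefix, rts in adj_rib_out if rts & peer_rtc_targets]


shared = [
    ("10.0.1.0/24", {"65000:100"}),
    ("10.0.2.0/24", {"65000:200"}),
    ("10.0.3.0/24", {"65000:100", "65000:300"}),
]

# Two RTC peers in the same update-group see different announcements.
print(format_for_peer(shared, {"65000:100"}))
print(format_for_peer(shared, {"65000:200"}))
```

The Adj-RIB-out is built once per update-group, and only the final per-peer step differs, which is what allows RTC peers to keep sharing a group.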
- it looks like you've changed the bgp workqueue/metaqueue- why, what problem were you trying to solve, and what changed?
I have added two BGP workqueues:
- RTC EoR of RIB marker queue
  - Notifies the dequeuer that a set of RTC updates has been received.
  - Triggers a refresh of L3VPN / EVPN announcements.
  - Activated after an actual RTC End-of-RIB and after receiving RTC updates.
- RTC prefix queue
  - Ensures that RTC prefixes are processed before the RTC EoR of RIB marker.
  - Triggers the best path selection algorithm.
  - Based on the RTC best path, bgpd can (un)set the RTC prefix lists.
Priority handling: RTC updates are dequeued before other queues to ensure that L3VPN / EVPN announcements use the latest RTC prefix-list version.
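The ordering described above can be sketched as a small strict-priority queue. The queue names and the `MetaQueue` class are simplifications for illustration. The real bgpd meta-queue has more sub-queues, and their exact order is not restated here.

```python
from collections import deque

# Strict priority: RTC prefixes, then the RTC EoR marker, then the rest.
QUEUE_ORDER = ("rtc_prefix", "rtc_eor", "other")


class MetaQueue:
    def __init__(self):
        self.queues = {name: deque() for name in QUEUE_ORDER}

    def enqueue(self, name, item):
        self.queues[name].append(item)

    def dequeue(self):
        """Pop from the highest-priority non-empty queue, or return None."""
        for name in QUEUE_ORDER:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None


mq = MetaQueue()
mq.enqueue("other", "vpnv4 10.0.1.0/24")
mq.enqueue("rtc_eor", "marker")
mq.enqueue("rtc_prefix", "65001:RT:65000:100")

order = []
while (entry := mq.dequeue()) is not None:
    order.append(entry[0])
print(order)
```

Because the marker is only reached once the RTC prefix queue is empty, the refresh it triggers always sees the latest RTC prefix-list.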
- what happens when a peer adds or removes an RTC prefix? what's the order of events; what's the scale/expectation?
When a peer adds or removes an RTC prefix, the local router re-parses the Adj-RIB-out for VPNv4, VPNv6, and EVPN. This ensures that UPDATE or WITHDRAW messages are sent to that peer for all relevant prefixes in the Adj-RIB-out.
The RFC recommends sending only the necessary UPDATE and WITHDRAW messages to update the peer, rather than reprocessing all prefixes in the Adj-RIB-out.
A BGP speaker should generate the minimum set of BGP VPN route updates (advertisements and/or withdrawls) necessary to transition between the previous and current state of the route distribution graph that is derived from Route Target membership information.
It’s still a draft, but the idea is to store the previous RTC prefix-list so I can track whether a prefix has already been advertised to the peer.
Since I am no longer using lib/plist for the RTC prefix-list, I can now store the previous state using flags to manage updates more efficiently.
This optimization will make it scalable, but I plan to publish it in a second pull request to keep the changes manageable.
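One way to picture the planned optimization is to compare, for each route, whether it matched the previous RTC state and whether it matches the current one. The sketch below uses plain sets instead of the flags mentioned above, and `rtc_delta` is a hypothetical name, so treat it as an illustration of the idea rather than the upcoming implementation.

```python
def rtc_delta(previous_rts, current_rts, adj_rib_out):
    """Compute the minimal UPDATE / WITHDRAW sets after an RTC change.

    previous_rts, current_rts: sets of Route-Targets the peer subscribed to
    before and after the change.
    adj_rib_out: list of (prefix, set of Route-Targets).
    """
    updates, withdraws = [], []
    for prefix, rts in adj_rib_out:
        was_sent = bool(rts & previous_rts)
        wanted = bool(rts & current_rts)
        if wanted and not was_sent:
            updates.append(prefix)
        elif was_sent and not wanted:
            withdraws.append(prefix)
        # Routes whose status did not change generate no message.
    return updates, withdraws


rib = [
    ("10.0.1.0/24", {"65000:100"}),
    ("10.0.2.0/24", {"65000:200"}),
    ("10.0.3.0/24", {"65000:100", "65000:200"}),
]

# The peer drops RT 65000:100 and adds RT 65000:200.
updates, withdraws = rtc_delta({"65000:100"}, {"65000:200"}, rib)
print(updates, withdraws)
```

Note that 10.0.3.0/24 matches both the old and the new state, so it produces neither an UPDATE nor a WITHDRAW.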
- how are you handling route-reflector and -server differences?
Route Reflectors & Route Servers Behavior:
- They do not announce their own RTC prefixes, unless the user has explicitly defined static RTC prefixes.
- They re-announce RTC prefixes to inform downstream routers whether they should advertise their (E)VPN prefixes based on the received RTC prefixes.
- They re-announce (E)VPN prefixes they receive, but only if they match their RTC prefix-list.
This behavior is tested in bgp_rtc_l3vpn_topo1 topotest.
Route reflectors have a special best-path selection rule required by the RFC. This rule is implemented in this commit and tested in the bgp_rtc_l3vpn_topo2 topotest.
@louis-6wind explained the reasoning I shared with him in our call quite well :) I can add the following:
- The changes to use an adaptation of the existing prefix list that is more suitable to store RTC prefixes make sense to me.
- I can't judge the CLI changes or how they should be done.
- When I wrote the code, I made sure that peers not using the new feature are not affected by the changes wherever possible. As far as I know, the check @louis-6wind mentioned really is the only code path where peers not using the feature are affected.
- With my code I did some limited performance testing as part of my bachelor's thesis. I still might have the results of the measurements I took and can share them if they are of interest.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Any updates on this? We also need this feature