
ARP snooping support for FRR

Open odmitriyev opened this issue 7 months ago • 15 comments

Is ARP snooping supported to populate MAC-IP binding and generate MAC/IP routes for FRR? This feature is needed for the correct operation of proxy ARP and ARP suppression (kernel function). This is also needed when using anycast gateway (with the same MAC/IP addresses) on the border node to synchronize ARP between them.

For example, here is a description of this feature from Juniper:

Typically, to generate an EVPN Type-2 MAC+IP route, a VTEP must have a MAC-IP binding in its ARP cache. However, 
depending on the platform and whether ARP snooping and ARP suppression are supported (and if they are enabled), a VTEP 
can snoop the IP address as well, and generate MAC+IP routes, even if the VTEP has no corresponding IRB interface for the 
VLAN (and thus no ARP entry to resolve the MAC-to-IP binding).

As discussed in the previous section, a VTEP can snoop inbound ARP packets (as an example) and generate EVPN Type-2 
MAC+IP routes, despite no IRB interface configured for the corresponding VLAN (and thus no ARP resolution to create an entry
in the ARP cache for the MAC-IP binding). This paves the way for features like proxy ARP and ARP suppression to come into play.

odmitriyev avatar May 21 '25 17:05 odmitriyev

I believe zebra is doing what you describe. Have you looked at what zebra_neigh.c and zebra_evpn_neigh.c are doing? Is there something that you think is not working?

mjstapp avatar May 21 '25 17:05 mjstapp

I didn't find anything that does ARP snooping for transit packets. I think zebra already works from a populated ARP table (ip neigh), but on my test bench the MAC+IP route for host-3 is not generated because host-3 is not in the ARP table. The route is generated only once the SVI interface is explicitly pinged, which populates the ARP table.

Image

odmitriyev avatar May 21 '25 18:05 odmitriyev

Cumulus Linux accomplishes this by way of a separate python daemon (neighmgrd) which installs BPF rules on non-VXLAN bridge_slave to get copies of ARP/ND packets from the kernel for snooping. Unless something has significantly changed in the last couple years, that functionality is not part of zebra/FRR -- although I believe it would be quite useful and welcome for EVPN use cases.

taspelund avatar May 21 '25 18:05 taspelund

> Cumulus Linux accomplishes this by way of a separate python daemon (neighmgrd) which installs BPF rules on non-VXLAN bridge_slave to get copies of ARP/ND packets from the kernel for snooping. Unless something has significantly changed in the last couple years, that functionality is not part of zebra/FRR -- although I believe it would be quite useful and welcome for EVPN use cases.

Then I don't fully understand how proxy ARP and ARP suppression can be fully used without snooping support, when there are no MAC/IP routes to populate the ARP tables.

The anycast gateway case can also cause problems without snooping, since the routers have no way to synchronize ARP entries learned in the data plane.

odmitriyev avatar May 21 '25 18:05 odmitriyev

I think you're lumping ARP suppression together with fdb/neighbor cache population, when those are distinct things.

ARP suppression is a dataplane function provided by the kernel. If you enable the neigh_suppress attribute on the VXLAN bridge_slave device, the kernel uses the neigh cache to decide whether to suppress flooding of an ARP/ND request and send a proxy reply instead. The kernel reads from the neighbor cache for that functionality, but that is ultimately separate from how entries are added to the cache.

The kernel has its own ARP/ND stack which populates the neigh caches as a result of routing (needing to resolve a next-hop or a directly connected IP), receiving ARP/ND requests from connected devices, receipt of GARP/Unsolicited-NA packets, or from a userspace request (via a request sent via an open netlink socket). However, the kernel doesn't natively populate its neigh caches based on transit ARP/ND (ARP/ND that isn't sent to/from the kernel), which is why Cumulus has a userspace daemon do the snooping.
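As a rough illustration of what a transit snooper has to do with each copied packet, here is a minimal Python sketch that extracts the sender MAC-IP binding from an untagged Ethernet/IPv4 ARP frame. The names `snoop_arp` and `Binding` are hypothetical and not part of FRR, the kernel, or neighmgrd. How frames are captured and how the binding is installed into the neigh cache are both left out.

```python
import socket
import struct
from typing import NamedTuple, Optional

ETH_P_ARP = 0x0806  # EtherType for ARP


class Binding(NamedTuple):
    mac: str
    ip: str


def snoop_arp(frame: bytes) -> Optional[Binding]:
    """Return the sender MAC-IP binding of an untagged Ethernet/IPv4 ARP frame, else None."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, _op = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None  # not Ethernet/IPv4 ARP
    sha, spa = frame[22:28], frame[28:32]
    if spa == b"\x00\x00\x00\x00":
        return None  # ARP probe: the sender has not claimed an address yet
    return Binding(sha.hex(":"), socket.inet_ntoa(spa))
```

Both requests and replies carry the sender fields, so the same function covers either opcode. VLAN-tagged frames and ND (IPv6) would need separate handling.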

So even though FRR doesn't provide its own snooper for transit ARP/ND packets, the kernel will populate its neigh caches for ARP/ND that terminates on the local system without any need for FRR or another userspace entity intervening. So it's not like the neigh caches are non-functional without something like neighmgrd, but snooping does enhance the neigh cache population pretty significantly. Regardless of whether there's an entity doing ARP/ND snooping, the kernel will still do ARP/ND suppression based on the entries in the neigh cache.

IMO ARP/ND snooping is something that should be handled in the dataplane, regardless of whether that logic lives in userspace or the kernel. I'd love to see neighmgrd be open sourced by ~Cumulus~ NVIDIA, but I doubt that is going to happen this late in the game (that daemon is probably close to 10 years old at this point). What seems more likely is for someone to either contribute that functionality into FRR directly or for a separate daemon to be written + open sourced.

taspelund avatar May 21 '25 18:05 taspelund

One other thing to call out is that the kernel only does ARP/ND suppression for "remote" neighbors (there is a flag set on entries installed by zebra that were learned from a remote EVPN VTEP), and snooping is a local function
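The decision described in this thread can be written down as a toy model. This is a deliberate simplification in Python, not kernel code: `NeighEntry`, its `remote` field, and `handle_arp_request` are hypothetical names, and the real bridge code has more conditions than shown here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NeighEntry:
    mac: str
    remote: bool  # True if zebra installed it from a remote VTEP's Type-2 route


def handle_arp_request(neigh_cache: Dict[str, NeighEntry], target_ip: str) -> str:
    """Model a neigh_suppress port: proxy-reply for known remote neighbors, else flood."""
    entry = neigh_cache.get(target_ip)
    if entry is not None and entry.remote:
        return f"proxy-reply {entry.mac}"
    return "flood"
```

In this model a locally learned entry, or a missing one, always results in flooding, which is why an incomplete neigh cache limits how much suppression can actually happen.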

taspelund avatar May 21 '25 19:05 taspelund

> I think you're lumping ARP suppression together with fdb/neighbor cache population, when those are distinct things.

Yes, thanks for the clarification. I understand what you are saying, but my point is that proxy ARP/ARP suppression can't work fully when there is no proxying on the nodes, because the MAC-IP table will be incomplete and flooding will still be needed to learn ARP entries. For example, in the case I showed above, even with an SVI interface in the bridge, the MAC+IP route for host-3 (192.168.100.100) is not generated and ARP is simply flooded. This also causes problems with anycast gateway operation.

odmitriyev avatar May 22 '25 04:05 odmitriyev

@taspelund, do you know of a working case in which FRR can generate RT2 MAC+IP routes that other nodes can use to build their MAC-IP tables? I can't think of anything other than a local SVI interface that acts as a gateway and thus learns ARP entries from the hosts.

odmitriyev avatar May 26 '25 06:05 odmitriyev

Pretty much all the use cases you'd expect. MAC learning on bridges works just fine without snooping of transit ARP/ND so MAC-Only Type-2s would be generated from that. If the FRR VTEP has an SVI and hosts are configured to send GARP/Unsolicited-NA, then MAC+IP Type-2s would be generated when neigh entries are learned. If the FRR VTEP has an SVI and is the gateway, then the neigh cache would be populated anytime a host needs to resolve its default gateway, and MAC+IP Type-2s would be generated.

All the use cases remain the same, and FRR's ability to be an EVPN control plane (learning about local data plane entries + advertising them, learning about remote entries via BGP + installing them, doing MAC Mobility procedures when things move, etc.) is unaffected.

taspelund avatar May 27 '25 06:05 taspelund

I might have a case of doing it wrong in my lab, but I think arp snooping is what I'm missing for my experiments.

I have a KVM host running frr 10.3 peered with two sonic switches (frr 8.2.2 under the covers) that try to provide an anycast gateway (same mac and ip) for my hosts and VMs.

The KVM host correctly advertises the type 2 mac+ip for my host management IP and that's working perfectly.

The KVM host correctly advertises the type 2 route with strictly the mac address of the VMs, because the linux bridge for the VMs doesn't have an IP in the VM network.

Only one of the switches successfully resolves ARP at a given time since they have the same MAC. Because the switch already has the VM's mac installed in the neighbor table as remote, it doesn't try to advertise a more complete mac+ip route after arp resolves but overall connectivity works on that one switch. Traffic that lands at the switch that couldn't resolve arp ends up dying.

If I set arp_accept=1 at the kvm host and send a gratuitous ARP from a VM the host ends up learning the VM's IP and begins including it in the type 2 routes, but this feels brittle.
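For reference, a gratuitous ARP is just an ARP packet whose sender and target IP are the same, sent to the broadcast MAC. The following Python sketch builds one in its request form. The helper name `build_garp` is hypothetical. Actually sending the frame needs a raw packet socket and root privileges, which are not shown.

```python
import socket
import struct


def build_garp(mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request announcing that `ip` is at `mac`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source MAC, EtherType ARP
    eth = b"\xff" * 6 + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet/IPv4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, zeroed target MAC, target IP equal to sender IP
    arp += hw + addr + b"\x00" * 6 + addr
    return eth + arp
```

A receiver with arp_accept=1 can create a neigh entry from the sender fields of such a frame, which matches the behavior described above.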

My guess is with arp snooping at the host I'll be able to learn the IP address and advertise it out so the mac-vrf at the switches doesn't try to rely on arp too much.

also a fun read - https://fedepaol.github.io/blog/2024/06/01/fixing-arp-suppression-in-linux-l2evpn-with-ebpf/

dstoy53 avatar May 27 '25 17:05 dstoy53

Thanks for the feedback @taspelund @dstoy53

I opened this discussion with exactly the same issue (as @dstoy53 described above) with ARP synchronization between Anycast GWs.

In our case, Anycast gateways are built on Juniper MX and to solve this issue there is a proxy-macip-advertisement option in JunOS, but it has some problems when working with silent hosts (which I am also trying to investigate with the vendor now), so I wondered about using arp proxying on FRR.

As @taspelund wrote earlier, the option with GARP listening (and having SVI with an IP address in the bridge) generally works, but I don't know how stable it will be in production. Also, here you definitely need to have SVI with some IP address in bridge (it can be fake IP) and for pure L2 this does not work. For pure L2 I think ARP proxying would help a lot here.

I'll also read through and check the options described here: https://fedepaol.github.io/blog/2024/06/01/fixing-arp-suppression-in-linux-l2evpn-with-ebpf/

Thank you!

odmitriyev avatar May 28 '25 07:05 odmitriyev

> Also, here you definitely need to have SVI with some IP address in bridge (it can be fake IP) and for pure L2 this does not work. For pure L2 I think ARP proxying would help a lot here.

It seems I hadn't fully checked this. It's enough to enable ARP acceptance (arp_accept = 1) on the SVI interface in the bridge; an IP address on the SVI interface is not necessary.

odmitriyev avatar May 28 '25 10:05 odmitriyev

Am I helping yet?

https://github.com/1984hosting/neighsnoopd

davischw avatar May 30 '25 14:05 davischw

> Am I helping yet?
>
> https://github.com/1984hosting/neighsnoopd

Great, thanks!

This looks like what I was looking for. I will test it on my setup.

odmitriyev avatar Jun 01 '25 16:06 odmitriyev

May I ask if this is related to ARP request broadcast packets not being forwarded to other VTEPs?

Host1 -LinuxBridge-VxLAN- VTEP1 (Debian) -L3- VTEP2 (Debian) -LinuxBridge-VxLAN - GW2

Both GW2 and Host1 start without having sent any gratuitous ARP or other ARP-related packets beforehand. Host1 sends a broadcast ARP request for GW2. The broadcast is observed by VTEP1 but is not broadcast by VTEP2, so GW2 never responds at all.

Or are there other settings I've missed that just make it look like this case/requirement?

hvisage avatar Jun 06 '25 12:06 hvisage

> May I ask if this is related to ARP request broadcast packets not being forwarded to other VTEPs?
>
> Host1 -LinuxBridge-VxLAN- VTEP1 (Debian) -L3- VTEP2 (Debian) -LinuxBridge-VxLAN - GW2
>
> Both GW2 and Host1 start without having sent any gratuitous ARP or other ARP-related packets beforehand. Host1 sends a broadcast ARP request for GW2. The broadcast is observed by VTEP1 but is not broadcast by VTEP2, so GW2 never responds at all.
>
> Or are there other settings I've missed that just make it look like this case/requirement?

I think this case is a little different from yours. I would start by checking whether flooding is enabled on the vxlan/bridge interfaces, in FRR, etc. (ip -d link show vxlanxxx, etc.).

odmitriyev avatar Jun 27 '25 17:06 odmitriyev