xdp-tools
xdp-tools copied to clipboard
AF_XDP: Provide synchronisation between processes using XSKs on the same interface
With the addition of the AF_XDP code we will be attaching XDP programs on behalf of the socket owner to redirect the packets. We should make sure this is synchronised so multiple processes can attach sockets to the same interface without stepping on each others' toes.
Since we already have a synchronisation point when building the dispatcher, this could likely be reused.
See also this upstream discussion about doing this in libbpf using bpf_link (which is an approach that will not work with multi-prog dispatchers today, sadly): https://lore.kernel.org/bpf/[email protected]/T/#t
Just a note that we probably need to resolve this before libbpf 1.0. Seems downstream users have started relying on the bpf_link approach in libbpf, so things like this DPDK commit will break when moving to libxdp-based XSK: http://patches.dpdk.org/project/dpdk/patch/[email protected]/
@magnus-karlsson any plans to work on this?
I guess you are referring to the missing "bpf_link like" support in libxdp? If so, we have it in our backlog but should probably bump it up towards the top. I will dig through my old mails on why the libbpf approach could not be ported over as is to libxdp. Have forgotten why.
Ciara, the owner of the AF_XDP PMD in DPDK, also has a task for DPDK 22.02 or 22.05 to move it over from libbpf to libxdp. Before that, this needs to be in place for sure.
Great. Briefly, we can't use the libbpf approach because we're pinning all the bpf_links to build the multi-prog dispatcher, so even if we did pass the link fd to the application, closing that fd wouldn't actually detach anything...
The way it works in libbpf from the user's point of view is the following: First program does xsk_socket__create(xsk1....); Second one does xsk_socket__create(xsk2....); After some time first program does xsk_socket__delete(xsk1); and the default XDP program is still attached and xsk2 works.
Under the hood, xsk_socket__create(xsk1) creates a bpf_link and the creation of xsk2 takes a reference to the same, thus it needs to be closed twice before the XDP program is detached. Maciej can correct me if wrong.
In libxdp, the creation of xsk1 triggers an xdp_program__attach() and the creation of xsk2 triggers an xdp_program__clone(). If I understand the code correctly, and I might not, there is no reference taken in xsd_program__clone() to the bpf_link created by xdp_program__attach(). Would taking a reference to the bpf_link when cloning be a solution? From the xsk point of view, this would be elegant. Closing the clone or the original just decreases the refcount of the bpf_link. But introducing this in xdp_program__clone might have other side effects as it is used internally in libxdp.c.
But we pin the link, so closing the last entry is not going to remove the XDP program. And we have to pin it to make it available for someone else who might want to attach a second BPF program before or after the XSK default prog.
However, we already synchronise the creation and removal of programs using flock() on the pin directory (see xdp_lock_{acquire,release}()). So we can just make the XSK code do the socket map manipulation while holding that lock, and then simply remove the default program if it's removing the last socket from the map. At least I think this would be sufficient?
Does the xdp_lock_* require bpffs support compiled in the kernel?
Yup, the whole libxdp dispatcher logic depends on bpffs
OK. So then we need two ways to deal with this. One when the dispatcher can and is loaded and one when it is not. It would be nice to come up with a single implementation, but maybe not possible.
Right, I don't think we can use a single implementation, unfortunately. We already have a compatibility check in xdp_multiprog__check_compat(), where the loader will fall back to attaching a single program if the compatibility check fails. We could add an internal flag to make the fallback mode use bpf_link attachment instead of netlink, which would give us behaviour identical to what's currently in libbpf (I think)...
Fixed with the merging of #203 (including support when bpffs is not mounted).