
Xdp Rx Queue

Open JasMetzger opened this issue 6 months ago • 14 comments

Question

I am exploring XDP for high-speed network processing and considering PcapPlusPlus, which has a wrapper for the user-space side. Perhaps I don't understand XDP well enough, but I think an AF_XDP socket is associated with an RX queue, yet I don't see any reference to that in the open() function which configures it. How are RX queues designated? Thanks.

Update: I think the uint32_t rxId = 0; in the receivePackets function indicates the RX queue. Is there a reason why this isn't an argument of the function, so that an application could define multiple threads, each one handling a receive queue for performance? Or would that not help? Thanks.

Operating systems

Linux

JasMetzger avatar Oct 09 '25 20:10 JasMetzger

Hi @JasMetzger ,

I'm not really that familiar with XDP, but from what I understand the AF_XDP concept is based on 4 ring types:

  • FillRing: Used for transfer of memory blocks from User space -> Kernel space
  • CompletionRing: Used for transfer of memory blocks from Kernel space -> User space
  • TxRing: Frames to be transmitted by the kernel are submitted here by user space.
  • RxRing: Frames received by the kernel are picked up here by user space.

The rings can be further divided by owner:

  • FillRing & CompletionRing are owned by the Umem region. They are created in xsk_umem__create.
  • TxRing & RxRing are owned by the respective AF_XDP socket. Each socket can have a Tx ring, an Rx ring, or both, but from what I understand only up to 1 of each type. They are created during the socket setup in xsk_socket__create.

There are 2 independent flows in which the Umem frames are passed around in these queues:

  • Rx Flow: User space -> FillRing -> Kernel -> RxRing -> User space
  • Tx Flow: User space -> TxRing -> Kernel -> CompletionRing -> User space

All of the above rings are single-producer / single-consumer, which means a single socket can support at most 1 thread per flow, assuming the two flows use separate frames and the receive path does not need access to the transmit path. However, I am not sure the PcapPlusPlus API is free of data races in that setup. It probably is, but I can't confirm without looking more thoroughly.

I think the uint32_t rxId = 0; in the receivePackets function indicates the rx queue.

I am pretty sure that is the index of a slot in the RxRing. It is initialized to 0 and then updated by xsk_ring_cons__peek with the index of the first slot available to dequeue from the ring.

In summary, a single AF_XDP socket only has 1 Rx ring, internally managed by the kernel, so there isn't really a way to configure it beyond defining the size of the ring. Calling XdpDevice::receivePackets on a single instance from more than one thread at a time is unsafe and will lead to data races, as no synchronisation is done to prevent that.

cc: @seladb (I think this is correct. Could you confirm this?)

Dimi1010 avatar Oct 15 '25 22:10 Dimi1010

@Dimi1010 is correct - there is actually one RX ring between the user space and the kernel space. rxId is the slot inside the ring and not the rx queue ID like in DPDK.

Still, AF_XDP can be used for high-speed network processing. I'm not sure how the performance compares to DPDK, but AF_XDP is simpler to use because it's supported by the kernel and doesn't require a special NIC.

seladb avatar Oct 16 '25 07:10 seladb

Thank you all for the response. The simplicity / performance tradeoff of XDP vs DPDK seems compelling, especially for container applications.

When creating the socket, one specifies a queue ID (the third argument). My understanding is that a NIC provides 4 or so queues that the kernel side can use to distribute packets, and the user-side app can create one socket (with a handling thread) per queue. In the XdpDevice class this seems to be hardwired to zero:

	int ret = xsk_socket__create(&socketInfo->xsk, m_InterfaceName.c_str(), 0, umemInfo->umem, &socketInfo->rx,
	                             &socketInfo->tx, &xskConfig);

I am not sure I need or want to use multiple queues but want to leave my options open.

Thanks again.

JasMetzger avatar Oct 16 '25 09:10 JasMetzger

When creating the socket, one specifies a queue ID (the third argument). My understanding is that a NIC provides 4 or so queues that the kernel side can use to distribute packets, and the user-side app can create one socket (with a handling thread) per queue. In the XdpDevice class this seems to be hardwired to zero.

That does appear to be correct, and it does appear to be hardcoded to queue 0. :/ We should probably allow the queue to be specified somehow, possibly during construction or in the open() call. Alternatively, we could let open() specify how many hardware queues the XdpDevice class should open and manage internally, so users don't need to create multiple XdpDevice instances for the same NIC, which could potentially conflict.


@seladb I also looked a bit further into possible configurations and found this thread on an xdp mailing list.

Apparently XDP supports the following configurations for a hardware queue:

  • 1 hardware queue - 1 XdpUmem - 1 socket (what we have currently, except it's hardcoded to HW queue 0)
  • 1 hardware queue - 1 XdpUmem - N sockets (created with xsk_socket__create_shared). This apparently allows a single hardware queue to be load-balanced between multiple sockets if user-space processing times are high.

Perhaps we can think about adding a way to do that configuration after adding a way to specify hardware queues?

Dimi1010 avatar Oct 16 '25 20:10 Dimi1010

@Dimi1010 @JasMetzger I think you're right! We can definitely support setting the hardware RX queue. I'd start with adding it to XdpDeviceConfiguration so users can configure it. I'd also check that the NIC indeed supports the queueId provided, we can do it by looking at /sys/class/net/etho0/queues/

seladb@seladb-VirtualBox:~$ ls -l /sys/class/net/etho0/queues/
total 0
drwxr-xr-x 2 root root 0 Oct 16 22:46 rx-0
drwxr-xr-x 3 root root 0 Oct 16 22:46 tx-0

Later we can consider supporting xsk_socket__create_shared, which will require more changes.

seladb avatar Oct 17 '25 05:10 seladb

Thank you all again.

It seems right that the queues are associated with a single XdpDevice. It also seems right that a single queue is associated with a single socket.

I am not sure what makes sense for XDP since I am trying to understand it still:

  • Define the number of queues in the XdpDeviceConfiguration for the device, and check it against the hardware's capabilities.
  • void* m_SocketInfo would become an array? That has lots of implications. OnPacketsArrive could take the queue ID the packets arrived on as an additional argument, defaulting to 0. Similarly for sending packets: which queue to use.

Thanks again.

JasMetzger avatar Oct 17 '25 09:10 JasMetzger

If we want to have one XdpDevice / socket per queue - I think it'd make sense to add it to XdpDeviceConfiguration since it'd be a config param of the device. This is also the simpler option to implement and the one that'd make the API simpler and easier to understand.

@JasMetzger would you consider opening a PR for it? Since you're exploring AF_XDP, it might be a good learning opportunity. Do you have a NIC with multiple RX queues you can test it with?

seladb avatar Oct 18 '25 03:10 seladb

I was afraid you would ask that. OK, I'll give it a go. I suppose I fork from the master branch? I have a NIC with multiple RX queues I can test with; I think most modern NICs have multiple RX queues.

JasMetzger avatar Oct 18 '25 09:10 JasMetzger

I was afraid you would ask that. OK, I'll give it a go. I suppose I fork from the master branch? I have a NIC with multiple RX queues I can test with; I think most modern NICs have multiple RX queues.

I recommend creating a new branch from the current dev and working on that. Also, base the PR on the dev branch, not master, as the pull request CI runs only if it's based on dev.

Dimi1010 avatar Oct 18 '25 10:10 Dimi1010

I think I opened a PR. I had to push a few more times after that, so I don't know if the PR includes those changes. FYI

JasMetzger avatar Oct 18 '25 11:10 JasMetzger

It seems the headroom should be part of the configuration too; currently it defaults to zero. Is this something that can be added to the same PR as the rx queue ID?

JasMetzger avatar Dec 02 '25 13:12 JasMetzger

It seems the headroom should be part of the configuration too; currently it defaults to zero. Is this something that can be added to the same PR as the rx queue ID?

@JasMetzger I didn't know the headroom is configurable as well 🙂 Sure, we can add it to the same PR

seladb avatar Dec 03 '25 06:12 seladb

OK--thanks. It will be along the same lines as the rx queue id.

JasMetzger avatar Dec 03 '25 11:12 JasMetzger

OK--thanks. It will be along the same lines as the rx queue id.

Yes, that'd be great 👍

seladb avatar Dec 04 '25 05:12 seladb