
SEGFAULT when using more than 1 queue

Open cyanide-burnout opened this issue 4 years ago • 6 comments

We have an issue when using more than 1 queue on ixgbe.

Stack trace:

  #1 pkt_buf_free
  #2 ixgbe_tx_batch

Both queues are processed in a single thread; we tried a single shared mempool as well as a mempool per queue. The result is the same - any queue other than #0 triggers the crash.

cyanide-burnout avatar Sep 15 '20 13:09 cyanide-burnout

Same problem for me. I fixed it in ixgbe.c; these changes must be applied to both the RX and TX structures (see the sketch after this list):

  • declare MAX_RX_QUEUE_ENTRIES through a #define rather than a const int
  • in struct ixgbe_rx_queue, the void* virtual_addresses[] flexible array member becomes void* virtual_addresses[MAX_RX_QUEUE_ENTRIES]
  • in the corresponding calloc(), take away the + sizeof(void*) * ... part
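For illustration, a minimal sketch of the RX-side change (the field layout and the constant's value follow ixy's ixgbe driver as far as I can tell, but treat them as assumptions; the TX structures get the same treatment):

```c
#include <stdint.h>

#define MAX_RX_QUEUE_ENTRIES 4096  // value is an assumption; previously a const int

// opaque stand-ins for the real ixy/ixgbe types
union ixgbe_adv_rx_desc;
struct mempool;

// Before the patch, virtual_addresses was a flexible array member
// (void* virtual_addresses[];), so sizeof(struct ixgbe_rx_queue) did NOT
// include the per-queue array. With a fixed-size array it does.
struct ixgbe_rx_queue {
	volatile union ixgbe_adv_rx_desc* descriptors;
	struct mempool* mempool;
	uint16_t num_entries;
	uint16_t rx_index;
	void* virtual_addresses[MAX_RX_QUEUE_ENTRIES];  // now fixed-size
};

// Allocation: the array is now part of the struct, so the extra term in the
// calloc() goes away:
//   before: calloc(num_rx_queues, sizeof(struct ixgbe_rx_queue) + sizeof(void*) * MAX_RX_QUEUE_ENTRIES)
//   after:  calloc(num_rx_queues, sizeof(struct ixgbe_rx_queue))
```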

There are also other ways to solve this. However, this is probably the most immediate way. BTW, shout-out to the authors for the nice work!

marcofaltelli avatar Apr 09 '21 13:04 marcofaltelli

Yes, this patch works very well. By using it we reached twice the performance of DPDK's ixgbe with fewer CPU resources.

cyanide-burnout avatar Feb 18 '22 22:02 cyanide-burnout

By using it we reached twice the performance of DPDK's ixgbe with fewer CPU resources.

That seems unlikely; can you elaborate on the exact comparison? The patch from the comment above basically turns a flexible array member into a fixed-size array which, since we don't have any bounds checks anyway, should not make a difference for performance...

The only thing that I can think of is false sharing, but

  • the size of the calloc doesn't change (with the patch it comes from the now-bigger struct instead of an explicit calculation, which should be exactly the same)
  • this is not DMA-able memory, so it would only be relevant to multi-threaded code, which this bug doesn't mention so far
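A quick way to convince yourself the per-queue allocation size is unchanged (a minimal sketch; the field layout and struct names here are assumptions, reusing the sketch above):

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_RX_QUEUE_ENTRIES 4096

union ixgbe_adv_rx_desc;
struct mempool;

// original layout: flexible array member, excluded from sizeof
struct rx_queue_fam {
	volatile union ixgbe_adv_rx_desc* descriptors;
	struct mempool* mempool;
	uint16_t num_entries;
	uint16_t rx_index;
	void* virtual_addresses[];
};

// patched layout: fixed-size array, included in sizeof
struct rx_queue_fixed {
	volatile union ixgbe_adv_rx_desc* descriptors;
	struct mempool* mempool;
	uint16_t num_entries;
	uint16_t rx_index;
	void* virtual_addresses[MAX_RX_QUEUE_ENTRIES];
};

int main(void) {
	// per-queue size before the patch: struct plus explicitly added array space
	size_t before = sizeof(struct rx_queue_fam) + sizeof(void*) * MAX_RX_QUEUE_ENTRIES;
	// per-queue size after the patch: the array is part of the struct
	size_t after = sizeof(struct rx_queue_fixed);
	printf("before: %zu, after: %zu\n", before, after);  // identical on typical ABIs
	return 0;
}
```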

emmericp avatar Feb 18 '22 23:02 emmericp

After applying the fix I rewrote my code. It now transmits in multiple threads, 2-4 queues per thread. It's a production system which can use different methods to accelerate transmission/reception of UDP. Since I wrote several backends providing transmission via optimised sockets, PACKET_MMAP, XDP, DPDK and ixy, we ran some benchmarks. @stefansaraev did the testing, so I would like to ask him to publish the results. In any case the performance of ixy was quite a surprise, probably due to Intel's buggy ixgbe implementation and unnecessarily complicated code. At least I have found some bugs in the code of their XDP implementation (https://github.com/xdp-project/xdp-tutorial/issues/273).

cyanide-burnout avatar Feb 18 '22 23:02 cyanide-burnout

Your bug in two words: wrong interpretation of array access. Just imagine which address, for example, rx_queue[1] has: since sizeof(struct ixgbe_rx_queue) doesn't include the actual length of virtual_addresses[], and virtual_addresses is not a pointer to an array stored somewhere else, this is a typical overlap where rx_queue[1] overlaps rx_queue[0].virtual_addresses.
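For illustration, a minimal sketch of that overlap (the struct layout and variable names are assumptions, following the sketches above):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_RX_QUEUE_ENTRIES 4096

union ixgbe_adv_rx_desc;
struct mempool;

// unpatched layout with the flexible array member
struct ixgbe_rx_queue {
	volatile union ixgbe_adv_rx_desc* descriptors;
	struct mempool* mempool;
	uint16_t num_entries;
	uint16_t rx_index;
	void* virtual_addresses[];
};

int main(void) {
	// the original allocation reserves array space for every queue...
	struct ixgbe_rx_queue* rx_queues =
		calloc(2, sizeof(struct ixgbe_rx_queue) + sizeof(void*) * MAX_RX_QUEUE_ENTRIES);

	// ...but indexing steps by sizeof(struct ixgbe_rx_queue) only,
	// which excludes the flexible array:
	char* q0_array_start = (char*) rx_queues[0].virtual_addresses;
	char* q0_array_end   = q0_array_start + sizeof(void*) * MAX_RX_QUEUE_ENTRIES;
	char* q1_start       = (char*) &rx_queues[1];

	// q1_start falls inside [q0_array_start, q0_array_end): queue 1 overlaps
	// queue 0's virtual_addresses, so writes to one corrupt the other and
	// pkt_buf_free later dereferences garbage.
	printf("queue 0 array: %p .. %p\nqueue 1 start: %p\n",
	       (void*) q0_array_start, (void*) q0_array_end, (void*) q1_start);

	free(rx_queues);
	return 0;
}
```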

cyanide-burnout avatar Feb 19 '22 00:02 cyanide-burnout

I've fixed my comment, the correct explanation is there. It's 01:30 AM here, I have to be in bed ;)

cyanide-burnout avatar Feb 19 '22 00:02 cyanide-burnout