libxlio

No traffic sent on Bluefield-2

Open mpodles opened this issue 10 months ago • 2 comments

Subject

Dear Team,

I'm trying to benchmark NVMe-oF over TCP/IP with and without XLIO. I can get iperf and spdk_perf working between the machines, but when XLIO is used, no traffic leaves the initiator. Nothing shows up in tcpdump, in the ethtool stats, on the switch between the machines, or on the target machine.
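The "nothing in the ethtool stats" check above can be made repeatable by snapshotting the counters before and after a run and diffing them. The following is a minimal sketch, not part of XLIO or SPDK: the helper names (`parse_ethtool_stats`, `changed_counters`, `read_stats`) are hypothetical, and it assumes `ethtool -S <iface>` prints its usual `name: value` lines. Which counters reflect kernel-bypass traffic depends on the driver, so the `tx_` prefix filter is only an assumption.

```python
import subprocess


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Parse the `name: value` lines of `ethtool -S <iface>` output."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        # Split on the last colon; the "NIC statistics:" header has no value.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def changed_counters(before: dict[str, int], after: dict[str, int],
                     prefix: str = "tx_") -> dict[str, int]:
    """Return the delta of every counter with the given prefix that moved."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if name.startswith(prefix) and after[name] != before.get(name, 0)
    }


def read_stats(iface: str) -> dict[str, int]:
    """Snapshot the counters of one interface (requires ethtool on PATH)."""
    out = subprocess.run(["ethtool", "-S", iface], check=True,
                         capture_output=True, text=True).stdout
    return parse_ethtool_stats(out)
```

Calling `read_stats("p1")` before and after the XLIO run and passing both snapshots to `changed_counters` gives an empty dict when no transmit counter moved at all.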

One example of a packet that spdk_perf tries to send is an ARP request:

#0 qp_mgr_eth_mlx5::fill_wqe (this=0xaaaaaada6010, pswr=) at dev/qp_mgr_eth_mlx5.cpp:485
#1 0x0000fffff7024db0 in qp_mgr_eth_mlx5::send_to_wire (this=0xaaaaaada6010, p_send_wqe=, attr=, request_comp=, tis=, credits=) at dev/qp_mgr_eth_mlx5.cpp:748
#2 0x0000fffff7020ac8 in qp_mgr::send (this=0xaaaaaada6010, p_send_wqe=p_send_wqe@entry=0xaaaaaada4510, attr=attr@entry=0, tis=tis@entry=0x0, credits=credits@entry=2) at dev/qp_mgr.cpp:611
#3 0x0000fffff704d9b0 in ring_simple::send_buffer (tis=0x0, attr=, p_send_wqe=0xaaaaaada4510, this=0xaaaaaada4920) at dev/ring_simple.cpp:746
#4 ring_simple::send_ring_buffer (this=0xaaaaaada4920, id=, p_send_wqe=0xaaaaaada4510, attr=) at dev/ring_simple.cpp:776
#5 0x0000fffff7077794 in neigh_eth::send_arp_request (this=0xaaaaaada4370, is_broadcast=) at proto/neighbour.cpp:1661
#6 0x0000fffff70724a4 in neigh_entry::send_discovery_request (this=0xaaaaaada4370) at proto/neighbour.cpp:393

After this, it successfully gets a completion in:

#0 cq_mgr_mlx5::poll_and_process_element_tx (this=0xaaaaaada63d0, p_cq_poll_sn=0xffffffffd680) at dev/cq_mgr_mlx5.cpp:542
#1 0x0000fffff7020a68 in qp_mgr::send (this=0xaaaaaada60d0, p_send_wqe=p_send_wqe@entry=0xaaaaaada4700, attr=attr@entry=0, tis=tis@entry=0x0, credits=credits@entry=2) at dev/qp_mgr.cpp:605
#2 0x0000fffff704d9b0 in ring_simple::send_buffer (tis=0x0, attr=, p_send_wqe=0xaaaaaada4700, this=0xaaaaaada4b10) at dev/ring_simple.cpp:746
#3 ring_simple::send_ring_buffer (this=0xaaaaaada4b10, id=, p_send_wqe=0xaaaaaada4700, attr=) at dev/ring_simple.cpp:776
#4 0x0000fffff7077794 in neigh_eth::send_arp_request (this=0xaaaaaada4560, is_broadcast=) at proto/neighbour.cpp:1661
#5 0x0000fffff70724a4 in neigh_entry::send_discovery_request (this=0xaaaaaada4560) at proto/neighbour.cpp:393

I've checked the device it's using for the ARP request and it looks correct: p1 (the name of the physical function interface on the Bluefield). Thanks in advance for any help.

Cheers

Issue type

  • [x] Bug report
  • [ ] Feature request

Configuration:

  • Product version XLIO_VERSION: 3.21.2-0 Development Snapshot built on Mar 22 2024 12:01:27 -- DEBUG --
    Git: d4767597a6478f5fddb6bc21dbb39455447cb627

  • OS
    Distributor ID: Ubuntu
    Description: Ubuntu 20.04.6 LTS
    Release: 20.04
    Codename: focal

  • OFED
    MLNX_OFED_LINUX-5.8-3.0.5.0 (OFED-5.8-3.0.5)

  • Hardware Bluefield-2 MBF2M516A-CEEO_Ax_Bx (2x100Gbs)

Actual behavior:

No traffic leaves the network interface even though the WQE is posted and the completion is received.

Expected behavior:

spdk_perf or iperf is able to connect and send traffic.

Steps to reproduce:

sudo LD_PRELOAD=/opt/mellanox/libxlio/lib/libxlio.so iperf -t 30 -c 20.20.20.4 -m -P 1 -i 1 -M 1500

or

sudo SPDK_XLIO_PATH=/opt/mellanox/libxlio/lib/libxlio.so XLIO_TRACELEVEL=DEBUG ~/spdk-23.01/build/examples/perf -q 64 -o $((2**12)) -w randread -r 'trtype:nvda_tcp adrfam:IPv4 traddr:20.20.20.4 trsvcid:4420' -t 300 -c 0x01 --transport-stats -G --default-sock-impl xlio
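For repeated runs, the iperf variant of the reproduction can be wrapped in a small script. This is only a sketch: the helper names (`xlio_env`, `iperf_argv`, `run_iperf_with_xlio`) are hypothetical, the library path, address, and iperf flags are copied from the command above, and combining `XLIO_TRACELEVEL=DEBUG` with the iperf run is an assumption borrowed from the SPDK command line. The script would need to run as root, since the original commands use sudo.

```python
import os
import subprocess

# Path taken from the reproduction command in this report.
XLIO_LIB = "/opt/mellanox/libxlio/lib/libxlio.so"


def xlio_env(base=None, trace_level=None):
    """Copy an environment and add the variables that preload XLIO."""
    env = dict(os.environ if base is None else base)
    env["LD_PRELOAD"] = XLIO_LIB
    if trace_level:
        env["XLIO_TRACELEVEL"] = trace_level
    return env


def iperf_argv(server="20.20.20.4"):
    """Build the same iperf client command line as in the report."""
    return ["iperf", "-t", "30", "-c", server,
            "-m", "-P", "1", "-i", "1", "-M", "1500"]


def run_iperf_with_xlio(server="20.20.20.4"):
    """Run iperf with XLIO preloaded and return its exit code."""
    result = subprocess.run(iperf_argv(server),
                            env=xlio_env(trace_level="DEBUG"), check=False)
    return result.returncode
```

Passing the environment explicitly rather than through the shell keeps the XLIO run and a plain run (using `os.environ` unchanged) easy to compare side by side.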

mpodles, Apr 11 '24 21:04