
Scaling issues with multiple cores and ports

Open mdr78 opened this issue 5 years ago • 10 comments

Hi folks,

So I'm having a problem scaling MoonGen across multiple interfaces.

I am using the pktgen.lua code with a single modification: it uses a pre-defined destination MAC address depending on the port. The NIC is an Intel X710, a 4x10GbE interface, running the 6.0.1 firmware.

When I use a single 10G port on the card, I get 14 Mpps no problem. When I add a second port, performance degrades significantly, and it continues to degrade as I add interfaces. This is not due to a lack of cores or queues - I am throwing plenty of both at MoonGen.

The interesting bit I have modified in pktgen.lua is as follows.

-- configure tx rates and start transmit slaves
for d, dev in ipairs(args.dev) do
        for i = 1, args.threads do
                local queue = dev:getTxQueue(i - 1)
                if args.rate then
                        queue:setRate(args.rate / args.threads)
                end
                log:info("Starting Thread %d on %s sending to peer %s",
                                   ((d-1)*args.threads)+i, queue, DST_MAC[d])
                lm.startTask("txSlave", queue, DST_MAC[d])
        end
end
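
(For reference, DST_MAC is just a table with one pre-defined peer MAC per device; in my case it is something like the following - the addresses are the ones that show up in the logs below.)

local DST_MAC = {
        "3c:fd:fe:9d:68:b8", -- peer of port 0
        "3c:fd:fe:9d:68:b9", -- peer of port 1
        "3c:fd:fe:9d:68:ba", -- peer of port 2
        "3c:fd:fe:9d:68:bb", -- peer of port 3
}

Note that setRate only kicks in when a rate (-r) is passed; none of the runs below use it, so the numbers are not limited by that line.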

The DPDK config makes plenty of cores and memory on the same socket available to MoonGen, as follows:

DPDKConfig {
        -- configure the CPU cores to use, default: all cores
        -- cores = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
        cores = {29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53},

        -- max number of shared tasks running on core 0
        --sharedCores = 1,

        -- black or whitelist devices to limit which PCI devs are used by DPDK
        -- only one of the following examples can be used
        -- pciWhitelist = {"0000:d8:00.0","0000:d8:00.1","0000:d8:00.2","0000:d8:00.3"},
        pciWhitelist = {"0000:d8:00.0","0000:d8:00.1","0000:d8:00.2","0000:d8:00.3"},

        -- arbitrary DPDK command line options
        -- the following configuration allows multiple DPDK instances (use together with pciWhitelist)
        -- cf. http://dpdk.org/doc/guides/prog_guide/multi_proc_support.html#running-multiple-independent-dpdk-applications
        cli = {
                "--file-prefix", "mg1",
                "--socket-mem", "0,4096",
        }

}
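
Both the core list and --socket-mem deliberately pin everything to socket 1, since the card sits on NUMA node 1 (see the EAL probe output and lspci below). As a quick sanity check that a device really is on the expected socket, something like this works from a script (just a sketch, assuming libmoon's dev:getSocket(), which wraps rte_eth_dev_socket_id()):

local device = require "device"
local log    = require "log"

local dev = device.config{ port = 0 }
-- should print 1 for this card
log:info("Device 0 is on NUMA socket %d", dev:getSocket())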

When I run MoonGen with 2 threads on 1 port, I get 14 Mpps no problem at all. Using htop I see that 2 cores are loaded at 100%.

./build/MoonGen examples/pktgen.lua -t 2 --dpdk-config=dpdk-conf.lua 0
[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.2 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.3 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 4 usable devices:
   Device 0: 3C:FD:FE:9D:88:F8 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:F9 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)                  
   Device 2: 3C:FD:FE:9D:88:FA (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 3: 3C:FD:FE:9D:88:FB (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 0 (3C:FD:FE:9D:88:F8) is up: 10000 MBit/s
[INFO]  1 device is up.
[INFO]  Starting Thread 1 on [TxQueue: id=0, qid=0] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 2 on [TxQueue: id=0, qid=1] sending to peer 3c:fd:fe:9d:68:b8
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 14.92 Mpps, 7637 Mbit/s (10024 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 14.92 Mpps, 7637 Mbit/s (10024 Mbit/s with framing)

When I run MoonGen with 2 threads each on 4 ports, performance degrades to about 3 Mpps per port. Using htop I see that 8 cores are loaded at 100%.

[root@silpixa00396680 MoonGen]# ./build/MoonGen examples/pktgen.lua -t 2 --dpdk-config=/root/set.../dpdk-conf.lua 0 1 2 3
[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.2 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.3 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 4 usable devices:
   Device 0: 3C:FD:FE:9D:88:F8 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:F9 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 2: 3C:FD:FE:9D:88:FA (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 3: 3C:FD:FE:9D:88:FB (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 3 (3C:FD:FE:9D:88:FB) is up: 10000 MBit/s
[INFO]  Device 2 (3C:FD:FE:9D:88:FA) is up: 10000 MBit/s
[INFO]  Device 1 (3C:FD:FE:9D:88:F9) is up: 10000 MBit/s
[INFO]  Device 0 (3C:FD:FE:9D:88:F8) is up: 10000 MBit/s
[INFO]  4 devices are up.
[INFO]  Starting Thread 1 on [TxQueue: id=0, qid=0] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 2 on [TxQueue: id=0, qid=1] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 3 on [TxQueue: id=1, qid=0] sending to peer 3c:fd:fe:9d:68:b9
[INFO]  Starting Thread 4 on [TxQueue: id=1, qid=1] sending to peer 3c:fd:fe:9d:68:b9
[INFO]  Starting Thread 5 on [TxQueue: id=2, qid=0] sending to peer 3c:fd:fe:9d:68:ba
[INFO]  Starting Thread 6 on [TxQueue: id=2, qid=1] sending to peer 3c:fd:fe:9d:68:ba
[INFO]  Starting Thread 7 on [TxQueue: id=3, qid=0] sending to peer 3c:fd:fe:9d:68:bb
[INFO]  Starting Thread 8 on [TxQueue: id=3, qid=1] sending to peer 3c:fd:fe:9d:68:bb
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=1] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=2] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=3] RX: 10.74 Mpps, 5498 Mbit/s (7216 Mbit/s with framing)
[Device: id=0] TX: 2.71 Mpps, 1387 Mbit/s (1821 Mbit/s with framing)
[Device: id=1] TX: 2.66 Mpps, 1362 Mbit/s (1788 Mbit/s with framing)
[Device: id=2] TX: 2.63 Mpps, 1346 Mbit/s (1766 Mbit/s with framing)
[Device: id=3] TX: 2.74 Mpps, 1404 Mbit/s (1842 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=1] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=2] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=3] RX: 11.24 Mpps, 5755 Mbit/s (7554 Mbit/s with framing)
[Device: id=0] TX: 2.83 Mpps, 1448 Mbit/s (1900 Mbit/s with framing)
[Device: id=1] TX: 2.80 Mpps, 1433 Mbit/s (1881 Mbit/s with framing)
[Device: id=2] TX: 2.80 Mpps, 1433 Mbit/s (1881 Mbit/s with framing)
[Device: id=3] TX: 2.81 Mpps, 1441 Mbit/s (1891 Mbit/s with framing)

BTW: love MoonGen, it rocks.

Ray K

mdr78 avatar Nov 23 '18 10:11 mdr78

You are probably running into a hardware limit here; I don't have an X710 but my best guess for its architecture is that it's just an XL710 configured as 4x10 on one of the 40G ports or both ports in 2x10 mode.

Based on this I'd guess you should be able to achieve around 30-40 Mpps in total. Can you try a few different configurations?

  • 2 ports, 1 queue each
  • 2 ports, 2 queues each
  • 2 ports, 4 queues each
  • 4 ports, 1 queue each

For two ports: is there a difference between using ports (0 and 1) and (0 and 3)?

emmericp avatar Nov 23 '18 10:11 emmericp

2 ports, 1 core & 1 queue each

[root@silpixa00396680 MoonGen]# ./build/MoonGen examples/pktgen.lua -t 1 -s 10 --dpdk-config=/ro.../XL710/dpdk-conf.lua 0 1
[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.2 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.3 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 4 usable devices:
   Device 0: 3C:FD:FE:9D:88:F8 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:F9 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 2: 3C:FD:FE:9D:88:FA (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 3: 3C:FD:FE:9D:88:FB (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 1 (3C:FD:FE:9D:88:F9) is up: 10000 MBit/s
[INFO]  Device 0 (3C:FD:FE:9D:88:F8) is up: 10000 MBit/s
[INFO]  2 devices are up.
[INFO]  Starting Thread 1 on [TxQueue: id=0, qid=0] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 2 on [TxQueue: id=1, qid=0] sending to peer 3c:fd:fe:9d:68:b9
...
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=1] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 6.58 Mpps, 3369 Mbit/s (4422 Mbit/s with framing)
[Device: id=1] TX: 6.60 Mpps, 3378 Mbit/s (4433 Mbit/s with framing)
[Device: id=0] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=1] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=0] TX: 7.36 (StdDev 0.31) Mpps, 3770 (StdDev 159) Mbit/s (4948 Mbit/s with framing), total 73917774 packets with 4730737536 bytes (incl. CRC)
[Device: id=1] TX: 7.39 (StdDev 0.31) Mpps, 3783 (StdDev 160) Mbit/s (4965 Mbit/s with framing), total 74096568 packets with 4742180352 bytes (incl. CRC)

2 ports, 2 cores and 2 queues each

[root@silpixa00396680 MoonGen]# ./build/MoonGen examples/pktgen.lua -t 2 -s 10 --dpdk-config=/ro.../XL710/dpdk-conf.lua 0 1
[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.2 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.3 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 4 usable devices:
   Device 0: 3C:FD:FE:9D:88:F8 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:F9 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 2: 3C:FD:FE:9D:88:FA (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 3: 3C:FD:FE:9D:88:FB (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 1 (3C:FD:FE:9D:88:F9) is up: 10000 MBit/s
[INFO]  Device 0 (3C:FD:FE:9D:88:F8) is up: 10000 MBit/s
[INFO]  2 devices are up.
[INFO]  Starting Thread 1 on [TxQueue: id=0, qid=0] sending to peer 3c:fd:fe:9d:68:b8                   
[INFO]  Starting Thread 2 on [TxQueue: id=0, qid=1] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 3 on [TxQueue: id=1, qid=0] sending to peer 3c:fd:fe:9d:68:b9
[INFO]  Starting Thread 4 on [TxQueue: id=1, qid=1] sending to peer 3c:fd:fe:9d:68:b9
...
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=1] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 5.00 Mpps, 2558 Mbit/s (3357 Mbit/s with framing)
[Device: id=1] TX: 5.02 Mpps, 2570 Mbit/s (3373 Mbit/s with framing)
[Device: id=0] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=1] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=0] TX: 5.65 (StdDev 0.25) Mpps, 2892 (StdDev 126) Mbit/s (3796 Mbit/s with framing), total 56593404 packets with 3621977856 bytes (incl. CRC)
[Device: id=1] TX: 5.67 (StdDev 0.24) Mpps, 2904 (StdDev 125) Mbit/s (3811 Mbit/s with framing), total 56993454 packets with 3647581056 bytes (incl. CRC)

2 ports, 4 cores and 4 queues each

[root@silpixa00396680 MoonGen]# ./build/MoonGen examples/pktgen.lua -t 4 -s 10 --dpdk-config=/ro.../XL710/dpdk-conf.lua 0 1
[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.2 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.3 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 4 usable devices:
   Device 0: 3C:FD:FE:9D:88:F8 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:F9 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 2: 3C:FD:FE:9D:88:FA (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 3: 3C:FD:FE:9D:88:FB (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 1 (3C:FD:FE:9D:88:F9) is up: 10000 MBit/s
[INFO]  Device 0 (3C:FD:FE:9D:88:F8) is up: 10000 MBit/s
[INFO]  2 devices are up.
[INFO]  Starting Thread 1 on [TxQueue: id=0, qid=0] sending to peer 3c:fd:fe:9d:68:b8                    
[INFO]  Starting Thread 2 on [TxQueue: id=0, qid=1] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 3 on [TxQueue: id=0, qid=2] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 4 on [TxQueue: id=0, qid=3] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 5 on [TxQueue: id=1, qid=0] sending to peer 3c:fd:fe:9d:68:b9
[INFO]  Starting Thread 6 on [TxQueue: id=1, qid=1] sending to peer 3c:fd:fe:9d:68:b9
[INFO]  Starting Thread 7 on [TxQueue: id=1, qid=2] sending to peer 3c:fd:fe:9d:68:b9
[INFO]  Starting Thread 8 on [TxQueue: id=1, qid=3] sending to peer 3c:fd:fe:9d:68:b9
...
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=1] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 4.72 Mpps, 2419 Mbit/s (3175 Mbit/s with framing)
[Device: id=1] TX: 4.73 Mpps, 2420 Mbit/s (3176 Mbit/s with framing)
[Device: id=0] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=1] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=0] TX: 5.51 (StdDev 0.29) Mpps, 2821 (StdDev 151) Mbit/s (3703 Mbit/s with framing), total 55276074 packets with 3537668736 bytes (incl. CRC)
[Device: id=1] TX: 5.51 (StdDev 0.29) Mpps, 2822 (StdDev 151) Mbit/s (3704 Mbit/s with framing), total 55352682 packets with 3542571648 bytes (incl. CRC)

4 ports, 1 core and 1 queue each

[root@silpixa00396680 MoonGen]# ./build/MoonGen examples/pktgen.lua -t 1 -s 10 --dpdk-config=/ro.../XL710/dpdk-conf.lua 0 1 2 3
[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.2 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.3 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 4 usable devices:
   Device 0: 3C:FD:FE:9D:88:F8 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:F9 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 2: 3C:FD:FE:9D:88:FA (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 3: 3C:FD:FE:9D:88:FB (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 3 (3C:FD:FE:9D:88:FB) is up: 10000 MBit/s
[INFO]  Device 2 (3C:FD:FE:9D:88:FA) is up: 10000 MBit/s
[INFO]  Device 1 (3C:FD:FE:9D:88:F9) is up: 10000 MBit/s
[INFO]  Device 0 (3C:FD:FE:9D:88:F8) is up: 10000 MBit/s
[INFO]  4 devices are up.
...
[Device: id=0] TX: 2.46 Mpps, 1259 Mbit/s (1652 Mbit/s with framing)
[Device: id=1] TX: 2.44 Mpps, 1249 Mbit/s (1640 Mbit/s with framing)
[Device: id=2] TX: 2.44 Mpps, 1248 Mbit/s (1637 Mbit/s with framing)
[Device: id=3] TX: 2.48 Mpps, 1269 Mbit/s (1666 Mbit/s with framing)
[Device: id=0] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=1] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=2] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=3] RX: 11.09 (StdDev 0.48) Mpps, 5678 (StdDev 245) Mbit/s (7452 Mbit/s with framing), total 111287647 packets with 7122409408 bytes (incl. CRC)
[Device: id=0] TX: 2.78 (StdDev 0.12) Mpps, 1423 (StdDev 61) Mbit/s (1867 Mbit/s with framing), total 27923805 packets with 1787123520 bytes (incl. CRC)
[Device: id=1] TX: 2.75 (StdDev 0.12) Mpps, 1410 (StdDev 60) Mbit/s (1851 Mbit/s with framing), total 27631674 packets with 1768427136 bytes (incl. CRC)
[Device: id=2] TX: 2.76 (StdDev 0.12) Mpps, 1411 (StdDev 62) Mbit/s (1853 Mbit/s with framing), total 27613404 packets with 1767257856 bytes (incl. CRC)
[Device: id=3] TX: 2.80 (StdDev 0.12) Mpps, 1434 (StdDev 62) Mbit/s (1882 Mbit/s with framing), total 28124019 packets with 1799937216 bytes (incl. CRC)

mdr78 avatar Nov 23 '18 14:11 mdr78

2 ports (0 and 3), 1 core & 1 queue each

[root@silpixa00396680 MoonGen]# ./build/MoonGen examples/pktgen.lua -t 1 -s 10 --dpdk-config=/ro.../XL710/dpdk-conf.lua 0 3
[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.2 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.3 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 4 usable devices:
   Device 0: 3C:FD:FE:9D:88:F8 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:F9 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 2: 3C:FD:FE:9D:88:FA (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 3: 3C:FD:FE:9D:88:FB (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 3 (3C:FD:FE:9D:88:FB) is up: 10000 MBit/s
[INFO]  Device 0 (3C:FD:FE:9D:88:F8) is up: 10000 MBit/s
[INFO]  2 devices are up.
[INFO]  Starting Thread 1 on [TxQueue: id=0, qid=0] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 2 on [TxQueue: id=3, qid=0] sending to peer 3c:fd:fe:9d:68:b9
...
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=3] RX: 6.53 Mpps, 3346 Mbit/s (4391 Mbit/s with framing)
[Device: id=0] TX: 6.53 Mpps, 3344 Mbit/s (4390 Mbit/s with framing)
[Device: id=3] TX: 5.49 Mpps, 2810 Mbit/s (3688 Mbit/s with framing)
[Device: id=0] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=3] RX: 7.31 (StdDev 0.29) Mpps, 3745 (StdDev 150) Mbit/s (4916 Mbit/s with framing), total 73540278 packets with 4706577792 bytes (incl. CRC)
[Device: id=0] TX: 7.31 (StdDev 0.29) Mpps, 3745 (StdDev 151) Mbit/s (4915 Mbit/s with framing), total 73540278 packets with 4706577792 bytes (incl. CRC)
[Device: id=3] TX: 6.12 (StdDev 0.24) Mpps, 3135 (StdDev 122) Mbit/s (4115 Mbit/s with framing), total 61248663 packets with 3919914432 bytes (incl. CRC)

mdr78 avatar Nov 23 '18 14:11 mdr78

Can you post the output of lspci -vvv -s 0000:d8:00.0

emmericp avatar Nov 23 '18 20:11 emmericp

Another thing to test would be using multiple processes; there's an example in dpdk-conf.lua, just assign different cores and a different whitelist to each process.
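
Something along these lines (illustrative; any two disjoint core sets on socket 1 will do):

-- dpdk-conf1.lua
DPDKConfig {
        cores = {29, 30, 31, 32},
        pciWhitelist = {"0000:d8:00.0", "0000:d8:00.1"},
        cli = { "--file-prefix", "mg1", "--socket-mem", "0,4096" },
}

-- dpdk-conf2.lua
DPDKConfig {
        cores = {40, 41, 42, 43},
        pciWhitelist = {"0000:d8:00.2", "0000:d8:00.3"},
        cli = { "--file-prefix", "mg2", "--socket-mem", "0,4096" },
}

The --file-prefix must differ between the instances so they don't clash over the same hugepage files.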

Note that this should show the exact same behavior: tasks in MoonGen are more independent than usual threads, as each runs in a completely separate LuaJIT VM. It's still worth testing, though.

emmericp avatar Nov 23 '18 21:11 emmericp

lspci -vvv -s 0000:d8:00.0

[root@xxx ~]# lspci -vvv -s 0000:d8:00.0
d8:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
        Subsystem: Intel Corporation Ethernet Converged Network Adapter X710-4
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 1024
        NUMA node: 1
        Region 0: Memory at f0000000 (64-bit, prefetchable) [size=8M]
        Region 3: Memory at f1018000 (64-bit, prefetchable) [size=32K]
        Expansion ROM at f1400000 [disabled] [size=512K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] MSI-X: Enable- Count=129 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00001000
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L1 <16us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [e0] Vital Product Data
                Product Name: XL710 40GbE Controller
                Read-only fields:
                        [PN] Part number:
                        [EC] Engineering changes:
                        [FG] Unknown:
                        [LC] Unknown:
                        [MN] Manufacture ID:
                        [PG] Unknown:
                        [SN] Serial number:
                        [V0] Vendor specific:
                        [RV] Reserved: checksum good, 0 byte(s) reserved
                Read/write fields:
                        [V1] Vendor specific:
                End
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Device Serial Number f8-88-9d-ff-ff-fe-fd-3c
        Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
                IOVCap: Migration-, Interrupt Message Number: 000
                IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                IOVSta: Migration-
                Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 00
                VF offset: 16, stride: 1, Device ID: 154c
                Supported Page Size: 00000553, System Page Size: 00000001
                Region 0: Memory at 00000000f0e00000 (64-bit, prefetchable)
                Region 3: Memory at 00000000f11a0000 (64-bit, prefetchable)
                VF Migration: offset: 00000000, BIR: 0
        Capabilities: [1a0 v1] Transaction Processing Hints
                Device specific mode supported
                No steering table available
        Capabilities: [1b0 v1] Access Control Services
                ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
                ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
        Capabilities: [1d0 v1] #19
        Kernel driver in use: igb_uio
        Kernel modules: i40e

[root@xxx ~]#

mdr78 avatar Nov 26 '18 09:11 mdr78

Running in separate processes is better.

When I run a separate instance of MoonGen for each pair of ports (four ports total, two processes), each instance achieves the same performance as if I were running a single process on two ports. So there is some sort of contention that is being eliminated.

See below: I ran Configuration 1 and Configuration 2 simultaneously, which shows that the bottleneck is not the Ethernet controller itself.

**4 ports, 2 processes, 1 core & 1 queue per port**

[root@xxxx MoonGen]# egrep -H 'cores|Whitelist' foo/*
foo/dpdk-conf1.lua:     cores = {29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39},
foo/dpdk-conf1.lua:     pciWhitelist = {"0000:d8:00.0","0000:d8:00.1"},
foo/dpdk-conf2.lua:     cores = {40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53},
foo/dpdk-conf2.lua:     pciWhitelist = {"0000:d8:00.2","0000:d8:00.3"},

Configuration 1

[root@xxxx MoonGen]# ./build/MoonGen examples/pktgen.lua -t 1 -s 10 --dpdk-config=foo/dpdk-conf1.lua
[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 2 usable devices:
   Device 0: 3C:FD:FE:9D:88:F8 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:F9 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 1 (3C:FD:FE:9D:88:F9) is up: 10000 MBit/s
[INFO]  Device 0 (3C:FD:FE:9D:88:F8) is up: 10000 MBit/s
[INFO]  2 devices are up.
[INFO]  Starting Thread 1 on [TxQueue: id=0, qid=0] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 2 on [TxQueue: id=1, qid=0] sending to peer 3c:fd:fe:9d:68:b9
[Device: id=0] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 1 packets with 114 bytes (incl. CRC)
[Device: id=1] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 1 packets with 114 bytes (incl. CRC)
[Device: id=0] TX: 8.12 (StdDev 0.47) Mpps, 4155 (StdDev 238) Mbit/s (5454 Mbit/s with framing), total 81119367 packets with 5191639488 bytes (incl. CRC)
[Device: id=1] TX: 8.15 (StdDev 0.44) Mpps, 4173 (StdDev 224) Mbit/s (5477 Mbit/s with framing), total 81521811 packets with 5217395904 bytes (incl. CRC)

Configuration 2
[root@xxxx MoonGen]# ./build/MoonGen examples/pktgen.lua -t 1 -s 10 --dpdk-config=foo/dpdk-conf2.lua 0 1
[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.2 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.3 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 2 usable devices:
   Device 0: 3C:FD:FE:9D:88:FA (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:FB (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 1 (3C:FD:FE:9D:88:FB) is up: 10000 MBit/s
[INFO]  Device 0 (3C:FD:FE:9D:88:FA) is up: 10000 MBit/s
[INFO]  2 devices are up.
[INFO]  Starting Thread 1 on [TxQueue: id=0, qid=0] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 2 on [TxQueue: id=1, qid=0] sending to peer 3c:fd:fe:9d:68:b9
[Device: id=0] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 1 packets with 114 bytes (incl. CRC)
[Device: id=1] RX: 8.45 (StdDev 6.97) Mpps, 4325 (StdDev 3569) Mbit/s (5676 Mbit/s with framing), total 94306605 packets with 6035595058 bytes (incl. CRC)
[Device: id=0] TX: 8.10 (StdDev 0.48) Mpps, 4148 (StdDev 247) Mbit/s (5444 Mbit/s with framing), total 80952165 packets with 5180938560 bytes (incl. CRC)
[Device: id=1] TX: 8.12 (StdDev 0.50) Mpps, 4156 (StdDev 255) Mbit/s (5455 Mbit/s with framing), total 81016803 packets with 5185075392 bytes (incl. CRC)

mdr78 avatar Nov 26 '18 15:11 mdr78

It doesn't appear to be related to the statistics task either. When I run with four ports but restrict the statistics to be calculated on only one port, performance is much the same.
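
(The only change for this run was starting the stats task on a single device instead of all of them, i.e. roughly the following, assuming the stats setup in pktgen.lua:)

-- stats.startStatsTask{devices = args.dev}       -- original: all ports
stats.startStatsTask{devices = { args.dev[1] }}   -- this run: port 0 only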

[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 112 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.2 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.3 on NUMA socket 1
EAL:   probe driver: 8086:1572 net_i40e
[INFO]  Found 4 usable devices:
   Device 0: 3C:FD:FE:9D:88:F8 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 1: 3C:FD:FE:9D:88:F9 (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 2: 3C:FD:FE:9D:88:FA (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
   Device 3: 3C:FD:FE:9D:88:FB (Intel Corporation Ethernet Controller X710 for 10GbE SFP+)
[INFO]  Check out MoonGen (built on lm) if you are looking for a fully featured packet generator
[INFO]  https://github.com/emmericp/MoonGen
[INFO]  Waiting for devices to come up...
[INFO]  Device 3 (3C:FD:FE:9D:88:FB) is up: 10000 MBit/s
[INFO]  Device 2 (3C:FD:FE:9D:88:FA) is up: 10000 MBit/s
[INFO]  Device 1 (3C:FD:FE:9D:88:F9) is up: 10000 MBit/s
[INFO]  Device 0 (3C:FD:FE:9D:88:F8) is up: 10000 MBit/s
[INFO]  4 devices are up.
[INFO]  Starting Thread 1 on [TxQueue: id=0, qid=0] sending to peer 3c:fd:fe:9d:68:b8
[INFO]  Starting Thread 2 on [TxQueue: id=1, qid=0] sending to peer 3c:fd:fe:9d:68:b9
[INFO]  Starting Thread 3 on [TxQueue: id=2, qid=0] sending to peer 3c:fd:fe:9d:68:ba
[INFO]  Starting Thread 4 on [TxQueue: id=3, qid=0] sending to peer 3c:fd:fe:9d:68:bb

[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.71 Mpps, 1390 Mbit/s (1824 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.86 Mpps, 1466 Mbit/s (1924 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.87 Mpps, 1467 Mbit/s (1926 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.86 Mpps, 1464 Mbit/s (1921 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.86 Mpps, 1464 Mbit/s (1922 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.86 Mpps, 1463 Mbit/s (1921 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.87 Mpps, 1469 Mbit/s (1928 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.86 Mpps, 1464 Mbit/s (1921 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.86 Mpps, 1464 Mbit/s (1922 Mbit/s with framing)
[Device: id=0] RX: 0.00 Mpps, 0 Mbit/s (0 Mbit/s with framing)
[Device: id=0] TX: 2.50 Mpps, 1279 Mbit/s (1679 Mbit/s with framing)
[Device: id=0] RX: 0.00 (StdDev 0.00) Mpps, 0 (StdDev 0) Mbit/s (0 Mbit/s with framing), total 0 packets with 0 bytes (incl. CRC)
[Device: id=0] TX: 2.82 (StdDev 0.12) Mpps, 1445 (StdDev 62) Mbit/s (1896 Mbit/s with framing), total 28288134 packets with 1810440576 bytes (incl. CRC)

mdr78 avatar Nov 26 '18 17:11 mdr78

@mdr78 Can you tell me why the RX stats are 0.0? I have trouble getting RX stats too. Any idea? I know this is an old issue, but is that expected behavior?

vsag96 avatar Jun 07 '20 21:06 vsag96

This is unlikely to be related; the main reason for not getting RX stats is using a virtual NIC that drops packets earlier on the hypervisor. In that case you'll have to actually receive and drop the packets in order for the stats to work.
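
A minimal receive-and-drop task would be something like this (just a sketch using libmoon's rx API):

local lm     = require "libmoon"
local memory = require "memory"

function rxSlave(queue)
        local bufs = memory.bufArray()
        while lm.running() do
                -- actually pull packets off the queue so they get counted
                local rx = queue:tryRecv(bufs, 100)
                bufs:free(rx) -- and drop them right away
        end
end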

emmericp avatar Jun 08 '20 07:06 emmericp