
replay-pcap.lua: Using intervals from file segfaults

Open a3f opened this issue 8 years ago • 5 comments

This is a continuation of #192 and uses the same pcap, but with an 82580 NIC instead.

Running

sudo build/MoonGen examples/pcap/replay-pcap.lua -r 1 -l 3 ea200usec.pcap

sometimes segfaults. Most of the time it just doesn't send any traffic, as with the I210 in #192.

[INFO]  Initializing DPDK. This will take a few seconds...
EAL: Detected 8 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:00:19.0 on NUMA socket 0
EAL:   probe driver: 8086:153a net_e1000_em
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1533 net_e1000_igb
EAL: PCI device 0000:02:00.0 on NUMA socket 0
EAL:   probe driver: 8086:150e net_e1000_igb
EAL: PCI device 0000:02:00.1 on NUMA socket 0
EAL:   probe driver: 8086:150e net_e1000_igb
EAL: PCI device 0000:02:00.2 on NUMA socket 0
EAL:   probe driver: 8086:150e net_e1000_igb
EAL: PCI device 0000:02:00.3 on NUMA socket 0
EAL:   probe driver: 8086:150e net_e1000_igb
EAL: PCI device 0000:03:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:03:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:03:00.2 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:03:00.3 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
[INFO]  Found 6 usable devices:
   Device 0: A0:36:9F:62:86:4F (Intel Corporation I210 Gigabit Network Connection)
   Device 1: 00:1B:21:A6:D1:BC (Intel Corporation 82580 Gigabit Network Connection)
   Device 2: 00:1B:21:A6:D1:BD (Intel Corporation 82580 Gigabit Network Connection)
   Device 3: 00:1B:21:A6:D1:BE (Intel Corporation 82580 Gigabit Network Connection)
   Device 4: 00:1B:21:A6:D1:BF (Intel Corporation 82580 Gigabit Network Connection)
   Device 5: A0:36:9F:A3:87:F6 (Intel Corporation I350 Gigabit Network Connection)
[INFO]  Waiting for devices to come up...
[INFO]  Device 3 (00:1B:21:A6:D1:BE) is up: 1000 MBit/s
[INFO]  1 device is up.
[Device: id=3] TX: 0.00 Mpps, 3 Mbit/s (3 Mbit/s with framing)
[Device: id=3] TX: 0.00 Mpps, 3 Mbit/s (3 Mbit/s with framing)
[Device: id=3] TX: 0.00 Mpps, 3 Mbit/s (3 Mbit/s with framing)
[Device: id=3] TX: 0.00 Mpps, 3 Mbit/s (3 Mbit/s with framing)
[Device: id=3] TX: 0.00 Mpps, 3 Mbit/s (3 Mbit/s with framing)
[Device: id=3] TX: 0.00 Mpps, 3 Mbit/s (3 Mbit/s with framing)
Segmentation fault

The core's stack trace is bogus:

(gdb) bt
#0  0x000055d88e599dd3 in ?? ()
#1  0x404f800000000000 in ?? ()
#2  0x402a000000000000 in ?? ()
#3  0x000055d87b1446c0 in ?? ()
#4  0x000000000000001a in ?? ()
#5  0x0000000000000001 in ?? ()
#6  0x4029724840739378 in ?? ()
#7  0x0000000000000000 in ?? ()
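
A note on reading the backtrace above: the values in frames #1, #2 and #6 do not look like return addresses. Reinterpreted as IEEE-754 doubles they come out as small numbers, which would be consistent with gdb walking over floating-point values left on the stack by JIT-compiled code instead of real frames. That reading is an assumption and is not confirmed anywhere in this thread. A minimal Python sketch for checking it (the helper name `as_double` is made up for this illustration):

```python
import struct

def as_double(word):
    """Reinterpret a 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", word))[0]

# "Return addresses" copied from frames #1, #2 and #6 of the backtrace.
frames = {
    1: 0x404F800000000000,
    2: 0x402A000000000000,
    6: 0x4029724840739378,
}

for number, word in frames.items():
    print(f"#{number}  {word:#018x}  as double: {as_double(word)!r}")
```

Frames #1 and #2 decode to exactly 63.0 and 13.0, and frame #6 to a value between 12 and 13. Real code addresses would be unlikely to decode this cleanly.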

Not sure how useful, but here's the stack:

(gdb) x/32ga $rsp
0x7f98307fcb00: 0x404f800000000000      0x402a000000000000
0x7f98307fcb10: 0x563a4a6d86c0 <lcore_config>   0x1a
0x7f98307fcb20: 0x1     0x41e415c841c73378
0x7f98307fcb30: 0x0     0xc
0x7f98307fcb40: 0x563a4a6d8860 <lcore_config+416>       0x7f98307fcc1f
0x7f98307fcb50: 0x7f98307fcbf0  0x563a4a12ee41 <lua_pcall+177>
0x7f98307fcb60: 0x40636ed0      0x40943340
0x7f98307fcb70: 0x200000000     0x41c73378
0x7f98307fcb80: 0x40943340      0x41c73378
0x7f98307fcb90: 0x40636ed0      0x40943340
0x7f98307fcba0: 0x41c733b8      0x0
0x7f98307fcbb0: 0x7f98307fcbf0  0x563a4a0f3eab <libmoon::lua_core_main(void*)+166>
0x7f98307fcbc0: 0x0     0x563a4bfbbf10
0x7f98307fcbd0: 0x563a4bfbbf10  0x41c73378
0x7f98307fcbe0: 0x1a    0x7f98307fcc1f
0x7f98307fcbf0: 0xf     0x563a4a202d1d <eal_thread_loop+477>

This doesn't happen when -r 1 is dropped.
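
Since the problem only shows up when the intervals from the file are used, it may be worth checking the inter-packet gaps stored in the capture itself. The Python sketch below parses a classic libpcap file (not pcapng) and returns the gaps in seconds. The function name `pcap_intervals` is hypothetical and not part of MoonGen; this is only a sanity check on the input, not a reproduction of what replay-pcap.lua does internally.

```python
import struct

# Magic numbers of the classic libpcap file format.
MAGIC_USEC = 0xA1B2C3D4  # timestamps stored as seconds + microseconds
MAGIC_NSEC = 0xA1B23C4D  # timestamps stored as seconds + nanoseconds

def pcap_intervals(data):
    """Return the gaps (in seconds) between consecutive packets of a pcap."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # The file may be written in either byte order; try both for the magic.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", data[:4])[0]
        if magic in (MAGIC_USEC, MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file")
    units = 10**6 if magic == MAGIC_USEC else 10**9

    timestamps = []  # integer timestamps in micro- or nanoseconds
    offset = 24      # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-packet record header: ts_sec, ts_frac, incl_len, orig_len.
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        timestamps.append(ts_sec * units + ts_frac)
        offset += 16 + incl_len  # skip the captured packet bytes
    # Subtract as integers first to avoid float rounding on large epochs.
    return [(b - a) / units for a, b in zip(timestamps, timestamps[1:])]
```

It can be called as `pcap_intervals(open("ea200usec.pcap", "rb").read())`. Negative or very large gaps in the result would point at the capture instead of the replay path.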

a3f avatar Nov 07 '17 14:11 a3f

Can you please post your full hardware configuration and the Linux distribution that you are using?

emmericp avatar Nov 07 '17 14:11 emmericp

Debian GNU/Linux 9.2 (stretch), running kernel 4.9.30-rt20 (PREEMPT RT patch). Running on an Intel(R) Xeon(R) CPU E5-1620 v3. Test was with an Intel 82580 which was connected over a TAP to the other port of the same 82580.

What do you mean by full hardware configuration? lshw(1) output?

a3f avatar Nov 07 '17 14:11 a3f

Good/bad news: I've found a system where this is reproducible and will investigate. It looks like the ring interface between the threads is doing something bad.

emmericp avatar Nov 09 '17 12:11 emmericp

Hello, any news on this? Is there something I can do to help? (besides fixing the bug, I don't know enough about DPDK to do that :-)

a3f avatar Dec 06 '17 13:12 a3f

Hi all, same problem here. Running an Intel Corporation 82599ES 10-Gigabit SFI/SFP+ on a DELL9020 (i7-4790, 32G RAM). DPDK and MoonGen are set up and work as expected. The kernel driver has been swapped out as per DPDK:

Kernel driver in use: igb_uio
Kernel modules: ixgbe

Running:

NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
Linux dpdk-injector 5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

coco21 avatar Jul 01 '21 12:07 coco21