
Problem using pdump (core dump on the second and later runs)

Open liwei0526vip opened this issue 5 years ago • 5 comments

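Environment

  • OS and kernel: CentOS 7 3.10.0-514.6.2.el7.toa.2.x86_64
  • DPVS version: master (v1.7.2)
  • DPDK version: DPDK 17.11.2

Steps to reproduce

  • Reproducing the problem
1. Start dpvs.
2. Start pdump: ./dpdk-pdump  -- --pdump 'port=1,queue=0,rx-dev=/tmp/rx.pcap'  (it starts normally and captures packets normally)
3. Stop pdump (Ctrl+C), then start pdump again with the same command; it core dumps.
4. If DPVS is restarted at this point, pdump then starts without problems, but after exiting with Ctrl+C the next start core dumps again.
5. Note: Linux address space randomization (ASLR) has already been disabled.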

liwei0526vip avatar Sep 18 '19 11:09 liwei0526vip

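I have not run into this problem here. Which version of the code are you using? Please try the code on this branch, https://github.com/mscbg/dpvs/tree/pdump, and see whether the problem still occurs. I tested several times and did not hit the problem you describe.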

mscbg avatar Sep 19 '19 08:09 mscbg

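@mscbg I used the code you provided above, and I still see a similar problem:

  • 1. Re-extracted and rebuilt dpdk-17.11.2, enabled the PCAP option in common_base, and re-initialized the DPDK environment.
  • 2. Rebuilt dpvs as well.

Problem on the first run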

$ ./dpdk-pdump  -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'
EAL: Detected 24 lcore(s)
EAL: Probing VFIO support...
EAL: Cannot initialize tailq: VFIO_RESOURCE_LIST
Tailq 0: qname:<RTE_ACL>, tqh_first:(nil), tqh_last:0x7ffff7fe041c
Tailq 1: qname:<RTE_MEMBER>, tqh_first:(nil), tqh_last:0x7ffff7fe044c
Tailq 2: qname:<RTE_EVENT_RING>, tqh_first:(nil), tqh_last:0x7ffff7fe047c
Tailq 3: qname:<RTE_REORDER>, tqh_first:(nil), tqh_last:0x7ffff7fe04ac
Tailq 4: qname:<RTE_HASH>, tqh_first:0x7fffbff33540, tqh_last:0x7fffbff35240
Tailq 5: qname:<RTE_FBK_HASH>, tqh_first:(nil), tqh_last:0x7ffff7fe050c
Tailq 6: qname:<RTE_MEMPOOL>, tqh_first:0x7fffbff350c0, tqh_last:0x7fffbff51d80
Tailq 7: qname:<RTE_RING>, tqh_first:0x7fffbff335c0, tqh_last:0x7fffbff51d00
Tailq 8: qname:<UIO_RESOURCE_LIST>, tqh_first:0x7fffbffeae00, tqh_last:0x7fffbffeae00
Tailq 9: qname:<RTE_LPM>, tqh_first:(nil), tqh_last:0x7ffff7fe05cc
Tailq 10: qname:<RTE_LPM6>, tqh_first:(nil), tqh_last:0x7ffff7fe05fc
Tailq 11: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 12: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 13: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 14: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 15: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 16: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 17: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 18: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 19: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 20: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 21: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 22: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 23: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 24: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 25: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 26: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 27: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 28: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 29: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 30: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 31: qname:<>, tqh_first:(nil), tqh_last:(nil)
EAL: FATAL: Cannot init tail queues for objects

EAL: Cannot init tail queues for objects

PANIC in main():
Cannot init EAL
5: [./dpdk-pdump() [0x4487ff]]
4: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ffff6cc6b35]]
3: [./dpdk-pdump(main+0x167) [0x44bc35]]
2: [./dpdk-pdump(__rte_panic+0xb8) [0x4403bb]]
1: [./dpdk-pdump(rte_dump_stack+0x1a) [0x4984ca]]
[1]    269894 abort (core dumped)  ./dpdk-pdump -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'
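
One way to read the log above: the name reported by "Cannot initialize tailq" (VFIO_RESOURCE_LIST) does not appear among the tailqs listed in the dump that follows it. The sketch below is a toy model of that situation, assuming that a secondary process such as dpdk-pdump only looks up tailq names that the primary process has already created in shared memory. It is not DPDK code; the struct and function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAILQ 32
#define NAME_LEN  32

/* Toy stand-in for the tailq table the primary keeps in shared memory.
 * An empty name marks a free slot. */
struct shared_tailq_table {
    char names[MAX_TAILQ][NAME_LEN];
};

/* Primary side (hypothetical): claim the first free slot for `name`.
 * Returns the slot index, or -1 if the table is full. */
static int primary_register(struct shared_tailq_table *t, const char *name)
{
    assert(name[0] != '\0' && strlen(name) < NAME_LEN);
    for (int i = 0; i < MAX_TAILQ; i++) {
        if (t->names[i][0] == '\0') {
            strcpy(t->names[i], name);
            return i;
        }
    }
    return -1;
}

/* Secondary side (hypothetical): look up only, never create.
 * Returns the slot index, or -1 if the primary never registered `name`. */
static int secondary_lookup(const struct shared_tailq_table *t,
                            const char *name)
{
    assert(name[0] != '\0');
    for (int i = 0; i < MAX_TAILQ; i++)
        if (strcmp(t->names[i], name) == 0)
            return i;
    return -1;
}

/* Check every tailq the secondary expects. Returns NULL when all are
 * present, otherwise the first missing name (the one an error message
 * would report). */
static const char *secondary_init(const struct shared_tailq_table *t,
                                  const char *const wanted[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (secondary_lookup(t, wanted[i]) < 0)
            return wanted[i];
    return NULL;
}
```

In this model, a primary that registered only RTE_HASH and UIO_RESOURCE_LIST makes secondary_init report VFIO_RESOURCE_LIST as missing, which is why the two binaries need to agree on the set of registered tailqs.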

I then modified src/dpdk.mk to add a few link libraries:

-lrte_acl -lrte_member -lrte_eventdev -lrte_reorder

After that, running it again worked:

./dpdk-pdump  -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'
EAL: Detected 24 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:19:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:19:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:154d net_ixgbe
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:154d net_ixgbe
PMD: Initializing pmd_pcap for net_pcap_rx_0
PMD: Creating pcap-backed ethdev on numa socket -1
Port 1 MAC: 00 00 00 01 02 03

However, the same thing happens as before: after several rounds of Ctrl+C it core dumps, as shown below:

./dpdk-pdump  -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'
EAL: Detected 24 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:19:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:19:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL:   probe driver: 8086:154d net_ixgbe
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL:   probe driver: 8086:154d net_ixgbe
PMD: Initializing pmd_pcap for net_pcap_rx_0
PMD: Creating pcap-backed ethdev on numa socket -1
[1]    272229 segmentation fault (core dumped)  ./dpdk-pdump -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'

liwei0526vip avatar Sep 20 '19 00:09 liwei0526vip

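Our debugging showed that pdump in 17.11.2 modifies the rte_eth_dev_data structure in shared memory but does not reset the modified values on exit, which causes pdump to misbehave on subsequent starts.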

static struct rte_eth_dev *
eth_dev_get(uint16_t port_id)
{
	struct rte_eth_dev *eth_dev = &rte_eth_devices[port_id];

	eth_dev->data = &rte_eth_dev_data[port_id];
	eth_dev->state = RTE_ETH_DEV_ATTACHED;
	TAILQ_INIT(&(eth_dev->link_intr_cbs));

	eth_dev_last_created_port = port_id;

	return eth_dev;
}

We tested and found that this problem has been fixed in 18.11.2.
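
To illustrate the kind of failure described in this comment, here is a minimal toy model, not the DPDK implementation. It assumes per-port data lives in memory that outlives the secondary process, so anything a pdump run writes there is still present on the next run unless it is explicitly cleared. The struct and function names are hypothetical; only the device name net_pcap_rx_0 comes from the log output earlier in the thread. The thread does not show the exact crash path in 17.11.2, so the model simply reports an error where the real program crashes.

```c
#include <assert.h>
#include <string.h>

#define MAX_PORTS 4
#define NAME_LEN  32

/* Toy stand-in for per-port data kept in shared memory. It survives
 * when a secondary process exits. An empty name marks a free slot. */
struct shared_port_data {
    char name[NAME_LEN];
};

/* Attach a virtual device (hypothetical helper). Returns the port id,
 * or -1 if `name` is already present (for example a stale entry left
 * by an earlier run) or no slot is free. */
static int attach_vdev(struct shared_port_data ports[MAX_PORTS],
                       const char *name)
{
    int free_slot = -1;

    assert(name[0] != '\0' && strlen(name) < NAME_LEN);
    for (int i = 0; i < MAX_PORTS; i++) {
        if (strcmp(ports[i].name, name) == 0)
            return -1;                      /* stale or duplicate entry */
        if (free_slot < 0 && ports[i].name[0] == '\0')
            free_slot = i;                  /* remember first free slot */
    }
    if (free_slot >= 0)
        strcpy(ports[free_slot].name, name);
    return free_slot;
}

/* Cleanup that a well-behaved exit path would perform: reset the slot
 * so the next run starts from a clean state. */
static void detach_vdev(struct shared_port_data ports[MAX_PORTS],
                        int port_id)
{
    assert(port_id >= 0 && port_id < MAX_PORTS);
    memset(&ports[port_id], 0, sizeof(ports[port_id]));
}
```

In this model, a first attach of net_pcap_rx_0 succeeds; if the run ends without detach_vdev, a second attach with the same name fails, and it succeeds again only after the slot is reset. Restarting the primary process corresponds to zeroing the whole array, which matches the observation that restarting DPVS makes the next pdump run work.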

@mscbg @liwei0526vip

chengzhycn avatar Sep 20 '19 08:09 chengzhycn

@liwei0526vip Could you share your dpdk.mk file? I ran into "EAL: Failed to hotplug add device" at runtime and suspect that some libraries are not being linked in my build either. Thanks a lot!

JorgeZhu0 avatar Nov 15 '19 05:11 JorgeZhu0

Hello, I have now run into the same problem. Could you explain exactly how you added these link libraries? Thanks in advance.

trailll avatar Nov 15 '23 11:11 trailll