dpvs
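Problem using pdump (coredump on the second and every later run)
System environment
- OS and kernel: CentOS 7
3.10.0-514.6.2.el7.toa.2.x86_64
- DPVS version: master (v1.7.2)
- DPDK version: DPDK 17.11.2
Steps to reproduce
- Reproducing the problem
1. Start dpvs.
2. Start pdump: ./dpdk-pdump -- --pdump 'port=1,queue=0,rx-dev=/tmp/rx.pcap'. It starts normally and captures packets normally.
3. Stop pdump (Ctrl+C), then start it again with the same command: the pdump program core dumps.
4. If DPVS is restarted at this point, pdump starts without problems, but after exiting with Ctrl+C the next start core dumps again.
5. Note: Linux address space randomization (ASLR) has already been disabled.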
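I have not run into this problem here. Which version of the code are you using? Please try the code on this branch and see whether the problem occurs: https://github.com/mscbg/dpvs/tree/pdump. I tested it several times and did not hit the problem you describe.
@mscbg I used the code you provided above, and I still see a similar problem:
- 1. Unpacked and rebuilt dpdk-17.11.2 from scratch, enabled the PCAP switch in common_base, and reinitialized the DPDK environment.
- 2. Rebuilt dpvs as well.
Problem on the first run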
$ ./dpdk-pdump -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'
EAL: Detected 24 lcore(s)
EAL: Probing VFIO support...
EAL: Cannot initialize tailq: VFIO_RESOURCE_LIST
Tailq 0: qname:<RTE_ACL>, tqh_first:(nil), tqh_last:0x7ffff7fe041c
Tailq 1: qname:<RTE_MEMBER>, tqh_first:(nil), tqh_last:0x7ffff7fe044c
Tailq 2: qname:<RTE_EVENT_RING>, tqh_first:(nil), tqh_last:0x7ffff7fe047c
Tailq 3: qname:<RTE_REORDER>, tqh_first:(nil), tqh_last:0x7ffff7fe04ac
Tailq 4: qname:<RTE_HASH>, tqh_first:0x7fffbff33540, tqh_last:0x7fffbff35240
Tailq 5: qname:<RTE_FBK_HASH>, tqh_first:(nil), tqh_last:0x7ffff7fe050c
Tailq 6: qname:<RTE_MEMPOOL>, tqh_first:0x7fffbff350c0, tqh_last:0x7fffbff51d80
Tailq 7: qname:<RTE_RING>, tqh_first:0x7fffbff335c0, tqh_last:0x7fffbff51d00
Tailq 8: qname:<UIO_RESOURCE_LIST>, tqh_first:0x7fffbffeae00, tqh_last:0x7fffbffeae00
Tailq 9: qname:<RTE_LPM>, tqh_first:(nil), tqh_last:0x7ffff7fe05cc
Tailq 10: qname:<RTE_LPM6>, tqh_first:(nil), tqh_last:0x7ffff7fe05fc
Tailq 11: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 12: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 13: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 14: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 15: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 16: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 17: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 18: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 19: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 20: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 21: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 22: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 23: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 24: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 25: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 26: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 27: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 28: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 29: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 30: qname:<>, tqh_first:(nil), tqh_last:(nil)
Tailq 31: qname:<>, tqh_first:(nil), tqh_last:(nil)
EAL: FATAL: Cannot init tail queues for objects
EAL: Cannot init tail queues for objects
PANIC in main():
Cannot init EAL
5: [./dpdk-pdump() [0x4487ff]]
4: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7ffff6cc6b35]]
3: [./dpdk-pdump(main+0x167) [0x44bc35]]
2: [./dpdk-pdump(__rte_panic+0xb8) [0x4403bb]]
1: [./dpdk-pdump(rte_dump_stack+0x1a) [0x4984ca]]
[1] 269894 abort (core dumped) ./dpdk-pdump -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'
Then I modified src/dpdk.mk to add a few link libraries:
-lrte_acl -lrte_member -lrte_eventdev -lrte_reorder
After that, it ran without problems:
./dpdk-pdump -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'
EAL: Detected 24 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL: probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:19:00.0 on NUMA socket 0
EAL: probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:19:00.1 on NUMA socket 0
EAL: probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL: probe driver: 8086:154d net_ixgbe
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL: probe driver: 8086:154d net_ixgbe
PMD: Initializing pmd_pcap for net_pcap_rx_0
PMD: Creating pcap-backed ethdev on numa socket -1
Port 1 MAC: 00 00 00 01 02 03
However, the same thing still happens: after several Ctrl+C exits it core dumps, as follows:
./dpdk-pdump -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'
EAL: Detected 24 lcore(s)
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL: probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL: probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:19:00.0 on NUMA socket 0
EAL: probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:19:00.1 on NUMA socket 0
EAL: probe driver: 8086:1572 net_i40e
EAL: PCI device 0000:d8:00.0 on NUMA socket 1
EAL: probe driver: 8086:154d net_ixgbe
EAL: PCI device 0000:d8:00.1 on NUMA socket 1
EAL: probe driver: 8086:154d net_ixgbe
PMD: Initializing pmd_pcap for net_pcap_rx_0
PMD: Creating pcap-backed ethdev on numa socket -1
[1] 272229 segmentation fault (core dumped) ./dpdk-pdump -- --pdump 'port=0,queue=0,rx-dev=/tmp/rx1.pcap'
Our debugging found that pdump in 17.11.2 modifies the rte_eth_dev_data structure in shared memory but does not reset the modified values on exit, which makes later pdump starts fail.
static struct rte_eth_dev *
eth_dev_get(uint16_t port_id)
{
    struct rte_eth_dev *eth_dev = &rte_eth_devices[port_id];

    eth_dev->data = &rte_eth_dev_data[port_id];
    eth_dev->state = RTE_ETH_DEV_ATTACHED;
    TAILQ_INIT(&(eth_dev->link_intr_cbs));
    eth_dev_last_created_port = port_id;
    return eth_dev;
}
In our tests, this problem is already fixed in 18.11.2.
@mscbg @liwei0526vip
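To illustrate the failure mode described above, here is a toy model in C. It is not DPDK code: `shared_ports`, `port_attach`, `port_release`, and `simulate_run` are hypothetical names made up for this sketch, and a static array stands in for the shared memory that outlives each pdump process. The toy returns an error where the real tool crashed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 4
#define NAME_LEN  32

/* Toy stand-in for per-port data kept in shared memory.
 * It outlives every simulated run of the capture tool. */
struct shared_port_data {
    char name[NAME_LEN];   /* empty string means the slot is free */
};

static struct shared_port_data shared_ports[MAX_PORTS];

/* Claim a slot for `name`. Returns the slot index, or -1 when an entry
 * with that name is still present (stale state) or no slot is free. */
int port_attach(const char *name)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_PORTS; i++) {
        if (strcmp(shared_ports[i].name, name) == 0)
            return -1;                 /* left behind by an earlier run */
        if (free_slot < 0 && shared_ports[i].name[0] == '\0')
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    snprintf(shared_ports[free_slot].name, NAME_LEN, "%s", name);
    return free_slot;
}

/* Reset a slot so that a later run starts from a clean state. */
void port_release(int slot)
{
    assert(slot >= 0 && slot < MAX_PORTS);
    memset(&shared_ports[slot], 0, sizeof(shared_ports[slot]));
}

/* One simulated run; `cleanup` models whether the tool resets the
 * shared slot before it exits. Returns 0 on success, -1 on failure. */
int simulate_run(int cleanup)
{
    int slot = port_attach("net_pcap_rx_0");

    if (slot < 0)
        return -1;
    if (cleanup)
        port_release(slot);
    return 0;
}
```

In this model, a run that exits without cleanup makes the next run fail, and the failure persists until the shared slot is reset. That matches the observation that restarting DPVS makes pdump work again once.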
@liwei0526vip Could you share your dpdk.mk file? I ran into "EAL: Failed to hotplug add device" at runtime and suspect that some libraries are not being linked here either. Thanks a lot!
Hello, I am now running into the same problem. How exactly did you add these link libraries? Any answer would be appreciated, thanks.