
Creating a config makes the running trex instance get stuck

Open WithdewHua opened this issue 2 years ago • 0 comments

[Environment]

trex-core version: 2.93

running trex instance's config (/etc/trex_data_sec_dlp_cfg_v4.yaml):

### Config file generated by dpdk_setup_ports.py ###

- version: 2
  interfaces: ['1b:00.0', '04:00.0']
  stack: linux_based
  prefix: data_sec_dlp_v4
  zmq_pub_port: 4500
  zmq_rpc_port: 4501
  port_info:
      - ip: 11.4.0.2
        default_gw: 11.4.0.1
      - ip: 12.6.0.2
        default_gw: 12.6.0.1

  platform:
      master_thread_id: 0
      latency_thread_id: 7
      dual_if:
        - socket: 0
          threads: [2]

[Reproduce Steps]

  1. Run a trex instance: ./t-rex-64 -i --astf --cfg /etc/trex_data_sec_dlp_cfg_v4.yaml, and start traffic.
  2. Try to create a config for another trex instance:
python3 dpdk_setup_ports.py --create 0c:00.0 13:00.0 --dest-macs 00:1c:54:ff:28:2d 00:1c:54:ff:28:35 --prefix data_sec_dlp_v6 --stack linux_based --zmq-pub-port 4600 --zmq-rpc-port 4601 -o /etc/trex_data_sec_dlp_cfg_v6.yaml
  3. The running instance is then killed by the watchdog:
WATCHDOG: task 'Trex DP core 1' has not responded for more than 1.00031 seconds - timeout is 1 seconds

*** traceback follows ***

1       0x564fb43e8519 ./_t-rex-64(+0x1bf519) [0x564fb43e8519]
2       0x7f51c3df5730 /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f51c3df5730]
3       0x564fb4701928 rte_delay_us_block + 88
4       0x564fb437a2bf CCoreEthIF::send_burst(CCorePerPort*, unsigned short, CVirtualIFPerSideStats*) + 463
5       0x564fb435477c CCoreEthIF::flush_tx_queue() + 104
6       0x564fb43cb58c CNodeGenerator::handle_maintenance(CFlowGenListPerThread*) + 252
7       0x564fb43ccbec CNodeGenerator::handle_flow_sync(CGenNode*, CFlowGenListPerThread*, bool&) + 92
8       0x564fb43cd098 CNodeGenerator::handle_slow_messages(unsigned char, CGenNode*, CFlowGenListPerThread*, bool) + 184
9       0x564fb4378e39 int CNodeGenerator::flush_file_realtime<24, false>(double, double, CFlowGenListPerThread*, double&) + 1609
10      0x564fb45e80df TrexAstfDpCore::start_scheduler() + 927
11      0x564fb452e6f9 TrexDpCore::start() + 89
12      0x564fb43c14b3 CFlowGenListPerThread::start(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, CPreviewMode&) + 115
13      0x564fb43580d9 CGlobalTRex::run_in_core(unsigned char) + 487
14      0x564fb437b99d ./_t-rex-64(+0x15299d) [0x564fb437b99d]
15      0x564fb471d2b6 eal_thread_loop + 406
16      0x7f51c3deafa3 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f51c3deafa3]
17      0x7f51c37d806f clone + 63


*** addr2line information follows ***

??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0
??:0


./t-rex-64: line 106: 14616 Aborted                 ./_$(basename $0) $INPUT_ARGS $EXTRA_INPUT_ARGS
  4. If I start trex with --no-watchdog (./t-rex-64 -i --astf --cfg /etc/trex_data_sec_dlp_cfg_v4.yaml --no-watchdog) and repeat the steps above, the traffic stops and CPU utilization increases. When I then stop the trex instance, it shows:
-Per port stats table
      ports |               0 |               1
 -----------------------------------------------------------------------------------------
   opackets |           79614 |           75088
     obytes |       180132810 |       192476805
   ipackets |           81735 |           94369
     ibytes |       192877254 |       181135630
    ierrors |               0 |               0
    oerrors |               0 |               0
      Tx Bw |     952.12  bps |     903.86  bps

-Global stats enabled
 Cpu Utilization : 99.1  %  0.0 Gb/core
 Platform_factor : 1.0
 Total-Tx        :       1.86 Kbps
 Total-Rx        :       1.86 Kbps
 Total-PPS       :       0.10  pps
 Total-CPS       :       0.01  cps

 Expected-PPS    :       0.00  pps
 Expected-CPS    :       0.00  cps
 Expected-L7-BPS :       0.00  bps

 Active-flows    :      993  Clients :        0   Socket-util : 0.0000 %
 Open-flows      :     5896  Servers :        0   Socket :        0 Socket/Clients :  -nan
 Total_queue_full : 15601506
 drop-rate       :       0.00  bps
 current time    : 54.5 sec
 test duration   : 0.0 sec
 *** TRex is shutting down - cause: 'CTRL + C detected'
 ERROR RX core is stuck!
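
To make the watchdog traceback in the reproduce steps easier to read, the symbolized frames can be pulled out of the log. The snippet below is only a reading aid: the helper name frame_symbols and the regular expression are hypothetical and not part of trex-core. On the frames copied from this report it lists rte_delay_us_block, called from CCoreEthIF::send_burst, as the innermost symbolized frames. That suggests (an interpretation, not something confirmed here) that the DP core was waiting inside the transmit path, and the large Total_queue_full counter in the --no-watchdog output may point the same way.

```python
import re

# Matches frames of the form "<n>  0x<addr>  <symbol> + <offset>".
# Frames without a resolved symbol (e.g. "./_t-rex-64(+0x1bf519) [...]")
# do not end in " + <offset>" and are skipped.
FRAME_RE = re.compile(r"^\s*\d+\s+0x[0-9a-f]+\s+(.*?)\s+\+\s+\d+\s*$")


def frame_symbols(traceback_text):
    """Return the symbol names of all symbolized frames, innermost first."""
    symbols = []
    for line in traceback_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            symbols.append(match.group(1))
    return symbols


# A few frames copied verbatim from the traceback in this report.
sample = """\
1       0x564fb43e8519 ./_t-rex-64(+0x1bf519) [0x564fb43e8519]
3       0x564fb4701928 rte_delay_us_block + 88
4       0x564fb437a2bf CCoreEthIF::send_burst(CCorePerPort*, unsigned short, CVirtualIFPerSideStats*) + 463
5       0x564fb435477c CCoreEthIF::flush_tx_queue() + 104
"""

for symbol in frame_symbols(sample):
    print(symbol)
```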

[Analysis]

  1. dpdk_nic_bind.show_table(get_macs=True) is called while creating the config:
    def do_create(self):
        show_table = not map_driver.args.no_prompt
        dpdk_nic_bind.show_table(True, show_table)  # get the info

get_macs is set to True, so it tries to get the MACs of all DPDK-bound interfaces by executing ./t-rex-64 --dump-interfaces, which I think is the root cause.

  2. I then started a trex instance and called ./t-rex-64 --dump-interfaces '1b:00.0' '04:00.0' ('1b:00.0' and '04:00.0' are the running instance's interfaces), and the problem reproduced.
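
If the analysis above is right, one possible mitigation is to avoid probing interfaces that a running instance already owns. The following is a minimal sketch of that filtering step only. The helpers normalize_pci and interfaces_safe_to_probe are hypothetical and do not exist in trex-core, and how the list of in-use interfaces would be discovered (for example from the configs of running instances) is left open. The full-form addresses in the example are assumed purely to show the normalization.

```python
def normalize_pci(addr):
    """Return a PCI address in lowercase domain:bus:slot.func form."""
    addr = addr.strip().lower()
    # A short form such as '1b:00.0' has no PCI domain; assume domain 0000.
    if addr.count(":") == 1:
        addr = "0000:" + addr
    return addr


def interfaces_safe_to_probe(dpdk_bound, in_use):
    """Return the DPDK-bound interfaces that no running instance is using."""
    busy = {normalize_pci(addr) for addr in in_use}
    return [addr for addr in dpdk_bound if normalize_pci(addr) not in busy]


# Interfaces taken from this report: the running instance owns 1b:00.0 and
# 04:00.0; the new config is being created for 0c:00.0 and 13:00.0.
running = ["1b:00.0", "04:00.0"]
bound = ["0000:1b:00.0", "0000:04:00.0", "0c:00.0", "13:00.0"]

safe = interfaces_safe_to_probe(bound, running)
print(safe)  # ['0c:00.0', '13:00.0']
```

With a filter like this, only the interfaces in safe would be passed to the MAC lookup, so the running instance's ports would not be touched.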

WithdewHua, Dec 22 '22 03:12