gk: convert all batched lookups into coroutines
Currently, GK blocks batch their lookups into the flow table and the LPM table. This pull request turns these batched lookups back into single lookups and moves the code into coroutines.

The code incorporates all the feedback received so far.

This pull request still needs its merge conflicts resolved.
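
Running single lookups inside coroutines lets a lookup yield while its hash bucket is being fetched from memory, so other lookups can make progress in the meantime (the profiles below show functions such as `prefetch_and_yield` and `coro_transfer`). The following is a minimal sketch of that interleaving idea, not Gatekeeper's implementation. It assumes a hypothetical open-addressing table and uses hand-written stackless state machines instead of the stackful coroutines this pull request appears to use. `struct table`, `lookup_step()` and `lookup_bulk()` are illustrative names only, and `__builtin_prefetch` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical open-addressing table (not Gatekeeper's flow table).
 * Key 0 marks an empty bucket. */
#define TABLE_SIZE 1024u /* must be a power of two */
#define MAX_INFLIGHT 8u  /* lookups interleaved at once */

struct table {
	uint64_t keys[TABLE_SIZE];
	uint32_t values[TABLE_SIZE];
};

/* 64-bit mixer (MurmurHash3 finalizer steps) reduced to a bucket index. */
static uint32_t hash_key(uint64_t key)
{
	key ^= key >> 33;
	key *= 0xff51afd7ed558ccdULL;
	key ^= key >> 33;
	return (uint32_t)key & (TABLE_SIZE - 1);
}

/* Linear-probing insert; returns 0 if the table is full. */
static int table_insert(struct table *t, uint64_t key, uint32_t value)
{
	uint32_t slot = hash_key(key);

	assert(key != 0);
	for (uint32_t probes = 0; probes < TABLE_SIZE; probes++) {
		if (t->keys[slot] == 0 || t->keys[slot] == key) {
			t->keys[slot] = key;
			t->values[slot] = value;
			return 1;
		}
		slot = (slot + 1) & (TABLE_SIZE - 1);
	}
	return 0;
}

/* Stackless coroutine: all of a lookup's state lives in this struct. */
struct lookup_co {
	enum { CO_START, CO_PROBE, CO_DONE } state;
	uint64_t key;
	uint32_t slot, probes;
	int found;
	uint32_t value;
};

/* Advance one lookup; returns 1 if it yielded, 0 once it has finished. */
static int lookup_step(const struct table *t, struct lookup_co *co)
{
	switch (co->state) {
	case CO_START:
		co->slot = hash_key(co->key);
		__builtin_prefetch(&t->keys[co->slot]); /* start the load, */
		co->state = CO_PROBE;
		return 1;                               /* then yield */
	case CO_PROBE:
		if (t->keys[co->slot] == co->key) {
			co->found = 1;
			co->value = t->values[co->slot];
		} else if (t->keys[co->slot] != 0 && ++co->probes < TABLE_SIZE) {
			/* Collision: prefetch the next bucket and yield again. */
			co->slot = (co->slot + 1) & (TABLE_SIZE - 1);
			__builtin_prefetch(&t->keys[co->slot]);
			return 1;
		}
		co->state = CO_DONE; /* hit, empty bucket, or table exhausted */
		return 0;
	default:
		return 0;
	}
}

/* Round-robin scheduler keeping up to MAX_INFLIGHT lookups in flight. */
static void lookup_bulk(const struct table *t, const uint64_t *keys, size_t n,
			int *found, uint32_t *values)
{
	struct lookup_co co[MAX_INFLIGHT];
	size_t owner[MAX_INFLIGHT]; /* input index served by each coroutine */
	int busy[MAX_INFLIGHT] = { 0 };
	size_t next = 0, done = 0;

	while (done < n) {
		for (size_t i = 0; i < MAX_INFLIGHT; i++) {
			if (!busy[i]) {
				if (next == n)
					continue;
				assert(keys[next] != 0);
				co[i] = (struct lookup_co){ .state = CO_START, .key = keys[next] };
				owner[i] = next++;
				busy[i] = 1;
			}
			if (!lookup_step(t, &co[i])) {
				found[owner[i]] = co[i].found;
				values[owner[i]] = co[i].value;
				busy[i] = 0;
				done++;
			}
		}
	}
}
```

With `MAX_INFLIGHT` set to 8, up to eight bucket loads can be outstanding at once. The price is the switch between lookups, which in the stackful design of this pull request appears as `coro_transfer` in the profiles below.
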

Below is the performance evaluation of the master branch, measured with our updated scripts:
| lcores | table size | GK Mpps rcvd (0%) | GK Mpps rcvd (50%) | GK Mpps rcvd (99%) | GK Mpps rcvd (mean) | cli Mpps sent (0%) | cli Mpps sent (50%) | cli Mpps sent (99%) | cli Mpps sent (mean) |
|--------|------------|-------------------|--------------------|--------------------|---------------------|--------------------|---------------------|---------------------|----------------------|
| 1 | 2^15 | 5.58 | 8.28 | 8.29 | 7.32 | 10.16 | 10.73 | 11.47 | 10.8 |
| 1 | 2^20 | 6.8 | 7.58 | 7.89 | 7.48 | 9.04 | 10.29 | 11.4 | 10.34 |
| 1 | 2^25 | 3.79 | 5.43 | 5.62 | 5.0 | 9.75 | 10.44 | 11.13 | 10.45 |
| 2 | 2^15 | 10.32 | 10.51 | 10.7 | 10.5 | 9.82 | 10.58 | 11.08 | 10.58 |
| 2 | 2^20 | 9.71 | 10.19 | 10.6 | 10.15 | 9.61 | 10.44 | 11.15 | 10.45 |
| 2 | 2^25 | 5.85 | 8.3 | 10.0 | 7.96 | 9.0 | 10.37 | 11.09 | 10.32 |
| 3 | 2^15 | 9.36 | 9.49 | 9.62 | 9.5 | 9.24 | 9.72 | 10.58 | 9.75 |
| 3 | 2^20 | 8.34 | 8.79 | 9.71 | 8.84 | 8.92 | 9.74 | 10.52 | 9.77 |
| 3 | 2^25 | 6.21 | 7.95 | 8.98 | 7.68 | 9.05 | 9.84 | 10.49 | 9.8 |

Below is the performance evaluation after applying this pull request:
| lcores | table size | GK Mpps rcvd (0%) | GK Mpps rcvd (50%) | GK Mpps rcvd (99%) | GK Mpps rcvd (mean) | cli Mpps sent (0%) | cli Mpps sent (50%) | cli Mpps sent (99%) | cli Mpps sent (mean) |
|--------|------------|-------------------|--------------------|--------------------|---------------------|--------------------|---------------------|---------------------|----------------------|
| 1 | 2^15 | 4.62 | 6.47 | 6.48 | 6.04 | 9.36 | 10.52 | 11.17 | 10.45 |
| 1 | 2^20 | 3.73 | 4.15 | 4.35 | 4.11 | 9.61 | 10.53 | 11.12 | 10.53 |
| 1 | 2^25 | 2.19 | 2.38 | 3.35 | 2.59 | 9.62 | 10.6 | 11.16 | 10.54 |
| 2 | 2^15 | 10.14 | 10.42 | 10.5 | 10.39 | 9.8 | 10.48 | 10.95 | 10.45 |
| 2 | 2^20 | 8.57 | 9.53 | 10.11 | 9.43 | 9.72 | 10.37 | 10.96 | 10.36 |
| 2 | 2^25 | 3.75 | 4.75 | 6.4 | 4.81 | 9.65 | 10.3 | 11.12 | 10.34 |
| 3 | 2^15 | 9.12 | 9.63 | 10.04 | 9.59 | 8.91 | 9.85 | 10.56 | 9.8 |
| 3 | 2^20 | 8.27 | 8.82 | 9.17 | 8.76 | 9.14 | 9.89 | 10.7 | 9.88 |
| 3 | 2^25 | 4.14 | 7.07 | 9.16 | 7.04 | 9.17 | 9.81 | 10.54 | 9.83 |


Below is the CPU-cycle profile with 3 lcores and 2^25 flow entries:

    Samples: 668K of event 'cycles:ppp', Event count (approx.): 8729181903458, Thread: lcore-slave-12
    Children Self Command Shared Object Symbol
    3.97% 0.00% lcore-slave-12 gatekeeper [.] eal_thread_loop
    2.75% 1.10% lcore-slave-12 gatekeeper [.] gk_proc
    2.57% 0.00% lcore-slave-12 [unknown] [.] 0x00007f38d4bfb480
    2.09% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d4c0
    1.95% 1.94% lcore-slave-12 gatekeeper [.] rte_hash_lookup_with_hash
    1.73% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d500
    1.54% 1.54% lcore-slave-12 gatekeeper [.] ip_flow_cmp_eq
    1.28% 1.26% lcore-slave-12 gatekeeper [.] rte_hash_cuckoo_make_space_mw
    1.19% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d540
    0.99% 0.00% lcore-slave-12 gatekeeper [.] rte_hash_k16_cmp_eq
    0.92% 0.90% lcore-slave-12 gatekeeper [.] gk_co_process_front_pkt_final
    0.91% 0.82% lcore-slave-12 libc-2.27.so [.] __memset_sse2_unaligned_erms
    0.80% 0.79% lcore-slave-12 gatekeeper [.] gk_co_process_front_pkt
    0.75% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d580
    0.70% 0.00% lcore-slave-12 [unknown] [.] 0x000000002203000a
    0.61% 0.00% lcore-slave-12 [unknown] [.] 0000000000000000
    0.60% 0.54% lcore-slave-12 gatekeeper [.] ixgbe_recv_pkts_vec
    0.56% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d5c0
    0.53% 0.53% lcore-slave-12 gatekeeper [.] __rte_hash_add_key_with_hash
    0.53% 0.53% lcore-slave-12 libc-2.27.so [.] __memcmp_sse4_1
    0.53% 0.52% lcore-slave-12 gatekeeper [.] gk_co_scan_flow_table
    0.50% 0.48% lcore-slave-12 gatekeeper [.] gk_process_request.isra.9
    0.50% 0.46% lcore-slave-12 gatekeeper [.] coro_transfer
    0.46% 0.00% lcore-slave-12 [unknown] [.] 0x00000040c0000073
    0.43% 0.43% lcore-slave-12 gatekeeper [.] gk_co_main
    0.42% 0.42% lcore-slave-12 [kernel.kallsyms] [k] nmi
    0.41% 0.00% lcore-slave-12 [unknown] [.] 0x00000000312e3030
    0.41% 0.00% lcore-slave-12 [unknown] [.] 0x0000000000000001
    0.39% 0.37% lcore-slave-12 gatekeeper [.] encapsulate
    0.35% 0.34% lcore-slave-12 gatekeeper [.] process_flow_entry
    0.35% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d600
    0.32% 0.32% lcore-slave-12 gatekeeper [.] rte_hash_prefetch_buckets_non_temporal
    0.29% 0.29% lcore-slave-12 gatekeeper [.] __rte_hash_del_key_with_hash
    0.28% 0.00% lcore-slave-12 [unknown] [k] 0x000000011120d4e8
    0.24% 0.15% lcore-slave-12 gatekeeper [.] process_cmds_from_mailbox
    0.18% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d640
    0.18% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d528
    0.15% 0.15% lcore-slave-12 gatekeeper [.] adjust_pkt_len
    0.11% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d680
    0.11% 0.10% lcore-slave-12 gatekeeper [.] pkt_copy_cached_eth_header
    0.10% 0.07% lcore-slave-12 gatekeeper [.] common_ring_mp_enqueue
    0.09% 0.09% lcore-slave-12 gatekeeper [.] send_pkts

Below is the memory-load profile with 3 lcores and 2^25 flow entries:

    Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 288164965, Thread: lcore-slave-12
    Overhead Command Shared Object Symbol
    3.52% lcore-slave-12 gatekeeper [.] ixgbe_recv_pkts_vec
    3.38% lcore-slave-12 gatekeeper [.] rte_hash_cuckoo_make_space_mw
    3.16% lcore-slave-12 gatekeeper [.] gk_co_process_front_pkt
    2.93% lcore-slave-12 gatekeeper [.] rte_hash_prefetch_buckets_non_temporal
    1.38% lcore-slave-12 gatekeeper [.] coro_transfer
    1.24% lcore-slave-12 gatekeeper [.] gk_co_process_front_pkt_final
    1.16% lcore-slave-12 gatekeeper [.] rte_hash_lookup_with_hash
    1.04% lcore-slave-12 gatekeeper [.] gk_proc
    0.99% lcore-slave-12 libc-2.27.so [.] __memcmp_sse4_1
    0.89% lcore-slave-12 gatekeeper [.] __rte_hash_del_key_with_hash
    0.87% lcore-slave-12 gatekeeper [.] __rte_hash_add_key_with_hash
    0.85% lcore-slave-12 gatekeeper [.] gk_process_request.isra.9
    0.81% lcore-slave-12 gatekeeper [.] common_ring_mc_dequeue
    0.75% lcore-slave-12 gatekeeper [.] gk_co_scan_flow_table
    0.55% lcore-slave-12 gatekeeper [.] gk_solicitor_enqueue_bulk
    0.41% lcore-slave-12 gatekeeper [.] ixgbe_recv_scattered_pkts_vec
    0.36% lcore-slave-12 gatekeeper [.] process_flow_entry
    0.34% lcore-slave-12 gatekeeper [.] process_cmds_from_mailbox
    0.28% lcore-slave-12 gatekeeper [.] encapsulate
    0.16% lcore-slave-12 libc-2.27.so [.] __memmove_sse2_unaligned_erms
    0.15% lcore-slave-12 gatekeeper [.] memcmp@plt
    0.09% lcore-slave-12 gatekeeper [.] adjust_pkt_len
    0.06% lcore-slave-12 gatekeeper [.] common_ring_mp_enqueue
    0.04% lcore-slave-12 gatekeeper [.] pkt_copy_cached_eth_header
    0.03% lcore-slave-12 gatekeeper [.] lpm_lookup_ipv4
    0.03% lcore-slave-12 gatekeeper [.] gk_co_main
    0.01% lcore-slave-12 gatekeeper [.] ip_flow_cmp_eq
    0.00% lcore-slave-12 [kernel.kallsyms] [k] rcu_check_callbacks
    0.00% lcore-slave-12 gatekeeper [.] gk_co_scan_flow_table_final
    0.00% lcore-slave-12 gatekeeper [.] memcpy@plt
    0.00% lcore-slave-12 [kernel.kallsyms] [k] account_user_time
    0.00% lcore-slave-12 [kernel.kallsyms] [k] update_load_avg
    0.00% lcore-slave-12 [kernel.kallsyms] [k] hrtimer_active
    0.00% lcore-slave-12 [kernel.kallsyms] [k] smp_apic_timer_interrupt
    0.00% lcore-slave-12 [kernel.kallsyms] [k] ktime_get_update_offsets_now
    0.00% lcore-slave-12 [kernel.kallsyms] [k] update_curr
    0.00% lcore-slave-12 [kernel.kallsyms] [k] __acct_update_integrals
    0.00% lcore-slave-12 [kernel.kallsyms] [k] perf_mux_hrtimer_handler
    0.00% lcore-slave-12 [kernel.kallsyms] [k] __indirect_thunk_start
    0.00% lcore-slave-12 [kernel.kallsyms] [k] ktime_get
    0.00% lcore-slave-12 [kernel.kallsyms] [k] cpuacct_account_field
    0.00% lcore-slave-12 [kernel.kallsyms] [k] rcu_irq_enter

Below are the `perf stat` counters with 3 lcores and 2^25 flow entries:

    1871042.695725 task-clock (msec) # 5.764 CPUs utilized
    63,216 context-switches # 0.034 K/sec
    7 cpu-migrations # 0.000 K/sec
    2,980 page-faults # 0.002 K/sec
    5,587,194,847,661 cycles # 2.986 GHz (33.33%)
    2,968,010,189,245 stalled-cycles-frontend # 53.12% frontend cycles idle (33.33%)
    7,681,911,254,771 instructions # 1.37 insn per cycle
    # 0.39 stalled cycles per insn (40.00%)
    1,052,629,516,672 branches # 562.590 M/sec (40.00%)
    14,861,588,607 branch-misses # 1.41% of all branches (40.00%)
    2,583,453,994,051 L1-dcache-loads # 1380.756 M/sec (39.99%)
    62,738,934,810 L1-dcache-load-misses # 2.43% of all L1-dcache hits (13.33%)
    25,734,384,853 LLC-loads # 13.754 M/sec (13.33%)
    11,696,300,564 LLC-load-misses # 45.45% of all LL-cache hits (20.00%)
    <not supported> L1-icache-loads
    653,660,469 L1-icache-load-misses (26.67%)
    2,583,931,932,672 dTLB-loads # 1381.012 M/sec (26.67%)
    11,710,806,623 dTLB-load-misses # 0.45% of all dTLB cache hits (13.33%)
    1,281,604,483 iTLB-loads # 0.685 M/sec (13.33%)
    133,702,153 iTLB-load-misses # 10.43% of all iTLB cache hits (20.00%)
    <not supported> L1-dcache-prefetches
    11,946,665,954 L1-dcache-prefetch-misses # 6.385 M/sec (26.67%)
    324.605504881 seconds time elapsed

Note that the profiling results above were collected after applying this pull request, before its latest update.

Below is the performance evaluation after applying the latest update to this pull request:
| lcores | table size | GK Mpps rcvd (0%) | GK Mpps rcvd (50%) | GK Mpps rcvd (99%) | GK Mpps rcvd (mean) | cli Mpps sent (0%) | cli Mpps sent (50%) | cli Mpps sent (99%) | cli Mpps sent (mean) |
|--------|------------|-------------------|--------------------|--------------------|---------------------|--------------------|---------------------|---------------------|----------------------|
| 1 | 2^15 | 4.73 | 4.73 | 4.82 | 4.74 | 9.19 | 10.3 | 10.89 | 10.31 |
| 1 | 2^20 | 4.22 | 4.34 | 4.42 | 4.33 | 9.54 | 10.28 | 10.91 | 10.29 |
| 1 | 2^25 | 1.64 | 2.6 | 2.86 | 2.54 | 8.83 | 9.54 | 10.58 | 9.65 |
| 2 | 2^15 | 3.98 | 7.96 | 9.08 | 7.59 | 9.32 | 10.15 | 11.04 | 10.22 |
| 2 | 2^20 | 6.05 | 6.59 | 6.91 | 6.54 | 9.2 | 10.13 | 10.81 | 10.1 |
| 2 | 2^25 | 3.35 | 4.51 | 4.94 | 4.4 | 9.6 | 10.16 | 10.91 | 10.22 |
| 3 | 2^15 | 6.39 | 9.52 | 10.04 | 8.84 | 9.05 | 10.05 | 10.85 | 9.98 |
| 3 | 2^20 | 7.0 | 8.19 | 8.71 | 8.1 | 9.21 | 10.14 | 10.78 | 10.1 |
| 3 | 2^25 | 3.66 | 4.44 | 7.37 | 4.89 | 8.73 | 10.08 | 10.9 | 10.07 |

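
The "GK Mpps rcvd (mean)" columns of the three tables can be compared row by row. The small helper below recomputes the relative change of each revision against master from those values. The values are copied from the tables, and the names `gk_mean`, `pct_change()` and `print_change_vs_master()` are illustrative only.

```c
#include <stdio.h>

/* "GK Mpps rcvd (mean)" values copied from the three tables above.
 * Index order: [version][lcores - 1][table size: 2^15, 2^20, 2^25]. */
enum { MASTER, FIRST_REV, LATEST_REV, N_VERSIONS };

static const double gk_mean[N_VERSIONS][3][3] = {
	[MASTER]     = { { 7.32, 7.48, 5.0 },  { 10.5, 10.15, 7.96 }, { 9.5, 8.84, 7.68 } },
	[FIRST_REV]  = { { 6.04, 4.11, 2.59 }, { 10.39, 9.43, 4.81 }, { 9.59, 8.76, 7.04 } },
	[LATEST_REV] = { { 4.74, 4.33, 2.54 }, { 7.59, 6.54, 4.4 },   { 8.84, 8.1, 4.89 } },
};

/* Relative change in percent; negative means lower throughput than before. */
static double pct_change(double before, double after)
{
	return 100.0 * (after - before) / before;
}

/* Print the change of one revision against master for every configuration. */
static void print_change_vs_master(int version)
{
	static const int log2_size[3] = { 15, 20, 25 };

	for (int lc = 0; lc < 3; lc++)
		for (int sz = 0; sz < 3; sz++)
			printf("%d lcore(s), 2^%d entries: %+.1f%%\n", lc + 1,
			       log2_size[sz],
			       pct_change(gk_mean[MASTER][lc][sz],
					  gk_mean[version][lc][sz]));
}
```

For example, with 1 lcore and 2^25 entries the mean drops by about 48% with the first revision and about 49% with the latest update, while with 3 lcores and 2^15 entries the first revision stays within 1% of master.
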

Below is the CPU-cycle profile of the latest update with 3 lcores and 2^25 flow entries:

    Samples: 750K of event 'cycles:ppp', Event count (approx.): 8654593548709, Thread: lcore-slave-12
    Children Self Command Shared Object Symbol
    3.16% 0.00% lcore-slave-12 [kernel.kallsyms] [k] ret_from_intr
    3.16% 0.00% lcore-slave-12 [kernel.kallsyms] [k] do_IRQ
    3.13% 0.00% lcore-slave-12 [kernel.kallsyms] [k] irq_exit
    3.13% 0.00% lcore-slave-12 [kernel.kallsyms] [k] __softirqentry_text_start
    3.12% 0.06% lcore-slave-12 [kernel.kallsyms] [k] net_rx_action
    2.96% 0.23% lcore-slave-12 [kernel.kallsyms] [k] ixgbe_poll
    2.88% 0.00% lcore-slave-12 gatekeeper [.] eal_thread_loop
    2.70% 0.10% lcore-slave-12 [kernel.kallsyms] [k] napi_consume_skb
    2.58% 0.02% lcore-slave-12 [kernel.kallsyms] [k] skb_release_all
    2.40% 0.00% lcore-slave-12 [unknown] [.] 0x00007f1b9a56d3e0
    2.38% 1.72% lcore-slave-12 gatekeeper [.] rte_hash_lookup_and_yield_with_hash
    2.05% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d4c0
    2.00% 0.72% lcore-slave-12 gatekeeper [.] gk_proc
    1.92% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d500
    1.51% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d540
    1.42% 0.36% lcore-slave-12 [kernel.kallsyms] [k] skb_release_data
    1.36% 1.01% lcore-slave-12 gatekeeper [.] rte_hash_cuckoo_make_space_mw
    1.32% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d580
    1.08% 0.25% lcore-slave-12 [kernel.kallsyms] [k] skb_release_head_state
    1.07% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d5c0
    1.07% 0.01% lcore-slave-12 [kernel.kallsyms] [k] skb_free_head
    1.06% 0.29% lcore-slave-12 [kernel.kallsyms] [k] kfree
    0.97% 0.72% lcore-slave-12 gatekeeper [.] gk_co_process_front_pkt_final
    0.94% 0.72% lcore-slave-12 gatekeeper [.] gk_co_process_front_pkt
    0.83% 0.00% lcore-slave-12 [unknown] [.] 0000000000000000
    0.81% 0.07% lcore-slave-12 [kernel.kallsyms] [k] __slab_free
    0.77% 0.68% lcore-slave-12 [kernel.kallsyms] [k] sock_wfree
    0.77% 0.59% lcore-slave-12 gatekeeper [.] coro_transfer
    0.77% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d600
    0.71% 0.71% lcore-slave-12 [kernel.kallsyms] [k] cmpxchg_double_slab.isra.61
    0.70% 0.56% lcore-slave-12 gatekeeper [.] gk_process_request.isra.10
    0.68% 0.52% lcore-slave-12 libc-2.27.so [.] __memset_sse2_unaligned_erms
    0.62% 0.46% lcore-slave-12 gatekeeper [.] gk_co_scan_flow_table
    0.59% 0.49% lcore-slave-12 gatekeeper [.] ip_flow_cmp_eq
    0.53% 0.35% lcore-slave-12 gatekeeper [.] ixgbe_recv_pkts_vec
    0.51% 0.00% lcore-slave-12 [unknown] [.] 0x000000011120d640
    0.50% 0.00% lcore-slave-12 [unknown] [.] 0x000000012203000a
    0.49% 0.00% lcore-slave-12 [unknown] [.] 0x00000000312e3030
    0.49% 0.00% lcore-slave-12 [unknown] [.] 0x0000000000000001
    0.43% 0.31% lcore-slave-12 gatekeeper [.] encapsulate
    0.42% 0.32% lcore-slave-12 gatekeeper [.] process_flow_entry
    0.41% 0.00% lcore-slave-12 [unknown] [.] 0x00000040c0000073

Below is the memory-load profile of the latest update with 3 lcores and 2^25 flow entries:

    Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 275410232, Thread: lcore-slave-12
    Overhead Command Shared Object Symbol
    4.71% lcore-slave-12 gatekeeper [.] rte_hash_cuckoo_make_space_mw
    2.52% lcore-slave-12 gatekeeper [.] ixgbe_recv_pkts_vec
    1.83% lcore-slave-12 gatekeeper [.] gk_co_process_front_pkt
    1.57% lcore-slave-12 gatekeeper [.] rte_hash_lookup_and_yield_with_hash
    1.40% lcore-slave-12 gatekeeper [.] coro_transfer
    1.24% lcore-slave-12 gatekeeper [.] prefetch_and_yield
    1.16% lcore-slave-12 gatekeeper [.] __rte_hash_add_key_with_hash
    0.89% lcore-slave-12 [kernel.kallsyms] [k] skb_release_data
    0.87% lcore-slave-12 gatekeeper [.] gk_process_request.isra.10
    0.83% lcore-slave-12 [kernel.kallsyms] [k] sock_def_write_space
    0.81% lcore-slave-12 gatekeeper [.] gk_proc
    0.80% lcore-slave-12 gatekeeper [.] gk_co_process_front_pkt_final
    0.75% lcore-slave-12 gatekeeper [.] gk_co_scan_flow_table
    0.58% lcore-slave-12 [kernel.kallsyms] [k] sock_wfree
    0.56% lcore-slave-12 gatekeeper [.] common_ring_mc_dequeue
    0.53% lcore-slave-12 [kernel.kallsyms] [k] skb_release_head_state
    0.52% lcore-slave-12 libc-2.27.so [.] __memcmp_sse4_1
    0.46% lcore-slave-12 [kernel.kallsyms] [k] __slab_free
    0.42% lcore-slave-12 [kernel.kallsyms] [k] __indirect_thunk_start
    0.37% lcore-slave-12 gatekeeper [.] __rte_hash_del_key_with_hash
    0.34% lcore-slave-12 gatekeeper [.] process_flow_entry
    0.32% lcore-slave-12 [kernel.kallsyms] [k] cmpxchg_double_slab.isra.61
    0.30% lcore-slave-12 gatekeeper [.] gk_solicitor_enqueue_bulk
    0.28% lcore-slave-12 gatekeeper [.] process_cmds_from_mailbox
    0.27% lcore-slave-12 gatekeeper [.] ixgbe_recv_scattered_pkts_vec
    0.26% lcore-slave-12 [kernel.kallsyms] [k] napi_consume_skb
    0.25% lcore-slave-12 libc-2.27.so [.] __memmove_sse2_unaligned_erms
    0.24% lcore-slave-12 [kernel.kallsyms] [k] ixgbe_poll
    0.20% lcore-slave-12 gatekeeper [.] encapsulate
    0.18% lcore-slave-12 [kernel.kallsyms] [k] kfree
    0.07% lcore-slave-12 gatekeeper [.] gk_co_main
    0.06% lcore-slave-12 [kernel.kallsyms] [k] kmem_cache_free_bulk
    0.06% lcore-slave-12 gatekeeper [.] common_ring_mp_enqueue
    0.06% lcore-slave-12 gatekeeper [.] adjust_pkt_len
    0.06% lcore-slave-12 gatekeeper [.] pkt_copy_cached_eth_header
    0.05% lcore-slave-12 [kernel.kallsyms] [k] dql_completed
    0.04% lcore-slave-12 gatekeeper [.] memcmp@plt
    0.03% lcore-slave-12 [kernel.kallsyms] [k] skb_release_all
    0.02% lcore-slave-12 gatekeeper [.] lpm_lookup_ipv4
    0.02% lcore-slave-12 [kernel.kallsyms] [k] do_IRQ
    0.02% lcore-slave-12 [kernel.kallsyms] [k] common_interrupt
    0.02% lcore-slave-12 [kernel.kallsyms] [k] napi_schedule_prep

Below are the `perf stat` counters of the latest update with 3 lcores and 2^25 flow entries:

    1867274.876021 task-clock (msec) # 5.776 CPUs utilized
    103,235 context-switches # 0.055 K/sec
    7 cpu-migrations # 0.000 K/sec
    2,974 page-faults # 0.002 K/sec
    5,575,698,245,139 cycles # 2.986 GHz (33.33%)
    2,965,248,807,024 stalled-cycles-frontend # 53.18% frontend cycles idle (33.33%)
    7,532,386,362,044 instructions # 1.35 insn per cycle
    # 0.39 stalled cycles per insn (40.00%)
    1,042,765,025,951 branches # 558.442 M/sec (40.00%)
    18,869,190,878 branch-misses # 1.81% of all branches (40.00%)
    2,573,455,686,225 L1-dcache-loads # 1378.188 M/sec (39.99%)
    69,274,863,575 L1-dcache-load-misses # 2.69% of all L1-dcache hits (13.33%)
    26,623,265,673 LLC-loads # 14.258 M/sec (13.34%)
    11,355,842,090 LLC-load-misses # 42.65% of all LL-cache hits (20.00%)
    <not supported> L1-icache-loads
    1,549,898,302 L1-icache-load-misses (26.67%)
    2,573,731,016,293 dTLB-loads # 1378.335 M/sec (26.67%)
    10,794,095,805 dTLB-load-misses # 0.42% of all dTLB cache hits (13.34%)
    1,685,548,829 iTLB-loads # 0.903 M/sec (13.33%)
    559,210,349 iTLB-load-misses # 33.18% of all iTLB cache hits (20.00%)
    <not supported> L1-dcache-prefetches
    12,222,916,777 L1-dcache-prefetch-misses # 6.546 M/sec (26.67%)
    323.295308910 seconds time elapsed

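
`perf stat` prints per-run ratios, but not how the two runs relate to each other. The sketch below recomputes a few ratios from the raw counters of the two `perf stat` runs above and prints the run-to-run factor for the instruction-side and branch misses. The struct and function names are illustrative only. Because perf multiplexed these events (the percentages in parentheses), every count is a scaled estimate, so small differences should not be over-read.

```c
#include <stdint.h>
#include <stdio.h>

/* Raw counters copied from the two `perf stat` runs above (3 lcores,
 * 2^25 flow entries). Events were multiplexed, so counts are estimates.
 * struct perf_run and the helpers below are illustrative names. */
struct perf_run {
	const char *label;
	uint64_t cycles, instructions, branches, branch_misses;
	uint64_t llc_loads, llc_load_misses;
	uint64_t icache_misses, itlb_loads, itlb_misses;
};

static const struct perf_run first_rev = {
	.label = "first revision",
	.cycles = 5587194847661ULL, .instructions = 7681911254771ULL,
	.branches = 1052629516672ULL, .branch_misses = 14861588607ULL,
	.llc_loads = 25734384853ULL, .llc_load_misses = 11696300564ULL,
	.icache_misses = 653660469ULL, .itlb_loads = 1281604483ULL,
	.itlb_misses = 133702153ULL,
};

static const struct perf_run latest_rev = {
	.label = "latest update",
	.cycles = 5575698245139ULL, .instructions = 7532386362044ULL,
	.branches = 1042765025951ULL, .branch_misses = 18869190878ULL,
	.llc_loads = 26623265673ULL, .llc_load_misses = 11355842090ULL,
	.icache_misses = 1549898302ULL, .itlb_loads = 1685548829ULL,
	.itlb_misses = 559210349ULL,
};

/* num / den as a double; 0 when the denominator is 0. */
static double ratio(uint64_t num, uint64_t den)
{
	return den ? (double)num / (double)den : 0.0;
}

/* Per-run ratios, matching the ones perf prints next to each counter. */
static void print_run(const struct perf_run *r)
{
	printf("%-15s IPC %.2f | branch-miss %.2f%% | LLC miss %.2f%% | iTLB miss %.2f%%\n",
	       r->label, ratio(r->instructions, r->cycles),
	       100.0 * ratio(r->branch_misses, r->branches),
	       100.0 * ratio(r->llc_load_misses, r->llc_loads),
	       100.0 * ratio(r->itlb_misses, r->itlb_loads));
}

/* How many times larger each miss count is in run b than in run a. */
static void compare_runs(const struct perf_run *a, const struct perf_run *b)
{
	printf("L1-icache-load-misses x%.2f, iTLB-load-misses x%.2f, branch-misses x%.2f\n",
	       ratio(b->icache_misses, a->icache_misses),
	       ratio(b->itlb_misses, a->itlb_misses),
	       ratio(b->branch_misses, a->branch_misses));
}
```

Calling `print_run()` on both runs and then `compare_runs(&first_rev, &latest_rev)` shows IPC staying between 1.35 and 1.37, while L1-icache-load-misses grow by about 2.4x and iTLB-load-misses by about 4.2x with the latest update.
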