
gk: convert all batched lookups into coroutines

Open · AltraMayor opened this pull request · 6 comments

GK blocks currently batch their lookups into the flow table and the LPM table. This pull request reverts them to single lookups and moves the lookup code into coroutines.

AltraMayor · Nov 07 '19 10:11

The code now incorporates all the feedback.

AltraMayor · Nov 09 '19 11:11

This pull request has merge conflicts that need to be resolved.

mengxiang0811 · Nov 13 '19 18:11

Below is the performance evaluation of the master branch with our updated scripts:

| lcores | table size | GK Mpps rcvd (0%) | GK Mpps rcvd (50%) | GK Mpps rcvd (99%) | GK Mpps rcvd (mean) | cli Mpps sent (0%) | cli Mpps sent (50%) | cli Mpps sent (99%) | cli Mpps sent (mean) |
|--------|------------|-------------------|--------------------|--------------------|---------------------|--------------------|---------------------|---------------------|----------------------|
| 1 | 2^15 | 5.58 | 8.28 | 8.29 | 7.32 | 10.16 | 10.73 | 11.47 | 10.8 |
| 1 | 2^20 | 6.8 | 7.58 | 7.89 | 7.48 | 9.04 | 10.29 | 11.4 | 10.34 |
| 1 | 2^25 | 3.79 | 5.43 | 5.62 | 5.0 | 9.75 | 10.44 | 11.13 | 10.45 |
| 2 | 2^15 | 10.32 | 10.51 | 10.7 | 10.5 | 9.82 | 10.58 | 11.08 | 10.58 |
| 2 | 2^20 | 9.71 | 10.19 | 10.6 | 10.15 | 9.61 | 10.44 | 11.15 | 10.45 |
| 2 | 2^25 | 5.85 | 8.3 | 10.0 | 7.96 | 9.0 | 10.37 | 11.09 | 10.32 |
| 3 | 2^15 | 9.36 | 9.49 | 9.62 | 9.5 | 9.24 | 9.72 | 10.58 | 9.75 |
| 3 | 2^20 | 8.34 | 8.79 | 9.71 | 8.84 | 8.92 | 9.74 | 10.52 | 9.77 |
| 3 | 2^25 | 6.21 | 7.95 | 8.98 | 7.68 | 9.05 | 9.84 | 10.49 | 9.8 |

mengxiang0811 · Nov 27 '19 02:11

Below is the performance evaluation after applying this pull request:

| lcores | table size | GK Mpps rcvd (0%) | GK Mpps rcvd (50%) | GK Mpps rcvd (99%) | GK Mpps rcvd (mean) | cli Mpps sent (0%) | cli Mpps sent (50%) | cli Mpps sent (99%) | cli Mpps sent (mean) |
|--------|------------|-------------------|--------------------|--------------------|---------------------|--------------------|---------------------|---------------------|----------------------|
| 1 | 2^15 | 4.62 | 6.47 | 6.48 | 6.04 | 9.36 | 10.52 | 11.17 | 10.45 |
| 1 | 2^20 | 3.73 | 4.15 | 4.35 | 4.11 | 9.61 | 10.53 | 11.12 | 10.53 |
| 1 | 2^25 | 2.19 | 2.38 | 3.35 | 2.59 | 9.62 | 10.6 | 11.16 | 10.54 |
| 2 | 2^15 | 10.14 | 10.42 | 10.5 | 10.39 | 9.8 | 10.48 | 10.95 | 10.45 |
| 2 | 2^20 | 8.57 | 9.53 | 10.11 | 9.43 | 9.72 | 10.37 | 10.96 | 10.36 |
| 2 | 2^25 | 3.75 | 4.75 | 6.4 | 4.81 | 9.65 | 10.3 | 11.12 | 10.34 |
| 3 | 2^15 | 9.12 | 9.63 | 10.04 | 9.59 | 8.91 | 9.85 | 10.56 | 9.8 |
| 3 | 2^20 | 8.27 | 8.82 | 9.17 | 8.76 | 9.14 | 9.89 | 10.7 | 9.88 |
| 3 | 2^25 | 4.14 | 7.07 | 9.16 | 7.04 | 9.17 | 9.81 | 10.54 | 9.83 |

mengxiang0811 · Nov 27 '19 03:11

Below is the CPU-cycles profile with 3 lcores and 2^25 flow entries:

```
Samples: 668K of event 'cycles:ppp', Event count (approx.): 8729181903458, Thread: lcore-slave-12
  Children      Self  Command         Shared Object       Symbol
+    3.97%     0.00%  lcore-slave-12  gatekeeper          [.] eal_thread_loop
+    2.75%     1.10%  lcore-slave-12  gatekeeper          [.] gk_proc
+    2.57%     0.00%  lcore-slave-12  [unknown]           [.] 0x00007f38d4bfb480
+    2.09%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d4c0
+    1.95%     1.94%  lcore-slave-12  gatekeeper          [.] rte_hash_lookup_with_hash
+    1.73%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d500
+    1.54%     1.54%  lcore-slave-12  gatekeeper          [.] ip_flow_cmp_eq
+    1.28%     1.26%  lcore-slave-12  gatekeeper          [.] rte_hash_cuckoo_make_space_mw
+    1.19%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d540
+    0.99%     0.00%  lcore-slave-12  gatekeeper          [.] rte_hash_k16_cmp_eq
     0.92%     0.90%  lcore-slave-12  gatekeeper          [.] gk_co_process_front_pkt_final
+    0.91%     0.82%  lcore-slave-12  libc-2.27.so        [.] __memset_sse2_unaligned_erms
     0.80%     0.79%  lcore-slave-12  gatekeeper          [.] gk_co_process_front_pkt
+    0.75%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d580
+    0.70%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000002203000a
+    0.61%     0.00%  lcore-slave-12  [unknown]           [.] 0000000000000000
     0.60%     0.54%  lcore-slave-12  gatekeeper          [.] ixgbe_recv_pkts_vec
+    0.56%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d5c0
+    0.53%     0.53%  lcore-slave-12  gatekeeper          [.] __rte_hash_add_key_with_hash
     0.53%     0.53%  lcore-slave-12  libc-2.27.so        [.] __memcmp_sse4_1
     0.53%     0.52%  lcore-slave-12  gatekeeper          [.] gk_co_scan_flow_table
     0.50%     0.48%  lcore-slave-12  gatekeeper          [.] gk_process_request.isra.9
     0.50%     0.46%  lcore-slave-12  gatekeeper          [.] coro_transfer
     0.46%     0.00%  lcore-slave-12  [unknown]           [.] 0x00000040c0000073
     0.43%     0.43%  lcore-slave-12  gatekeeper          [.] gk_co_main
     0.42%     0.42%  lcore-slave-12  [kernel.kallsyms]   [k] nmi
     0.41%     0.00%  lcore-slave-12  [unknown]           [.] 0x00000000312e3030
     0.41%     0.00%  lcore-slave-12  [unknown]           [.] 0x0000000000000001
     0.39%     0.37%  lcore-slave-12  gatekeeper          [.] encapsulate
     0.35%     0.34%  lcore-slave-12  gatekeeper          [.] process_flow_entry
     0.35%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d600
     0.32%     0.32%  lcore-slave-12  gatekeeper          [.] rte_hash_prefetch_buckets_non_temporal
     0.29%     0.29%  lcore-slave-12  gatekeeper          [.] __rte_hash_del_key_with_hash
     0.28%     0.00%  lcore-slave-12  [unknown]           [k] 0x000000011120d4e8
     0.24%     0.15%  lcore-slave-12  gatekeeper          [.] process_cmds_from_mailbox
     0.18%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d640
     0.18%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d528
     0.15%     0.15%  lcore-slave-12  gatekeeper          [.] adjust_pkt_len
     0.11%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d680
     0.11%     0.10%  lcore-slave-12  gatekeeper          [.] pkt_copy_cached_eth_header
     0.10%     0.07%  lcore-slave-12  gatekeeper          [.] common_ring_mp_enqueue
     0.09%     0.09%  lcore-slave-12  gatekeeper          [.] send_pkts
```

Below is the memory-load profile with 3 lcores and 2^25 flow entries:

```
Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 288164965, Thread: lcore-slave-12
Overhead  Command         Shared Object      Symbol
   3.52%  lcore-slave-12  gatekeeper         [.] ixgbe_recv_pkts_vec
   3.38%  lcore-slave-12  gatekeeper         [.] rte_hash_cuckoo_make_space_mw
   3.16%  lcore-slave-12  gatekeeper         [.] gk_co_process_front_pkt
   2.93%  lcore-slave-12  gatekeeper         [.] rte_hash_prefetch_buckets_non_temporal
   1.38%  lcore-slave-12  gatekeeper         [.] coro_transfer
   1.24%  lcore-slave-12  gatekeeper         [.] gk_co_process_front_pkt_final
   1.16%  lcore-slave-12  gatekeeper         [.] rte_hash_lookup_with_hash
   1.04%  lcore-slave-12  gatekeeper         [.] gk_proc
   0.99%  lcore-slave-12  libc-2.27.so       [.] __memcmp_sse4_1
   0.89%  lcore-slave-12  gatekeeper         [.] __rte_hash_del_key_with_hash
   0.87%  lcore-slave-12  gatekeeper         [.] __rte_hash_add_key_with_hash
   0.85%  lcore-slave-12  gatekeeper         [.] gk_process_request.isra.9
   0.81%  lcore-slave-12  gatekeeper         [.] common_ring_mc_dequeue
   0.75%  lcore-slave-12  gatekeeper         [.] gk_co_scan_flow_table
   0.55%  lcore-slave-12  gatekeeper         [.] gk_solicitor_enqueue_bulk
   0.41%  lcore-slave-12  gatekeeper         [.] ixgbe_recv_scattered_pkts_vec
   0.36%  lcore-slave-12  gatekeeper         [.] process_flow_entry
   0.34%  lcore-slave-12  gatekeeper         [.] process_cmds_from_mailbox
   0.28%  lcore-slave-12  gatekeeper         [.] encapsulate
   0.16%  lcore-slave-12  libc-2.27.so       [.] __memmove_sse2_unaligned_erms
   0.15%  lcore-slave-12  gatekeeper         [.] memcmp@plt
   0.09%  lcore-slave-12  gatekeeper         [.] adjust_pkt_len
   0.06%  lcore-slave-12  gatekeeper         [.] common_ring_mp_enqueue
   0.04%  lcore-slave-12  gatekeeper         [.] pkt_copy_cached_eth_header
   0.03%  lcore-slave-12  gatekeeper         [.] lpm_lookup_ipv4
   0.03%  lcore-slave-12  gatekeeper         [.] gk_co_main
   0.01%  lcore-slave-12  gatekeeper         [.] ip_flow_cmp_eq
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] rcu_check_callbacks
   0.00%  lcore-slave-12  gatekeeper         [.] gk_co_scan_flow_table_final
   0.00%  lcore-slave-12  gatekeeper         [.] memcpy@plt
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] account_user_time
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] update_load_avg
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] hrtimer_active
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] smp_apic_timer_interrupt
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] ktime_get_update_offsets_now
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] update_curr
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] __acct_update_integrals
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] perf_mux_hrtimer_handler
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] __indirect_thunk_start
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] ktime_get
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] cpuacct_account_field
   0.00%  lcore-slave-12  [kernel.kallsyms]  [k] rcu_irq_enter
```

Below are the `perf stat` counters with 3 lcores and 2^25 flow entries:

```
    1871042.695725      task-clock (msec)         #    5.764 CPUs utilized
            63,216      context-switches          #    0.034 K/sec
                 7      cpu-migrations            #    0.000 K/sec
             2,980      page-faults               #    0.002 K/sec
 5,587,194,847,661      cycles                    #    2.986 GHz                      (33.33%)
 2,968,010,189,245      stalled-cycles-frontend   #   53.12% frontend cycles idle     (33.33%)
 7,681,911,254,771      instructions              #    1.37  insn per cycle
                                                  #    0.39  stalled cycles per insn  (40.00%)
 1,052,629,516,672      branches                  #  562.590 M/sec                    (40.00%)
    14,861,588,607      branch-misses             #    1.41% of all branches          (40.00%)
 2,583,453,994,051      L1-dcache-loads           # 1380.756 M/sec                    (39.99%)
    62,738,934,810      L1-dcache-load-misses     #    2.43% of all L1-dcache hits    (13.33%)
    25,734,384,853      LLC-loads                 #   13.754 M/sec                    (13.33%)
    11,696,300,564      LLC-load-misses           #   45.45% of all LL-cache hits     (20.00%)
   <not supported>      L1-icache-loads
       653,660,469      L1-icache-load-misses                                         (26.67%)
 2,583,931,932,672      dTLB-loads                # 1381.012 M/sec                    (26.67%)
    11,710,806,623      dTLB-load-misses          #    0.45% of all dTLB cache hits   (13.33%)
     1,281,604,483      iTLB-loads                #    0.685 M/sec                    (13.33%)
       133,702,153      iTLB-load-misses          #   10.43% of all iTLB cache hits   (20.00%)
   <not supported>      L1-dcache-prefetches
    11,946,665,954      L1-dcache-prefetch-misses #    6.385 M/sec                    (26.67%)

     324.605504881 seconds time elapsed
```

Note that the profiling results above were collected after applying this pull request.

mengxiang0811 · Nov 27 '19 04:11

Below is the performance evaluation after applying the latest revision of this pull request:

| lcores | table size | GK Mpps rcvd (0%) | GK Mpps rcvd (50%) | GK Mpps rcvd (99%) | GK Mpps rcvd (mean) | cli Mpps sent (0%) | cli Mpps sent (50%) | cli Mpps sent (99%) | cli Mpps sent (mean) |
|--------|------------|-------------------|--------------------|--------------------|---------------------|--------------------|---------------------|---------------------|----------------------|
| 1 | 2^15 | 4.73 | 4.73 | 4.82 | 4.74 | 9.19 | 10.3 | 10.89 | 10.31 |
| 1 | 2^20 | 4.22 | 4.34 | 4.42 | 4.33 | 9.54 | 10.28 | 10.91 | 10.29 |
| 1 | 2^25 | 1.64 | 2.6 | 2.86 | 2.54 | 8.83 | 9.54 | 10.58 | 9.65 |
| 2 | 2^15 | 3.98 | 7.96 | 9.08 | 7.59 | 9.32 | 10.15 | 11.04 | 10.22 |
| 2 | 2^20 | 6.05 | 6.59 | 6.91 | 6.54 | 9.2 | 10.13 | 10.81 | 10.1 |
| 2 | 2^25 | 3.35 | 4.51 | 4.94 | 4.4 | 9.6 | 10.16 | 10.91 | 10.22 |
| 3 | 2^15 | 6.39 | 9.52 | 10.04 | 8.84 | 9.05 | 10.05 | 10.85 | 9.98 |
| 3 | 2^20 | 7.0 | 8.19 | 8.71 | 8.1 | 9.21 | 10.14 | 10.78 | 10.1 |
| 3 | 2^25 | 3.66 | 4.44 | 7.37 | 4.89 | 8.73 | 10.08 | 10.9 | 10.07 |

Below is the CPU-cycles profile with 3 lcores and 2^25 flow entries:

```
Samples: 750K of event 'cycles:ppp', Event count (approx.): 8654593548709, Thread: lcore-slave-12
  Children      Self  Command         Shared Object       Symbol
+    3.16%     0.00%  lcore-slave-12  [kernel.kallsyms]   [k] ret_from_intr
+    3.16%     0.00%  lcore-slave-12  [kernel.kallsyms]   [k] do_IRQ
+    3.13%     0.00%  lcore-slave-12  [kernel.kallsyms]   [k] irq_exit
+    3.13%     0.00%  lcore-slave-12  [kernel.kallsyms]   [k] __softirqentry_text_start
+    3.12%     0.06%  lcore-slave-12  [kernel.kallsyms]   [k] net_rx_action
+    2.96%     0.23%  lcore-slave-12  [kernel.kallsyms]   [k] ixgbe_poll
+    2.88%     0.00%  lcore-slave-12  gatekeeper          [.] eal_thread_loop
+    2.70%     0.10%  lcore-slave-12  [kernel.kallsyms]   [k] napi_consume_skb
+    2.58%     0.02%  lcore-slave-12  [kernel.kallsyms]   [k] skb_release_all
+    2.40%     0.00%  lcore-slave-12  [unknown]           [.] 0x00007f1b9a56d3e0
+    2.38%     1.72%  lcore-slave-12  gatekeeper          [.] rte_hash_lookup_and_yield_with_hash
+    2.05%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d4c0
+    2.00%     0.72%  lcore-slave-12  gatekeeper          [.] gk_proc
+    1.92%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d500
+    1.51%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d540
+    1.42%     0.36%  lcore-slave-12  [kernel.kallsyms]   [k] skb_release_data
+    1.36%     1.01%  lcore-slave-12  gatekeeper          [.] rte_hash_cuckoo_make_space_mw
+    1.32%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d580
+    1.08%     0.25%  lcore-slave-12  [kernel.kallsyms]   [k] skb_release_head_state
+    1.07%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d5c0
+    1.07%     0.01%  lcore-slave-12  [kernel.kallsyms]   [k] skb_free_head
+    1.06%     0.29%  lcore-slave-12  [kernel.kallsyms]   [k] kfree
     0.97%     0.72%  lcore-slave-12  gatekeeper          [.] gk_co_process_front_pkt_final
     0.94%     0.72%  lcore-slave-12  gatekeeper          [.] gk_co_process_front_pkt
+    0.83%     0.00%  lcore-slave-12  [unknown]           [.] 0000000000000000
+    0.81%     0.07%  lcore-slave-12  [kernel.kallsyms]   [k] __slab_free
     0.77%     0.68%  lcore-slave-12  [kernel.kallsyms]   [k] sock_wfree
     0.77%     0.59%  lcore-slave-12  gatekeeper          [.] coro_transfer
+    0.77%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d600
     0.71%     0.71%  lcore-slave-12  [kernel.kallsyms]   [k] cmpxchg_double_slab.isra.61
```
+    0.70%     0.56%  lcore-slave-12  gatekeeper          [.] gk_process_request.isra.10                                                                                         ▒
+    0.68%     0.52%  lcore-slave-12  libc-2.27.so        [.] __memset_sse2_unaligned_erms                                                                                       ▒
     0.62%     0.46%  lcore-slave-12  gatekeeper          [.] gk_co_scan_flow_table                                                                                              ▒
+    0.59%     0.49%  lcore-slave-12  gatekeeper          [.] ip_flow_cmp_eq                                                                                                     ▒
     0.53%     0.35%  lcore-slave-12  gatekeeper          [.] ixgbe_recv_pkts_vec                                                                                                ▒
+    0.51%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000011120d640                                                                                                 ▒
+    0.50%     0.00%  lcore-slave-12  [unknown]           [.] 0x000000012203000a                                                                                                 ▒
     0.49%     0.00%  lcore-slave-12  [unknown]           [.] 0x00000000312e3030                                                                                                 ▒
     0.49%     0.00%  lcore-slave-12  [unknown]           [.] 0x0000000000000001                                                                                                 ▒
     0.43%     0.31%  lcore-slave-12  gatekeeper          [.] encapsulate                                                                                                        ▒
     0.42%     0.32%  lcore-slave-12  gatekeeper          [.] process_flow_entry                                                                                                 ▒
     0.41%     0.00%  lcore-slave-12  [unknown]           [.] 0x00000040c0000073 

Below is the memory-load profile with 3 lcores and 2^25 flow entries:

Samples: 2M of event 'cpu/mem-loads,ldlat=30/P', Event count (approx.): 275410232, Thread: lcore-slave-12
Overhead  Command         Shared Object      Symbol
   4.71%  lcore-slave-12  gatekeeper         [.] rte_hash_cuckoo_make_space_mw
   2.52%  lcore-slave-12  gatekeeper         [.] ixgbe_recv_pkts_vec
   1.83%  lcore-slave-12  gatekeeper         [.] gk_co_process_front_pkt
   1.57%  lcore-slave-12  gatekeeper         [.] rte_hash_lookup_and_yield_with_hash
   1.40%  lcore-slave-12  gatekeeper         [.] coro_transfer
   1.24%  lcore-slave-12  gatekeeper         [.] prefetch_and_yield
   1.16%  lcore-slave-12  gatekeeper         [.] __rte_hash_add_key_with_hash
   0.89%  lcore-slave-12  [kernel.kallsyms]  [k] skb_release_data
   0.87%  lcore-slave-12  gatekeeper         [.] gk_process_request.isra.10
   0.83%  lcore-slave-12  [kernel.kallsyms]  [k] sock_def_write_space
   0.81%  lcore-slave-12  gatekeeper         [.] gk_proc
   0.80%  lcore-slave-12  gatekeeper         [.] gk_co_process_front_pkt_final
   0.75%  lcore-slave-12  gatekeeper         [.] gk_co_scan_flow_table
   0.58%  lcore-slave-12  [kernel.kallsyms]  [k] sock_wfree
   0.56%  lcore-slave-12  gatekeeper         [.] common_ring_mc_dequeue
   0.53%  lcore-slave-12  [kernel.kallsyms]  [k] skb_release_head_state
   0.52%  lcore-slave-12  libc-2.27.so       [.] __memcmp_sse4_1
   0.46%  lcore-slave-12  [kernel.kallsyms]  [k] __slab_free
   0.42%  lcore-slave-12  [kernel.kallsyms]  [k] __indirect_thunk_start
   0.37%  lcore-slave-12  gatekeeper         [.] __rte_hash_del_key_with_hash
   0.34%  lcore-slave-12  gatekeeper         [.] process_flow_entry
   0.32%  lcore-slave-12  [kernel.kallsyms]  [k] cmpxchg_double_slab.isra.61
   0.30%  lcore-slave-12  gatekeeper         [.] gk_solicitor_enqueue_bulk
   0.28%  lcore-slave-12  gatekeeper         [.] process_cmds_from_mailbox
   0.27%  lcore-slave-12  gatekeeper         [.] ixgbe_recv_scattered_pkts_vec
   0.26%  lcore-slave-12  [kernel.kallsyms]  [k] napi_consume_skb
   0.25%  lcore-slave-12  libc-2.27.so       [.] __memmove_sse2_unaligned_erms
   0.24%  lcore-slave-12  [kernel.kallsyms]  [k] ixgbe_poll
   0.20%  lcore-slave-12  gatekeeper         [.] encapsulate
   0.18%  lcore-slave-12  [kernel.kallsyms]  [k] kfree
   0.07%  lcore-slave-12  gatekeeper         [.] gk_co_main
   0.06%  lcore-slave-12  [kernel.kallsyms]  [k] kmem_cache_free_bulk
   0.06%  lcore-slave-12  gatekeeper         [.] common_ring_mp_enqueue
   0.06%  lcore-slave-12  gatekeeper         [.] adjust_pkt_len
   0.06%  lcore-slave-12  gatekeeper         [.] pkt_copy_cached_eth_header
   0.05%  lcore-slave-12  [kernel.kallsyms]  [k] dql_completed
   0.04%  lcore-slave-12  gatekeeper         [.] memcmp@plt
   0.03%  lcore-slave-12  [kernel.kallsyms]  [k] skb_release_all
   0.02%  lcore-slave-12  gatekeeper         [.] lpm_lookup_ipv4
   0.02%  lcore-slave-12  [kernel.kallsyms]  [k] do_IRQ
   0.02%  lcore-slave-12  [kernel.kallsyms]  [k] common_interrupt
   0.02%  lcore-slave-12  [kernel.kallsyms]  [k] napi_schedule_prep

Below are the `perf stat` counters with 3 lcores and 2^25 flow entries:

    1867274.876021      task-clock (msec)         #    5.776 CPUs utilized          
           103,235      context-switches          #    0.055 K/sec                  
                 7      cpu-migrations            #    0.000 K/sec                  
             2,974      page-faults               #    0.002 K/sec                  
 5,575,698,245,139      cycles                    #    2.986 GHz                      (33.33%)
 2,965,248,807,024      stalled-cycles-frontend   #   53.18% frontend cycles idle     (33.33%)
 7,532,386,362,044      instructions              #    1.35  insn per cycle         
                                                  #    0.39  stalled cycles per insn  (40.00%)
 1,042,765,025,951      branches                  #  558.442 M/sec                    (40.00%)
    18,869,190,878      branch-misses             #    1.81% of all branches          (40.00%)
 2,573,455,686,225      L1-dcache-loads           # 1378.188 M/sec                    (39.99%)
    69,274,863,575      L1-dcache-load-misses     #    2.69% of all L1-dcache hits    (13.33%)
    26,623,265,673      LLC-loads                 #   14.258 M/sec                    (13.34%)
    11,355,842,090      LLC-load-misses           #   42.65% of all LL-cache hits     (20.00%)
   <not supported>      L1-icache-loads                                             
     1,549,898,302      L1-icache-load-misses                                         (26.67%)
 2,573,731,016,293      dTLB-loads                # 1378.335 M/sec                    (26.67%)
    10,794,095,805      dTLB-load-misses          #    0.42% of all dTLB cache hits   (13.34%)
     1,685,548,829      iTLB-loads                #    0.903 M/sec                    (13.33%)
       559,210,349      iTLB-load-misses          #   33.18% of all iTLB cache hits   (20.00%)
   <not supported>      L1-dcache-prefetches                                        
    12,222,916,777      L1-dcache-prefetch-misses #    6.546 M/sec                    (26.67%)

     323.295308910 seconds time elapsed

mengxiang0811 avatar Dec 02 '19 21:12 mengxiang0811