Linux Performance Tuning
perf stat -e LLC-loads,LLC-load-misses -a -I 1000
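The two counters in the command above are usually read as a ratio rather than as raw counts. A minimal sketch of that post-processing step follows. It assumes perf is switched to CSV output with `-x,`, so that in interval mode each line starts with `timestamp,count,unit,event`. The function name `llc_miss_ratio` is hypothetical and is not part of perf.

```shell
# Hypothetical helper: read `perf stat -x, -I ...` CSV lines on stdin and
# print the LLC load miss ratio for each interval.
# Assumed field layout in interval mode: timestamp,count,unit,event,...
llc_miss_ratio() {
    awk -F, '
        $4 == "LLC-loads"       { loads[$1] = $2; order[++n] = $1 }
        $4 == "LLC-load-misses" { misses[$1] = $2 }
        END {
            for (i = 1; i <= n; i++) {
                t = order[i]
                # "+ 0" forces a numeric comparison, so intervals reported
                # as "<not counted>" or "<not supported>" are skipped.
                if (loads[t] + 0 > 0)
                    printf "%s %.2f%%\n", t, 100 * misses[t] / loads[t]
            }
        }'
}
```

Since perf stat writes its counters to stderr, one possible invocation is `perf stat -x, -e LLC-loads,LLC-load-misses -a -I 1000 2>&1 | llc_miss_ratio`. Because the ratios are printed in awk's `END` block, output appears only after perf exits.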
root@dev01:~# perf list
List of pre-defined events (to be used in -e):
alignment-faults              [Software event]
bpf-output                    [Software event]
context-switches OR cs        [Software event]
cpu-clock                     [Software event]
cpu-migrations OR migrations  [Software event]
dummy                         [Software event]
emulation-faults              [Software event]
major-faults                  [Software event]
minor-faults                  [Software event]
page-faults OR faults         [Software event]
task-clock                    [Software event]

duration_time                 [Tool event]

L1-dcache-load-misses         [Hardware cache event]
L1-dcache-loads               [Hardware cache event]
L1-dcache-stores              [Hardware cache event]
L1-icache-load-misses         [Hardware cache event]
branch-load-misses            [Hardware cache event]
branch-loads                  [Hardware cache event]
dTLB-load-misses              [Hardware cache event]
dTLB-loads                    [Hardware cache event]
dTLB-store-misses             [Hardware cache event]
dTLB-stores                   [Hardware cache event]
iTLB-load-misses              [Hardware cache event]
iTLB-loads                    [Hardware cache event]

msr/pperf/                                                 [Kernel PMU event]
msr/smi/                                                   [Kernel PMU event]
msr/tsc/                                                   [Kernel PMU event]
ref-cycles OR cpu/ref-cycles/                              [Kernel PMU event]
topdown-fetch-bubbles OR cpu/topdown-fetch-bubbles/        [Kernel PMU event]
topdown-recovery-bubbles OR cpu/topdown-recovery-bubbles/  [Kernel PMU eve>
topdown-slots-issued OR cpu/topdown-slots-issued/          [Kernel PMU event]
topdown-slots-retired OR cpu/topdown-slots-retired/        [Kernel PMU event]
topdown-total-slots OR cpu/topdown-total-slots/            [Kernel PMU event]

cache:
l1d.replacement
[L1D data line replacements]
l1d_pend_miss.fb_full
[Number of times a request needed a FB entry but there was no entry
available for it. That is the FB unavailability was dominant
reason for blocking the request. A request includes
cacheable/uncacheable demands that is load, store or SW prefetch]
l1d_pend_miss.pending
[L1D miss outstandings duration in cycles]
l1d_pend_miss.pending_cycles
[Cycles with L1D load Misses outstanding]
l1d_pend_miss.pending_cycles_any
[Cycles with L1D load Misses outstanding from any thread on
physical core]
l2_lines_in.all
[L2 cache lines filling L2]
l2_lines_out.non_silent
[Counts the number of lines that are evicted by L2 cache when
triggered by an L2 cache fill. Those lines are in Modified state.
Modified lines are written back to L3]
l2_lines_out.silent
[Counts the number of lines that are silently dropped by L2 cache
when triggered by an L2 cache fill. These lines are typically in
Shared or Exclusive state. A non-threaded event]
l2_lines_out.useless_hwpf
[Counts the number of lines that have been hardware prefetched but
not used and now evicted by L2 cache]
l2_lines_out.useless_pref
[This event is deprecated. Refer to new event
L2_LINES_OUT.USELESS_HWPF]
l2_rqsts.all_code_rd
[L2 code requests]
l2_rqsts.all_demand_data_rd
[Demand Data Read requests]
l2_rqsts.all_demand_miss
[Demand requests that miss L2 cache]
l2_rqsts.all_demand_references