likwid
`MEM1` and `MEM2` are both zero on AMD 9654
I am trying to measure memory bandwidth for a stencil application that runs on both sockets of a two socket AMD 9654 system.
I am getting zero for the memory bandwidth, as shown below. Is there an issue with the DFC counters on the Zen4 architecture? Is Zen4 fully supported? I tried with and without -f.
for metric in MEM2 MEM1;do export OMP_NUM_THREADS=192; srun --nodes=1 --cpus-per-task=192 --threads-per-core=1 -t 1-0:00 --hint=nomultithread likwid-perfctr -f -C 0-191 -g ${metric} ./a.out 512 512 512 201 2504 ;done
INFO: You are running LIKWID in a cpuset with 192 CPUs. Taking given IDs as logical ID in cpuset
--------------------------------------------------------------------------------
CPU name: AMD EPYC 9654 96-Core Processor
CPU type: AMD K19 (Zen4) architecture
CPU clock: 2.40 GHz
--------------------------------------------------------------------------------
+---------------------------+---------+---------------+------------+-------------+--------------+
| Event | Counter | Sum | Min | Max | Avg |
+---------------------------+---------+---------------+------------+-------------+--------------+
| ACTUAL_CPU_CLOCK STAT | FIXC1 | 1998528125004 | 6754696363 | 14015654916 | 1.040900e+10 |
| MAX_CPU_CLOCK STAT | FIXC2 | 1297140105696 | 4381439136 | 9099460176 | 6.755938e+09 |
| RETIRED_INSTRUCTIONS STAT | PMC0 | 319022774933 | 783746373 | 14053108949 | 1.661577e+09 |
| CPU_CLOCKS_UNHALTED STAT | PMC1 | 1975117444353 | 6649101710 | 13903071879 | 1.028707e+10 |
| DRAM_CHANNEL_4 STAT | DFC0 | 0 | inf | 0 | 0 |
| DRAM_CHANNEL_5 STAT | DFC1 | 0 | inf | 0 | 0 |
| DRAM_CHANNEL_6 STAT | DFC2 | 0 | inf | 0 | 0 |
| DRAM_CHANNEL_7 STAT | DFC3 | 0 | inf | 0 | 0 |
+---------------------------+---------+---------------+------------+-------------+--------------+
+-------------------------------------------------+-------------+-----------+-----------+-----------+
| Metric | Sum | Min | Max | Avg |
+-------------------------------------------------+-------------+-----------+-----------+-----------+
| Runtime (RDTSC) [s] STAT | 737.0496 | 3.8388 | 3.8388 | 3.8388 |
| Runtime unhalted [s] STAT | 834.0599 | 2.8190 | 5.8493 | 4.3441 |
| Clock [MHz] STAT | 708884.2274 | 3689.5780 | 3694.0565 | 3692.1054 |
| CPI STAT | 1226.1420 | 0.7612 | 9.6593 | 6.3862 |
| Memory bandwidth (channels 4-7) [MBytes/s] STAT | 0 | 0 | 0 | 0 |
| Memory data volume (channels 4-7) [GBytes] STAT | 0 | 0 | 0 | 0 |
+-------------------------------------------------+-------------+-----------+-----------+-----------+
INFO: You are running LIKWID in a cpuset with 192 CPUs. Taking given IDs as logical ID in cpuset
--------------------------------------------------------------------------------
CPU name: AMD EPYC 9654 96-Core Processor
CPU type: AMD K19 (Zen4) architecture
CPU clock: 2.40 GHz
--------------------------------------------------------------------------------
+---------------------------+---------+---------------+------------+-------------+--------------+
| Event | Counter | Sum | Min | Max | Avg |
+---------------------------+---------+---------------+------------+-------------+--------------+
| ACTUAL_CPU_CLOCK STAT | FIXC1 | 2007244857500 | 6802115600 | 14075787486 | 1.045440e+10 |
| MAX_CPU_CLOCK STAT | FIXC2 | 1302715101408 | 4412394336 | 9137566680 | 6.784974e+09 |
| RETIRED_INSTRUCTIONS STAT | PMC0 | 319758910632 | 786951029 | 14250478646 | 1.665411e+09 |
| CPU_CLOCKS_UNHALTED STAT | PMC1 | 1983547352860 | 6693526027 | 13952660032 | 1.033098e+10 |
| DRAM_CHANNEL_0 STAT | DFC0 | 0 | inf | 0 | 0 |
| DRAM_CHANNEL_1 STAT | DFC1 | 0 | inf | 0 | 0 |
| DRAM_CHANNEL_2 STAT | DFC2 | 0 | inf | 0 | 0 |
| DRAM_CHANNEL_3 STAT | DFC3 | 0 | inf | 0 | 0 |
+---------------------------+---------+---------------+------------+-------------+--------------+
+-------------------------------------------------+-------------+-----------+-----------+-----------+
| Metric | Sum | Min | Max | Avg |
+-------------------------------------------------+-------------+-----------+-----------+-----------+
| Runtime (RDTSC) [s] STAT | 740.3136 | 3.8558 | 3.8558 | 3.8558 |
| Runtime unhalted [s] STAT | 837.6987 | 2.8388 | 5.8744 | 4.3630 |
| Clock [MHz] STAT | 708920.2595 | 3690.0568 | 3694.0458 | 3692.2930 |
| CPI STAT | 1229.3118 | 0.7538 | 9.8461 | 6.4027 |
| Memory bandwidth (channels 0-3) [MBytes/s] STAT | 0 | 0 | 0 | 0 |
| Memory data volume (channels 0-3) [GBytes] STAT | 0 | 0 | 0 | 0 |
+-------------------------------------------------+-------------+-----------+-----------+-----------+
likwid-perfctr -a
Group name Description
--------------------------------------------------------------------------------
BRANCH Branch prediction miss rate/ratio
CACHE Data cache miss rate/ratio
CLOCK Cycles per instruction
CPI Cycles per instruction
DATA Load to store ratio
DIVIDE Divide unit information
ENERGY Power and Energy consumption
FLOPS_DP Double Precision MFLOP/s
FLOPS_SP Single Precision MFLOP/s
ICACHE Instruction cache miss rate/ratio
L2 L2 cache bandwidth in MBytes/s (experimental)
L2CACHE L2 cache miss rate/ratio (experimental)
L3 L3 cache bandwidth in MBytes/s
L3CACHE L3 cache miss rate/ratio (experimental)
MEM1 Main memory bandwidth in MBytes/s (channels 0-3)
MEM2 Main memory bandwidth in MBytes/s (channels 4-7)
NUMA L2 cache bandwidth in MBytes/s (experimental)
TLB TLB miss rate/ratio
$ size=$((100*1024));srun --nodes=1 --cpus-per-task=192 --threads-per-core=1 -t 1:00:00 --hint=nomultithread likwid-bench -t load_avx -W N:${size}kB:128
Cycles: 5695362048
CPU Clock: 2396160729
Cycle Clock: 2396160729
Time: 2.376870e+00 sec
Iterations: 33554432
Iterations per thread: 262144
Inner loop executions: 6250
Size (Byte): 102400000
Size per thread: 800000
Number of Flops: 0
MFlops/s: 0.00
Data volume (Byte): 26843545600000
MByte/s: 11293654.25
Cycles per update: 0.001697
Cycles per cacheline: 0.013579
Loads per update: 1
Stores per update: 0
Load bytes per element: 8
Store bytes per elem.: 0
Instructions: 1468006400016
UOPs: 1258291200000
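As a sanity check on the likwid-bench output above, the reported MByte/s is the data volume divided by the runtime. The sketch below recomputes it from the two printed values; `bench_mbytes_per_s` is a hypothetical helper name, not part of LIKWID.

```shell
# Recompute likwid-bench's MByte/s figure from its printed data volume and time.
# bench_mbytes_per_s is a hypothetical helper, not a LIKWID command.
bench_mbytes_per_s() {
    # $1: data volume in bytes, $2: runtime in seconds
    awk -v vol="$1" -v t="$2" 'BEGIN { printf "%.2f\n", vol / t / 1e6 }'
}
```

Calling `bench_mbytes_per_s 26843545600000 2.376870` gives roughly 1.13e7 MByte/s, which matches the reported value up to rounding of the printed time. That is far above what DRAM can deliver, and the 800000-byte working set per thread is small enough that it probably stays in cache, so this run likely does not exercise main memory.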
I assume perf_event backend and suspect a too high setting in /proc/sys/kernel/perf_event_paranoid. It has to be zero to get data from the Uncore devices. Run with -V 1 and there should be a message.
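The setting mentioned above can be checked before a measurement. This is a minimal sketch; `check_paranoid` is a hypothetical helper name, and the optional path argument exists only so the check can be tried against a test file.

```shell
# Report whether perf_event_paranoid is low enough for Uncore events.
# check_paranoid is a hypothetical helper, not part of LIKWID.
check_paranoid() {
    paranoid_file="${1:-/proc/sys/kernel/perf_event_paranoid}"
    if [ ! -r "$paranoid_file" ]; then
        echo "cannot read $paranoid_file"
        return 2
    fi
    level=$(cat "$paranoid_file")
    # The comment above asks for 0; -1 is more permissive still.
    if [ "$level" -le 0 ]; then
        echo "perf_event_paranoid=$level: OK for Uncore events"
    else
        echo "perf_event_paranoid=$level: too high, Uncore counters may read zero"
        return 1
    fi
}
```

If the check fails, the value can be lowered with `sudo sysctl -w kernel.perf_event_paranoid=0` until the next reboot.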
The above situation also occurs on AMD 9554.
(note: I made a sum statistics data output, so the runtime is 384)
Runtime (RDTSC) [s]: 384.015717
Runtime unhalted [s]: 0.058734
Clock [MHz]: 199629.250000
CPI: nan
Memory bandwidth (channels 0-3) [MBytes/s]: 0.000000
Memory data volume (channels 0-3) [GBytes]: 0.000000
----------------------------
Runtime (RDTSC) [s]: 384.044250
Runtime unhalted [s]: 0.041370
Clock [MHz]: 186808.812500
CPI: nan
Memory bandwidth (channels 0-3) [MBytes/s]: 0.000000
Memory data volume (channels 0-3) [GBytes]: 0.000000
----------------------------
Runtime (RDTSC) [s]: 384.012970
Runtime unhalted [s]: 0.113669
Clock [MHz]: 187998.656250
CPI: nan
Memory bandwidth (channels 0-3) [MBytes/s]: 0.000000
Memory data volume (channels 0-3) [GBytes]: 0.000000
----------------------------
Runtime (RDTSC) [s]: 384.045624
Runtime unhalted [s]: 0.561691
Clock [MHz]: 191052.828125
CPI: nan
Memory bandwidth (channels 0-3) [MBytes/s]: 0.000000
Memory data volume (channels 0-3) [GBytes]: 0.000000
I assume perf_event backend and suspect a too high setting in /proc/sys/kernel/perf_event_paranoid. It has to be zero to get data from the Uncore devices. Run with -V 1 and there should be a message.
I attempted this, but it seems to have been ineffective.
What has been ineffective? Setting the value to zero or getting messages?
LIKWID with perf_event backend requires the unit amd_df to be present (/sys/devices/amd_df). If this device does not exist, there is no chance to get the memory traffic through perf_event and consequently LIKWID. You need a newer or patched kernel.
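Whether the kernel exposes the required unit can be tested directly. The sketch below assumes the sysfs layout named above; `check_pmu_device` is a hypothetical helper, and the second argument only overrides the sysfs root for testing.

```shell
# Check whether a perf_event PMU directory such as amd_df exists in sysfs.
# check_pmu_device is a hypothetical helper, not part of LIKWID.
check_pmu_device() {
    pmu="${1:-amd_df}"
    sysfs_root="${2:-/sys/devices}"
    if [ -d "$sysfs_root/$pmu" ]; then
        echo "$pmu: present"
    else
        echo "$pmu: missing (a newer or patched kernel is needed)"
        return 1
    fi
}
```

Running `check_pmu_device amd_df` on the compute node shows whether the memory traffic events can be reached through perf_event at all.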
I encountered the same problem. The OS is Rocky Linux 8.6 with kernel version 4.18.0-372.9.1.el8.x86_64. /proc/sys/kernel/perf_event_paranoid is 0, and /sys/devices/amd_df and /sys/devices/amd_l3 both exist.
[root@localhost bin]# grep -i perf_event /boot/config-4.18.0-372.9.1.el8.x86_64
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=m
CONFIG_PERF_EVENTS_INTEL_RAPL=m
CONFIG_PERF_EVENTS_INTEL_CSTATE=m
CONFIG_PERF_EVENTS_AMD_POWER=m
[root@localhost bin]# likwid-perfctr -f -V 1 -g MEM2 /home/pcadmin/stream
CPU name: AMD EPYC 9554 64-Core Processor
CPU type: AMD K19 (Zen4) architecture
CPU clock: 3.10 GHz
CPU family: 25
CPU model: 17
CPU short: zen4
CPU stepping: 1
CPU features: FP MMX SSE SSE2 HTT MMX RDTSCP MONITOR SSSE FMA SSE4.1 SSE4.2 AES AVX RDRAND AVX2 AVX512 RDSEED SSE3
CPU arch: x86_64
DEBUG - [access_client_startDaemon:157] Starting daemon /usr/local/sbin/likwid-accessD
DEBUG - [access_client_startDaemon:235] Successfully opened socket /tmp/likwid-83685 to daemon for CPU 127
Executing: /home/pcadmin/stream
DEBUG - [perfmon_addEventSet:2328] Currently 1 groups of 2 active
DEBUG - [perfgroup_readGroup:873] Reading group MEM2 from /usr/local/share/likwid/perfgroups/zen4/MEM2.txt
DEBUG - [perfmon_addEventSet:2514] Added event ACTUAL_CPU_CLOCK for counter FIXC1 to group 0
DEBUG - [perfmon_addEventSet:2514] Added event MAX_CPU_CLOCK for counter FIXC2 to group 0
DEBUG - [perfmon_addEventSet:2514] Added event RETIRED_INSTRUCTIONS for counter PMC0 to group 0
DEBUG - [perfmon_addEventSet:2514] Added event CPU_CLOCKS_UNHALTED for counter PMC1 to group 0
DEBUG - [checkAccess:237] WARNING: Counter DFC0 does not exist
DEBUG - [perfmon_addEventSet:2437] Cannot access counter register DFC0
DEBUG - [checkAccess:237] WARNING: Counter DFC1 does not exist
DEBUG - [perfmon_addEventSet:2437] Cannot access counter register DFC1
DEBUG - [checkAccess:237] WARNING: Counter DFC2 does not exist
DEBUG - [perfmon_addEventSet:2437] Cannot access counter register DFC2
DEBUG - [checkAccess:237] WARNING: Counter DFC3 does not exist
DEBUG - [perfmon_addEventSet:2437] Cannot access counter register DFC3
Zen4 CPUs have 12 memory channels (https://www.amd.com/en/products/cpu/amd-epyc-9554), so why does the LIKWID library support only 8 memory channels for perfmon data?
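The counters rejected in the -V 1 output above can be pulled out with a small filter. This is a sketch that assumes the WARNING wording shown in the debug log; `missing_counters` is a hypothetical helper name.

```shell
# List the counters that likwid-perfctr -V 1 reports as non-existent.
# Reads the debug output on stdin; missing_counters is a hypothetical helper.
missing_counters() {
    grep -o 'WARNING: Counter [A-Za-z0-9_]* does not exist' \
        | awk '{ print $3 }' \
        | sort -u
}
```

A possible invocation is `likwid-perfctr -f -V 1 -g MEM2 /home/pcadmin/stream 2>&1 | missing_counters`, which for the log above would list DFC0 through DFC3.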
I may have found the reason for this WARNING message: the struct zen4_counter_map in src/include/perfmon_zen4_counters.h is missing the index "PMC17".
@marquis-wang Yes, you found it. I fixed it yesterday night. Please test it: https://github.com/RRZE-HPC/likwid/commit/7027aa64bf7f8af87173a8778635fad4f012dcc6
I will add additional memory channels to the branch. Yes it should be 12.
@TomTheBear Great! I tested branch amd_zen4 (44cf4ca) and it works well.
It works, but it is not done. I made some major updates to the branch yesterday, but the branch cannot be merged, so I am creating a new one with only the fixes.
The events currently configured in MEM1 and MEM2 do not exist for Zen4 anymore, so it is unclear whether they actually count memory traffic. The updated version will no longer have MEM1 and MEM2 but MEMREAD and MEMWRITE instead, and will use the officially documented metrics for memory traffic.
I want to use the LIKWID library to develop collection tools for our cluster (Zen4). The memory bandwidth data from https://github.com/RRZE-HPC/likwid/commit/7027aa64bf7f8af87173a8778635fad4f012dcc6 is missing 4 memory channels. The newest commit (44cf4ca) adds all channels, so I tested it: I compared the likwid-perfctr output (MEMREAD and MEMWRITE) with STREAM's output, and the results show no big difference. In the official documentation (AMD PPR Family 19h), I found a new event (DATA_BW) that may be helpful for monitoring memory bandwidth. I will test this event.
I'm glad that it works for you now. Please be careful with the PPRs: you have to use the one for your family and model. AMD Family 19h Model 11h should be the right one. The third document describes a DATA_BW event, but it is just a detailed explanation/breakdown of the events already documented in https://github.com/RRZE-HPC/likwid/pull/618. Unfortunately, even with these details, it is impossible to perform read and write measurements in one go.
The UMC performance counters would be of interest to count at the memory controller instead of the DataFabric but they seem quite complicated to add. There is already infrastructure for MMIO based counters but some effort would be required. Unfortunately, they are never exposed by perf_event, so they can be added for accessdaemon/direct only.
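Since read and write traffic cannot be measured in one go, one way to collect both is to run the application once per group. The sketch below only prints the command lines, using the MEMREAD and MEMWRITE group names mentioned earlier in this thread; `perfctr_cmds` is a hypothetical helper name.

```shell
# Print one likwid-perfctr command line per memory group (dry run).
# perfctr_cmds is a hypothetical helper; arguments containing spaces
# would need extra quoting before the output is executed.
perfctr_cmds() {
    cpus="$1"
    shift
    for group in MEMREAD MEMWRITE; do
        echo "likwid-perfctr -C $cpus -g $group $*"
    done
}
```

For the stencil run from the original report, `perfctr_cmds 0-191 ./a.out 512 512 512 201 2504 | sh` would execute both measurements back to back. The read and write volumes then come from two separate runs, so they are only comparable if the application behaves the same in both.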
The Zen4 fix was merged to the master branch. I close this issue now.