pcm
`Local DRAM accesses` from pcm-numa does not match memory throughput from pcm-memory
Hi, I am using PCM to measure memory channel usage. I ran a few STREAM applications (memory read/write intensive) on CPU cores 16-23 and did not run any other applications. I then used pcm-numa and pcm-memory to measure memory channel usage. However, the Local DRAM accesses value from pcm-numa (527 MB with a 1-second interval, hence a throughput of 527 MB/s) does not match the memory throughput from pcm-memory (49375.64 MB/s). Is my understanding incorrect, or do you know why this happens? Thanks!
pcm-memory:
```
Detected Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz "Intel(r) microarchitecture codename Cascade Lake-SP" stepping 7 microcode level 0x5003604
Update every 1 seconds
|---------------------------------------||---------------------------------------|
|--             Socket 0              --||--             Socket 1              --|
|---------------------------------------||---------------------------------------|
|--     Memory Channel Monitoring     --||--     Memory Channel Monitoring     --|
|---------------------------------------||---------------------------------------|
|-- Mem Ch 0: Reads (MB/s):     18.88 --||-- Mem Ch 0: Reads (MB/s):  11746.97 --|
|--           Writes(MB/s):      5.22 --||--           Writes(MB/s):   4684.87 --|
|--       PMM Reads(MB/s) :      0.00 --||--       PMM Reads(MB/s) :      0.00 --|
|--      PMM Writes(MB/s) :      0.00 --||--      PMM Writes(MB/s) :      0.00 --|
|-- Mem Ch 1: Reads (MB/s):     19.14 --||-- Mem Ch 1: Reads (MB/s):  11748.66 --|
|--           Writes(MB/s):      5.32 --||--           Writes(MB/s):   4686.62 --|
|--       PMM Reads(MB/s) :      0.00 --||--       PMM Reads(MB/s) :      0.00 --|
|--      PMM Writes(MB/s) :      0.00 --||--      PMM Writes(MB/s) :      0.00 --|
|-- Mem Ch 2: Reads (MB/s):     18.85 --||-- Mem Ch 2: Reads (MB/s):  11749.36 --|
|--           Writes(MB/s):      5.13 --||--           Writes(MB/s):   4686.62 --|
|--       PMM Reads(MB/s) :      0.00 --||--       PMM Reads(MB/s) :      0.00 --|
|--      PMM Writes(MB/s) :      0.00 --||--      PMM Writes(MB/s) :      0.00 --|
|-- NODE 0 Mem Read (MB/s) :    56.87 --||-- NODE 1 Mem Read (MB/s) : 35244.99 --|
|-- NODE 0 Mem Write(MB/s) :    15.67 --||-- NODE 1 Mem Write(MB/s) : 14058.11 --|
|-- NODE 0 PMM Read (MB/s):      0.00 --||-- NODE 1 PMM Read (MB/s):      0.00 --|
|-- NODE 0 PMM Write(MB/s):      0.00 --||-- NODE 1 PMM Write(MB/s):      0.00 --|
|-- NODE 0.0 NM read hit rate :  0.99 --||-- NODE 1.0 NM read hit rate :  1.00 --|
|-- NODE 0.1 NM read hit rate :  0.00 --||-- NODE 1.1 NM read hit rate :  0.00 --|
|-- NODE 0.2 NM read hit rate :  0.00 --||-- NODE 1.2 NM read hit rate :  0.00 --|
|-- NODE 0.3 NM read hit rate :  0.00 --||-- NODE 1.3 NM read hit rate :  0.00 --|
|-- NODE 0 Memory (MB/s):       72.54 --||-- NODE 1 Memory (MB/s):    49303.10 --|
|---------------------------------------||---------------------------------------|
|---------------------------------------||---------------------------------------|
|--                 System DRAM Read Throughput(MB/s): 35301.86                --|
|--                System DRAM Write Throughput(MB/s): 14073.78                --|
|--                  System PMM Read Throughput(MB/s):     0.00                --|
|--                 System PMM Write Throughput(MB/s):     0.00                --|
|--                      System Read Throughput(MB/s): 35301.86                --|
|--                     System Write Throughput(MB/s): 14073.78                --|
|--                    System Memory Throughput(MB/s): 49375.64                --|
|---------------------------------------||---------------------------------------|
```
pcm-numa:
```
Core |  IPC | Instructions |   Cycles | Local DRAM accesses | Remote DRAM Accesses
   0   0.97           24 M       25 M                  3854                   21 K
   1   0.59          812 K     1366 K                  1578                   2367
   2   0.36          207 K      569 K                   503                    653
   3   0.26          124 K      477 K                   891                    443
   4   0.36          255 K      707 K                   788                    545
   5   0.30          189 K      629 K                   517                    344
   6   0.21          189 K      892 K                   759                    528
   7   0.33          371 K     1109 K                   966                    756
   8   0.32          151 K      480 K                   439                    377
   9   0.31          221 K      719 K                  1894                    684
  10   0.35          164 K      471 K                   498                    384
  11   0.42          323 K      761 K                   718                   1670
  12   0.41          236 K      582 K                   475                    611
  13   0.41          245 K      600 K                   473                   1195
  14   0.38          205 K      546 K                   525                    647
  15   0.34          187 K      553 K                   436                    900
  16   0.45         1315 M     2895 M                  65 M                   31 K
  17   0.45         1315 M     2895 M                  65 M                   17 K
  18   0.45         1317 M     2895 M                  66 M                   18 K
  19   0.45         1316 M     2895 M                  66 M                   15 K
  20   0.45         1315 M     2895 M                  66 M                   14 K
  21   0.45         1316 M     2895 M                  66 M                   46 K
  22   0.45         1315 M     2895 M                  66 M                   17 K
  23   0.45         1315 M     2895 M                  65 M                   18 K
  24   0.43          710 K     1651 K                  3026                   7390
  25   0.36          223 K      620 K                  1055                   1487
  26   0.38          208 K      544 K                   922                   1289
  27   0.26          112 K      435 K                   757                    896
  28   0.28          110 K      396 K                   638                    727
  29   0.33          107 K      327 K                   562                    541
  30   0.32          148 K      466 K                   677                    579
  31   0.10          102 K     1035 K                  1069                   1032
  32   0.08          162 K     2136 K                  1489                    622
  33   0.42          257 K      619 K                   545                    739
  34   0.39          172 K      444 K                   390                    474
  35   0.46          449 K      980 K                  2142                    973
  36   0.41          190 K      467 K                   297                    437
  37   0.25          145 K      579 K                   425                    353
  38   0.42          934 K     2219 K                  1503                   2496
  39   0.28          150 K      531 K                  1024                    827
  40   0.37          149 K      402 K                   339                    323
  41   0.34           85 K      249 K                   234                    349
  42   0.38           94 K      246 K                   282                    233
  43   0.45           80 K      177 K                   269                    233
  44   0.33          105 K      318 K                   301                    278
  45   0.34           97 K      284 K                   308                    163
  46   0.37           92 K      250 K                   262                    154
  47   0.36          332 K      911 K                  1023                   1164
  48   0.14          405 K     2848 K                  2121                   3280
  49   0.16          335 K     2048 K                  1507                   1278
  50   0.17          358 K     2064 K                  1531                   1392
  51   0.16          307 K     1953 K                  1454                   1055
  52   0.16          234 K     1478 K                  1166                    874
  53   0.13          350 K     2606 K                  2390                   3506
  54   0.17          291 K     1727 K                  1489                   1019
  55   0.16          344 K     2133 K                  1732                   1365
  56   0.36          160 K      448 K                   873                    654
  57   0.44          586 K     1333 K                  2182                   2141
  58   0.45          408 K      905 K                  1591                   1161
  59   0.44          357 K      804 K                  1230                   1014
  60   0.35          215 K      608 K                  1117                   1301
  61   0.33          157 K      478 K                   733                    613
  62   0.30          197 K      654 K                  1493                   5732
  63   0.15          441 K     3022 K                  4924                   15 K
----------------------------------------------------------------------------------
   *   0.45           10 G       23 G                 527 M                  279 K
```
There are a few more things to consider. pcm-numa counts accesses, not bytes. Each read access typically triggers one 64-byte transfer (a cache line), while a write access can trigger up to two 64-byte transfers (read-for-ownership + write-back); this depends on the architecture. 527 M accesses/s × 64 bytes ≈ 33.7 GB/s, which is close to the read bandwidth measured by pcm-memory (35.3 GB/s). Some of these accesses are writes, which generate the additional write bandwidth (14 GB/s in pcm-memory). Hardware prefetches can also generate additional traffic.

pcm-numa is not intended to measure exact memory bandwidth. It is meant for assessing the distribution of local versus remote accesses.
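The arithmetic above can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not how PCM itself computes bandwidth: it assumes each access moves one 64-byte cache line (at most two for a write needing read-for-ownership plus write-back), reads pcm-numa's "M"/"K" suffixes as 10^6/10^3, and treats pcm-memory's MB as 10^6 bytes. The numbers are copied from the outputs in the question; the helper `accesses_to_mb_per_s` is a hypothetical name used only for illustration.

```python
# Rough sketch: turn pcm-numa access counts into an estimated DRAM bandwidth.
# Assumptions (not taken from PCM's source): 64 bytes per access, at most two
# transfers per write access, "M" = 10**6, "K" = 10**3, pcm-memory MB = 10**6 B.

CACHE_LINE_BYTES = 64
INTERVAL_S = 1.0  # both tools were sampling every 1 second

# pcm-numa totals (the "*" row) and the STREAM cores 16-23 (rounded display values).
total_local = 527e6
total_remote = 279e3
stream_local = [65e6, 65e6, 66e6, 66e6, 66e6, 66e6, 66e6, 65e6]  # cores 16..23

# pcm-memory system totals in MB/s.
pcm_memory_read = 35301.86
pcm_memory_write = 14073.78
pcm_memory_total = 49375.64


def accesses_to_mb_per_s(accesses: float, transfers_per_access: float = 1.0) -> float:
    """Hypothetical helper: access count per interval -> MB/s, assuming 64-byte lines."""
    bytes_moved = accesses * transfers_per_access * CACHE_LINE_BYTES
    return bytes_moved / INTERVAL_S / 1e6


# Local vs. remote split: the quantity pcm-numa is actually designed to show.
local_fraction = total_local / (total_local + total_remote)
# How much of the local traffic comes from the STREAM cores.
stream_share = sum(stream_local) / total_local

# Lower bound: every access moves exactly one cache line.
low = accesses_to_mb_per_s(total_local, 1.0)
# Upper bound: every access is a write that moves two cache lines.
high = accesses_to_mb_per_s(total_local, 2.0)

print(f"local share of DRAM accesses: {local_fraction:.4%}")
print(f"share from STREAM cores 16-23: {stream_share:.1%}")
print(f"estimated traffic: {low:.0f} .. {high:.0f} MB/s")
print(f"pcm-memory read: {pcm_memory_read:.0f} MB/s, "
      f"read+write: {pcm_memory_read + pcm_memory_write:.2f} MB/s "
      f"(reported total {pcm_memory_total:.2f})")
```

Under these assumptions the lower bound (about 33.7 GB/s) lines up with pcm-memory's read throughput, and the 49.4 GB/s total falls between the two bounds. The model ignores hardware prefetches and cannot separate reads from writes, which is why the local/remote split (over 99.9% local here) is the more reliable thing to read from pcm-numa.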