`Local DRAM accesses` from pcm-numa does not match memory throughput from pcm-memory

Open QiongwenXu opened this issue 1 year ago • 1 comments

Hi, I am using PCM to measure memory channel usage. I ran a few STREAM applications (memory read/write intensive) on CPU cores 16-23, with no other applications running, and then used pcm-numa and pcm-memory to measure the memory channel usage. However, the Local DRAM accesses value from pcm-numa (527 M over a 1-second interval, which I read as a throughput of 527 MB/s) does not match the memory throughput from pcm-memory (49375.64 MB/s). Is my understanding incorrect, or do you happen to know why this happens? Thanks!

pcm-memory:

Detected Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz "Intel(r) microarchitecture codename Cascade Lake-SP" stepping 7 microcode level 0x5003604
Update every 1 seconds
|---------------------------------------||---------------------------------------|
|--             Socket  0             --||--             Socket  1             --|
|---------------------------------------||---------------------------------------|
|--     Memory Channel Monitoring     --||--     Memory Channel Monitoring     --|
|---------------------------------------||---------------------------------------|
|-- Mem Ch  0: Reads (MB/s):    18.88 --||-- Mem Ch  0: Reads (MB/s): 11746.97 --|
|--            Writes(MB/s):     5.22 --||--            Writes(MB/s):  4684.87 --|
|--      PMM Reads(MB/s)   :     0.00 --||--      PMM Reads(MB/s)   :     0.00 --|
|--      PMM Writes(MB/s)  :     0.00 --||--      PMM Writes(MB/s)  :     0.00 --|
|-- Mem Ch  1: Reads (MB/s):    19.14 --||-- Mem Ch  1: Reads (MB/s): 11748.66 --|
|--            Writes(MB/s):     5.32 --||--            Writes(MB/s):  4686.62 --|
|--      PMM Reads(MB/s)   :     0.00 --||--      PMM Reads(MB/s)   :     0.00 --|
|--      PMM Writes(MB/s)  :     0.00 --||--      PMM Writes(MB/s)  :     0.00 --|
|-- Mem Ch  2: Reads (MB/s):    18.85 --||-- Mem Ch  2: Reads (MB/s): 11749.36 --|
|--            Writes(MB/s):     5.13 --||--            Writes(MB/s):  4686.62 --|
|--      PMM Reads(MB/s)   :     0.00 --||--      PMM Reads(MB/s)   :     0.00 --|
|--      PMM Writes(MB/s)  :     0.00 --||--      PMM Writes(MB/s)  :     0.00 --|
|-- NODE 0 Mem Read (MB/s) :    56.87 --||-- NODE 1 Mem Read (MB/s) : 35244.99 --|
|-- NODE 0 Mem Write(MB/s) :    15.67 --||-- NODE 1 Mem Write(MB/s) : 14058.11 --|
|-- NODE 0 PMM Read (MB/s):      0.00 --||-- NODE 1 PMM Read (MB/s):      0.00 --|
|-- NODE 0 PMM Write(MB/s):      0.00 --||-- NODE 1 PMM Write(MB/s):      0.00 --|
|-- NODE 0.0 NM read hit rate :  0.99 --||-- NODE 1.0 NM read hit rate :  1.00 --|
|-- NODE 0.1 NM read hit rate :  0.00 --||-- NODE 1.1 NM read hit rate :  0.00 --|
|-- NODE 0.2 NM read hit rate :  0.00 --||-- NODE 1.2 NM read hit rate :  0.00 --|
|-- NODE 0.3 NM read hit rate :  0.00 --||-- NODE 1.3 NM read hit rate :  0.00 --|
|-- NODE 0 Memory (MB/s):       72.54 --||-- NODE 1 Memory (MB/s):    49303.10 --|
|---------------------------------------||---------------------------------------|
|---------------------------------------||---------------------------------------|
|--            System DRAM Read Throughput(MB/s):      35301.86                --|
|--           System DRAM Write Throughput(MB/s):      14073.78                --|
|--             System PMM Read Throughput(MB/s):          0.00                --|
|--            System PMM Write Throughput(MB/s):          0.00                --|
|--                 System Read Throughput(MB/s):      35301.86                --|
|--                System Write Throughput(MB/s):      14073.78                --|
|--               System Memory Throughput(MB/s):      49375.64                --|
|---------------------------------------||---------------------------------------|

pcm-numa:

Core | IPC  | Instructions | Cycles  |  Local DRAM accesses | Remote DRAM Accesses
   0   0.97         24 M       25 M      3854                  21 K
   1   0.59        812 K     1366 K      1578                2367
   2   0.36        207 K      569 K       503                 653
   3   0.26        124 K      477 K       891                 443
   4   0.36        255 K      707 K       788                 545
   5   0.30        189 K      629 K       517                 344
   6   0.21        189 K      892 K       759                 528
   7   0.33        371 K     1109 K       966                 756
   8   0.32        151 K      480 K       439                 377
   9   0.31        221 K      719 K      1894                 684
  10   0.35        164 K      471 K       498                 384
  11   0.42        323 K      761 K       718                1670
  12   0.41        236 K      582 K       475                 611
  13   0.41        245 K      600 K       473                1195
  14   0.38        205 K      546 K       525                 647
  15   0.34        187 K      553 K       436                 900
  16   0.45       1315 M     2895 M        65 M                31 K
  17   0.45       1315 M     2895 M        65 M                17 K
  18   0.45       1317 M     2895 M        66 M                18 K
  19   0.45       1316 M     2895 M        66 M                15 K
  20   0.45       1315 M     2895 M        66 M                14 K
  21   0.45       1316 M     2895 M        66 M                46 K
  22   0.45       1315 M     2895 M        66 M                17 K
  23   0.45       1315 M     2895 M        65 M                18 K
  24   0.43        710 K     1651 K      3026                7390
  25   0.36        223 K      620 K      1055                1487
  26   0.38        208 K      544 K       922                1289
  27   0.26        112 K      435 K       757                 896
  28   0.28        110 K      396 K       638                 727
  29   0.33        107 K      327 K       562                 541
  30   0.32        148 K      466 K       677                 579
  31   0.10        102 K     1035 K      1069                1032
  32   0.08        162 K     2136 K      1489                 622
  33   0.42        257 K      619 K       545                 739
  34   0.39        172 K      444 K       390                 474
  35   0.46        449 K      980 K      2142                 973
  36   0.41        190 K      467 K       297                 437
  37   0.25        145 K      579 K       425                 353
  38   0.42        934 K     2219 K      1503                2496
  39   0.28        150 K      531 K      1024                 827
  40   0.37        149 K      402 K       339                 323
  41   0.34         85 K      249 K       234                 349
  42   0.38         94 K      246 K       282                 233
  43   0.45         80 K      177 K       269                 233
  44   0.33        105 K      318 K       301                 278
  45   0.34         97 K      284 K       308                 163
  46   0.37         92 K      250 K       262                 154
  47   0.36        332 K      911 K      1023                1164
  48   0.14        405 K     2848 K      2121                3280
  49   0.16        335 K     2048 K      1507                1278
  50   0.17        358 K     2064 K      1531                1392
  51   0.16        307 K     1953 K      1454                1055
  52   0.16        234 K     1478 K      1166                 874
  53   0.13        350 K     2606 K      2390                3506
  54   0.17        291 K     1727 K      1489                1019
  55   0.16        344 K     2133 K      1732                1365
  56   0.36        160 K      448 K       873                 654
  57   0.44        586 K     1333 K      2182                2141
  58   0.45        408 K      905 K      1591                1161
  59   0.44        357 K      804 K      1230                1014
  60   0.35        215 K      608 K      1117                1301
  61   0.33        157 K      478 K       733                 613
  62   0.30        197 K      654 K      1493                5732
  63   0.15        441 K     3022 K      4924                  15 K
-------------------------------------------------------------------------------------------------------------------
   *   0.45         10 G       23 G       527 M               279 K

QiongwenXu, Feb 06 '24 03:02

There are a few more things to consider. pcm-numa measures accesses, not bytes. Each read access can trigger one 64-byte transfer (a cache line), and a write access can trigger up to two 64-byte transfers (read-for-ownership plus write-back); this depends on the architecture. 0.527 G accesses × 64 bytes ≈ 33.7 GB/s, which is close to the read bandwidth measured by pcm-memory (about 35.3 GB/s). Some of these accesses are writes and generate the additional write bandwidth (about 14 GB/s in pcm-memory). Hardware prefetches can also generate additional traffic. pcm-numa is not intended to measure exact memory bandwidth; it is meant more for assessing the distribution of remote vs. local accesses.
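The arithmetic above can be checked with a short sketch. It is only a rough estimate under stated assumptions: every counted access moves exactly one 64-byte cache line, and pcm-memory's MB/s means 10^6 bytes per second. The function name `accesses_to_mb_per_s` is made up for this illustration and is not part of PCM; the input numbers are copied from the outputs posted in this issue.

```python
# Rough conversion of a pcm-numa access count into an estimated bandwidth.
# Assumption: each counted access moves exactly one 64-byte cache line.
# Assumption: 1 MB = 10**6 bytes when comparing against pcm-memory.
CACHE_LINE_BYTES = 64


def accesses_to_mb_per_s(accesses, interval_s=1.0,
                         bytes_per_access=CACHE_LINE_BYTES):
    """Hypothetical helper: estimated MB/s for `accesses` in `interval_s`."""
    return accesses * bytes_per_access / interval_s / 1e6


# Values taken from the outputs in this issue.
local_dram_accesses = 527e6      # pcm-numa total row, 1-second interval
pcm_memory_read_mb_s = 35301.86  # System DRAM Read Throughput
pcm_memory_write_mb_s = 14073.78 # System DRAM Write Throughput

estimate_mb_s = accesses_to_mb_per_s(local_dram_accesses)
print(f"estimated from pcm-numa: {estimate_mb_s:.0f} MB/s")
print(f"pcm-memory read:         {pcm_memory_read_mb_s:.0f} MB/s")
print(f"estimate / read:         {estimate_mb_s / pcm_memory_read_mb_s:.2f}")
```

The estimate lands within a few percent of the pcm-memory read figure, while the write traffic (write-backs) and any prefetch traffic are not captured by this simple one-line-per-access model.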

rdementi, Feb 21 '24 15:02