intel-cmt-cat
Mismatch Between the LLC Misses and the MBM Counter
When I run a program that is designed to miss the LLC on every access, the reported LLC miss count does not match the MBM counters, which stay at zero. A minimal version of the program (about 40 LoC) is included below.
# The program is pinned to core #3
CORE   IPC   MISSES   LLC[KB]   MBL[MB/s]   MBR[MB/s]
   3  0.34   55601k   22464.0         0.0         0.0
The number of misses is 55601k, so with 64-byte cache lines (and assuming the count covers one second) I'd expect the MBL counter to be approximately 55601 * 1000 * 64 / 1024^2 ≈ 3394 MB/s. But it always stays at 0.
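The estimate above can be reproduced with a small helper. This is only a sketch: it assumes that every LLC miss fetches one full cache line from local memory and that the miss count covers one second. The function name `expected_mb_per_s` is made up for illustration and is not part of the PQoS library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bandwidth in MB/s (1 MB = 1024 * 1024 bytes) implied by
 * a per-second miss count, assuming each miss transfers one cache line. */
static double expected_mb_per_s(uint64_t misses_per_s, uint32_t line_bytes)
{
    assert(line_bytes > 0);
    return (double)misses_per_s * line_bytes / (1024.0 * 1024.0);
}
```

With the figures from the output, `expected_mb_per_s(55601000, 64)` comes to a little under 3394 MB/s, which is the value MBL would be expected to show under these assumptions.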
/* This program repeatedly walks a 1GB buffer with a large stride
   so that every access maps to the same cache set */
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <linux/mman.h>

#define SIZE_1GB (1024*1024*1024)

/* The MBM counter works as expected when the stride is 64B */
// #define STRIDE 64

/**
 * Always access the same cache set.
 * This is specific to the Xeon 8275CL on EC2 c5.metal:
 *   NUM_CL   = LLC_SIZE / CL_SIZE = 35.75MB / 64B = 585728
 *   NUM_SETS = NUM_CL / NUM_WAYS  = 585728 / 11   = 53248
 *   STRIDE   = NUM_SETS * CL_SIZE = 53248 * 64    = 3407872 = 0x340000
 */
#define STRIDE 0x340000

int main() {
    // Use one physically contiguous 1GB huge page so that the stride maps to the same cache set
    register char *buf = (char *)mmap(/*addr*/ 0, /*len*/ SIZE_1GB,
                                      /*prot*/ PROT_READ | PROT_WRITE,
                                      /*flags*/ MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                                      /*fd*/ -1, /*offset*/ 0);
    if (buf == MAP_FAILED) {
        // Typically means no 1GB huge page is reserved on this machine
        perror("mmap");
        return EXIT_FAILURE;
    }
    register unsigned offset = 0;
    register uint64_t val;
    while (1) {
        // Load 8 bytes from buf + offset; val is an output operand of the asm
        asm volatile("movq (%1), %0\n\t" : "=r"(val) : "r"(buf + offset) : "memory");
        offset += STRIDE;
        if (offset >= SIZE_1GB) {
            offset = 0;
        }
    }
    munmap(buf, SIZE_1GB); // not reached: the loop runs until the process is killed
}
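The stride derivation in the program's comment, and the number of distinct cache lines the loop touches, can be checked with the sketch below. It uses the same plain modulo set-index model as the comment (set index taken directly from the address bits); whether the hardware actually maps addresses to LLC sets this way is an assumption, not something established here. The helper names `same_set_stride` and `lines_touched` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stride, in bytes, between addresses that share a set under a plain modulo
 * set-index model: num_sets = cache_bytes / line_bytes / ways. */
static uint64_t same_set_stride(uint64_t cache_bytes, uint64_t line_bytes,
                                uint64_t ways)
{
    assert(line_bytes > 0 && ways > 0);
    uint64_t num_sets = cache_bytes / line_bytes / ways;
    return num_sets * line_bytes;
}

/* Number of distinct offsets 0, stride, 2 * stride, ... that lie strictly
 * below buf_bytes, i.e. the lines visited before the loop wraps to 0. */
static uint64_t lines_touched(uint64_t buf_bytes, uint64_t stride)
{
    assert(stride > 0);
    return (buf_bytes + stride - 1) / stride;
}
```

For a 35.75MB (37486592-byte), 11-way LLC with 64-byte lines, `same_set_stride(37486592, 64, 11)` gives 0x340000, matching `STRIDE`. `lines_touched(1ULL << 30, 0x340000)` gives 316, so under this model the loop cycles through 316 lines competing for 11 ways, which is why every access is expected to miss.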
=== Configuration ===
Platform: AWS c5.metal
CPU: Xeon 8275CL
OS and kernel version: Ubuntu 24.04 (6.8.0-1012-aws)
PQoS library version: 6.0.0