Microbenchmarks
Weird pin-thread-to-CPU performance
Hi @clamchowder,
I want to pin threads to CPUs when measuring bandwidth, but there seems to be no such facility in the non-NUMA mode, so I borrowed this part from CoherenceLatency:
// note: gettid() and the CPU_* macros need _GNU_SOURCE plus
// <sched.h> and <unistd.h> (gettid() requires glibc >= 2.30)
void *ReadBandwidthTestThread(void *param) {
    BandwidthTestThreadData *bwTestData = (BandwidthTestThreadData *)param;
    if (hardaffinity) {
        sched_setaffinity(gettid(), sizeof(cpu_set_t), &global_cpuset);
    } else {
        // I added the following lines:
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(bwTestData->processorIndex, &cpuset);
        sched_setaffinity(gettid(), sizeof(cpu_set_t), &cpuset);
        fprintf(stderr, "thread %ld set affinity %d\n",
                (long)gettid(), bwTestData->processorIndex);
    }
    ...
}
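For reference, here is a minimal, self-contained sketch of the same pinning pattern outside the benchmark. It is my own illustration, not Microbenchmarks code: the worker body, pin_to_cpu, and NTHREADS are placeholders, and it uses sched_setaffinity(0, ...) (pid 0 = calling thread) instead of gettid().

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 8  /* placeholder thread count */

// Pin the calling thread to a single logical CPU.
static void pin_to_cpu(int cpu) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(cpu, &cpuset);
    // pid 0 means "the calling thread", so gettid() is not needed
    if (sched_setaffinity(0, sizeof(cpu_set_t), &cpuset) != 0)
        perror("sched_setaffinity");
}

static void *worker(void *arg) {
    int idx = (int)(long)arg;
    long nprocs = sysconf(_SC_NPROCESSORS_ONLN);
    pin_to_cpu((int)(idx % nprocs));   // same thread_idx % nprocs scheme
    fprintf(stderr, "thread %d pinned to cpu %ld\n", idx, idx % nprocs);
    // ... bandwidth measurement loop would go here ...
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

Compile with -pthread; each thread pins itself before touching memory, so the pinning cost is not part of the measured loop.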
Besides, processorIndex is calculated as thread_idx % nprocs, following the processor-to-core-id mapping from /proc/cpuinfo.
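In case it helps to reproduce, here is a small sketch (my own, not part of the repo) that prints that mapping from sysfs, which carries the same "processor"/"core id" information as /proc/cpuinfo:

#include <stdio.h>
#include <unistd.h>

// Print the logical-processor -> physical-core mapping, i.e. the
// "processor" and "core id" fields of /proc/cpuinfo, read from sysfs
// for easier parsing.
int main(void) {
    long nprocs = sysconf(_SC_NPROCESSORS_ONLN);
    for (long cpu = 0; cpu < nprocs; cpu++) {
        char path[128];
        int core_id;
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%ld/topology/core_id", cpu);
        FILE *f = fopen(path, "r");
        if (!f) continue;
        if (fscanf(f, "%d", &core_id) == 1)
            printf("processor %ld -> core id %d\n", cpu, core_id);
        fclose(f);
    }
    return 0;
}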
I tested on an AMD Ryzen 7 5800X, which has a single NUMA node (8 physical cores, 16 logical cores), so I didn't enable NUMA mode.
I got the following results:
In the figure above, "auto" means I ran the original MemoryBandwidth code, while "manual" means I added the CPU_SET and sched_setaffinity calls as in the code snippet. The left and right panels show the 8-thread and 16-thread results, respectively.
My question is: why are the "manual" bandwidth results lower than the "auto" ones with 8 threads, while "manual" catches up at 16 threads?
Thanks,
troore