Microbenchmarks
Weird pin-thread-to-CPU performance
Hi @clamchowder,
I want to pin threads to CPUs when measuring bandwidth, but there seems to be no such facility in the non-NUMA mode, so I borrowed this part from CoherenceLatency:
// note: gettid() and the CPU_* macros need _GNU_SOURCE plus
// <sched.h> and <unistd.h> (gettid() requires glibc >= 2.30)
void *ReadBandwidthTestThread(void *param) {
    BandwidthTestThreadData *bwTestData = (BandwidthTestThreadData *)param;
    if (hardaffinity) {
        sched_setaffinity(gettid(), sizeof(cpu_set_t), &global_cpuset);
    } else {
        // I added the following lines:
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(bwTestData->processorIndex, &cpuset);
        sched_setaffinity(gettid(), sizeof(cpu_set_t), &cpuset);
        fprintf(stderr, "thread %ld set affinity %d\n",
                (long)gettid(), bwTestData->processorIndex);
    }
    ...
}
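For reference, here is a minimal, self-contained sketch of the same pinning pattern outside the benchmark. It is my own illustration, not Microbenchmarks code: the worker body, pin_to_cpu, and NTHREADS are placeholders, and it uses sched_setaffinity(0, ...) (pid 0 = calling thread) instead of gettid().

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 8  /* placeholder thread count */

// Pin the calling thread to a single logical CPU.
static void pin_to_cpu(int cpu) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(cpu, &cpuset);
    // pid 0 means "the calling thread", so gettid() is not needed
    if (sched_setaffinity(0, sizeof(cpu_set_t), &cpuset) != 0)
        perror("sched_setaffinity");
}

static void *worker(void *arg) {
    int idx = (int)(long)arg;
    long nprocs = sysconf(_SC_NPROCESSORS_ONLN);
    pin_to_cpu((int)(idx % nprocs));   // same thread_idx % nprocs scheme
    fprintf(stderr, "thread %d pinned to cpu %ld\n", idx, idx % nprocs);
    // ... bandwidth measurement loop would go here ...
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

Compile with -pthread; each thread pins itself before touching memory, so the pinning cost is not part of the measured loop.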
Besides, processorIndex is calculated as thread_idx % nprocs, following the processor-to-core-id mapping from /proc/cpuinfo.
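In case it helps to reproduce, here is a small sketch (my own, not part of the repo) that prints that mapping from sysfs, which carries the same "processor"/"core id" information as /proc/cpuinfo:

#include <stdio.h>
#include <unistd.h>

// Print the logical-processor -> physical-core mapping, i.e. the
// "processor" and "core id" fields of /proc/cpuinfo, read from sysfs
// for easier parsing.
int main(void) {
    long nprocs = sysconf(_SC_NPROCESSORS_ONLN);
    for (long cpu = 0; cpu < nprocs; cpu++) {
        char path[128];
        int core_id;
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%ld/topology/core_id", cpu);
        FILE *f = fopen(path, "r");
        if (!f) continue;
        if (fscanf(f, "%d", &core_id) == 1)
            printf("processor %ld -> core id %d\n", cpu, core_id);
        fclose(f);
    }
    return 0;
}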
I tested on an AMD Ryzen 7 5800X, which has a single NUMA node (8 physical cores, 16 logical cores), so I didn't enable NUMA mode.
I got the following results:
In the figure above, "auto" means I ran the original MemoryBandwidth code, while "manual" means I added the CPU_SET and sched_setaffinity calls as in the code snippet. The left and right panels show the 8-thread and 16-thread results, respectively.
My question is: why are the "manual" bandwidth results lower than the "auto" ones with 8 threads, while "manual" catches up at 16 threads?
Thanks,
troore