CpuId-based sched_getcpu counterpart for macOS

Open carstenbauer opened this issue 4 years ago • 8 comments

On Linux, one can use sched_getcpu to query the ID of the CPU a thread is running on:

@ccall sched_getcpu()::Cint

and

using ThreadPools
cpuid(i::Integer) = fetch(@tspawnat i @ccall sched_getcpu()::Cint)

I want to do the same on macOS, where sched_getcpu isn't available. Looking for an alternative, I stumbled across https://stackoverflow.com/questions/33745364/sched-getcpu-equivalent-for-os-x which mentions an approach based on "cpuid":

#include <cpuid.h>

#define CPUID(INFO, LEAF, SUBLEAF) __cpuid_count(LEAF, SUBLEAF, INFO[0], INFO[1], INFO[2], INFO[3])

#define GETCPU(CPU) {                              \
        uint32_t CPUInfo[4];                           \
        CPUID(CPUInfo, 1, 0);                          \
        /* CPUInfo[1] is EBX, bits 24-31 are APIC ID */ \
        if ( (CPUInfo[3] & (1 << 9)) == 0) {           \
          CPU = -1;  /* no APIC on chip */             \
        }                                              \
        else {                                         \
          CPU = (unsigned)CPUInfo[1] >> 24;                    \
        }                                              \
        if (CPU < 0) CPU = 0;                          \
      }

Unfortunately, both my C and "cpuid" knowledge are limited, which is why I can't translate this to Julia. Can someone help me out here? Personally, I think it would be a great addition to this package. Being able to ask which CPU a thread is running on, across different operating systems, would be very useful in some cases.

Any help is very much appreciated. (And forgive me if I'm asking for too much here.)

carstenbauer avatar Mar 29 '21 17:03 carstenbauer

using CpuId
function coreid()
    eax, ebx, ecx, edx =  CpuId.cpuid(1, 0)
    if ( (edx & (0x00000001 << 9)) == 0x00000000)
        CPU = -1;  # no APIC on chip
    else
        CPU = (ebx%Int) >> 24;
    end
    CPU < 0 ? 0 : CPU
end

chriselrod avatar Mar 29 '21 18:03 chriselrod

Great, thanks for the translation!

However, I'm not sure it works correctly (though it may well be that the original C code doesn't work anymore either):

Script:

# threads_cpuids.jl
using CpuId
function cpuid_coreid()
    eax, ebx, ecx, edx =  CpuId.cpuid(1, 0)
    if ( (edx & (0x00000001 << 9)) == 0x00000000)
        CPU = -1;  # no APIC on chip
    else
        CPU = (ebx%Int) >> 24;
    end
    CPU < 0 ? 0 : CPU
end

glibc_coreid() = @ccall sched_getcpu()::Cint


using ThreadPools
using Base.Threads: nthreads

tglibc_coreid(i::Integer) = fetch(@tspawnat i glibc_coreid());
tcpuid_coreid(i::Integer) = fetch(@tspawnat i cpuid_coreid());

for i in 1:nthreads()
    println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)), cpuid_coreid: $(tcpuid_coreid(i)))")
    # @sync @tspawnat i sum(abs2, rand()^2 + rand()^2 for i in 1:500_000_000)
end

Output on a Linux machine (where I know/checked that glibc_coreid is correct):

$ julia -t10 threads_cpuids.jl
Running on thread 1 (glibc_coreid: 2, cpuid_coreid: 2)
Running on thread 2 (glibc_coreid: 4, cpuid_coreid: 4)
Running on thread 3 (glibc_coreid: 3, cpuid_coreid: 34)
Running on thread 4 (glibc_coreid: 6, cpuid_coreid: 6)
Running on thread 5 (glibc_coreid: 5, cpuid_coreid: 36)
Running on thread 6 (glibc_coreid: 8, cpuid_coreid: 8)
Running on thread 7 (glibc_coreid: 7, cpuid_coreid: 38)
Running on thread 8 (glibc_coreid: 10, cpuid_coreid: 16)
Running on thread 9 (glibc_coreid: 9, cpuid_coreid: 40)
Running on thread 10 (glibc_coreid: 12, cpuid_coreid: 18)

carstenbauer avatar Mar 29 '21 20:03 carstenbauer

You can also try cpucycle_id, which returns a tuple. You would be interested in the second element, I believe, which is also the APIC id.

"""
    cpucycle_id()
Read the CPU's [Time Stamp Counter, TSC](https://en.wikipedia.org/wiki/Time_Stamp_Counter),
and executing CPU id directly with a `rdtscp` instruction.  This function is
similar to the `cpucycle()`, but uses an instruction that also allows to
detect if the code has been moved to a different executing CPU.  See also the
comments for `cpucycle()` which equally apply.
"""
function cpucycle_id end
@eval cpucycle_id() = $(cpufeature(RDTSCP)) ? rdtscp() : (zero(UInt64),zero(UInt64))

m-j-w avatar Mar 29 '21 21:03 m-j-w

Doesn't seem to work either:

# threads_cpuids.jl
using CpuId
function cpuid_coreid()
    eax, ebx, ecx, edx =  CpuId.cpuid(1, 0)
    if ( (edx & (0x00000001 << 9)) == 0x00000000)
        CPU = -1;  # no APIC on chip
    else
        CPU = (ebx%Int) >> 24;
    end
    CPU < 0 ? 0 : CPU
end

glibc_coreid() = @ccall sched_getcpu()::Cint

cpucycle_coreid() = Int(cpucycle_id()[2])

using ThreadPools
using Base.Threads: nthreads

tglibc_coreid(i::Integer) = fetch(@tspawnat i glibc_coreid());
tcpuid_coreid(i::Integer) = fetch(@tspawnat i cpuid_coreid());
tcpucycle_coreid(i::Integer) = fetch(@tspawnat i cpucycle_coreid());

for i in 1:nthreads()
    println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)), cpuid_coreid: $(tcpuid_coreid(i)), cpucycle_coreid: $(tcpucycle_coreid(i)))")
    # @sync @tspawnat i sum(abs2, rand()^2 + rand()^2 for i in 1:500_000_000)
end

$ julia -t10 threads_cpuids.jl
Running on thread 1 (glibc_coreid: 2, cpuid_coreid: 2, cpucycle_coreid: 2)
Running on thread 2 (glibc_coreid: 4, cpuid_coreid: 4, cpucycle_coreid: 4)
Running on thread 3 (glibc_coreid: 3, cpuid_coreid: 34, cpucycle_coreid: 4099)
Running on thread 4 (glibc_coreid: 6, cpuid_coreid: 6, cpucycle_coreid: 6)
Running on thread 5 (glibc_coreid: 5, cpuid_coreid: 36, cpucycle_coreid: 4101)
Running on thread 6 (glibc_coreid: 8, cpuid_coreid: 8, cpucycle_coreid: 8)
Running on thread 7 (glibc_coreid: 7, cpuid_coreid: 38, cpucycle_coreid: 4103)
Running on thread 8 (glibc_coreid: 10, cpuid_coreid: 16, cpucycle_coreid: 10)
Running on thread 9 (glibc_coreid: 9, cpuid_coreid: 40, cpucycle_coreid: 4105)
Running on thread 10 (glibc_coreid: 12, cpuid_coreid: 18, cpucycle_coreid: 12)

carstenbauer avatar Mar 29 '21 22:03 carstenbauer

cpucycle_coreid works for me:

julia> for i in 1:nthreads()
       println("Running on thread $i (glibc_coreid: $(tglibc_coreid(i)), cpuid_coreid: $(tcpuid_coreid(i)), cpucycle_coreid: $(tcpucycle_coreid(i)))")
           # @sync @tspawnat i sum(abs2, rand()^2 + rand()^2 for i in 1:500_000_000)
       end
Running on thread 1 (glibc_coreid: 0, cpuid_coreid: 0, cpucycle_coreid: 0)
Running on thread 2 (glibc_coreid: 1, cpuid_coreid: 2, cpucycle_coreid: 1)
Running on thread 3 (glibc_coreid: 2, cpuid_coreid: 4, cpucycle_coreid: 2)
Running on thread 4 (glibc_coreid: 3, cpuid_coreid: 6, cpucycle_coreid: 3)
Running on thread 5 (glibc_coreid: 4, cpuid_coreid: 8, cpucycle_coreid: 4)
Running on thread 6 (glibc_coreid: 5, cpuid_coreid: 16, cpucycle_coreid: 5)
Running on thread 7 (glibc_coreid: 6, cpuid_coreid: 18, cpucycle_coreid: 6)
Running on thread 8 (glibc_coreid: 7, cpuid_coreid: 20, cpucycle_coreid: 7)
Running on thread 9 (glibc_coreid: 8, cpuid_coreid: 22, cpucycle_coreid: 8)
Running on thread 10 (glibc_coreid: 9, cpuid_coreid: 24, cpucycle_coreid: 9)
Running on thread 11 (glibc_coreid: 10, cpuid_coreid: 1, cpucycle_coreid: 10)
Running on thread 12 (glibc_coreid: 11, cpuid_coreid: 3, cpucycle_coreid: 11)
Running on thread 13 (glibc_coreid: 12, cpuid_coreid: 5, cpucycle_coreid: 12)
Running on thread 14 (glibc_coreid: 13, cpuid_coreid: 7, cpucycle_coreid: 13)
Running on thread 15 (glibc_coreid: 14, cpuid_coreid: 9, cpucycle_coreid: 14)
Running on thread 16 (glibc_coreid: 15, cpuid_coreid: 17, cpucycle_coreid: 15)
Running on thread 17 (glibc_coreid: 16, cpuid_coreid: 19, cpucycle_coreid: 16)
Running on thread 18 (glibc_coreid: 17, cpuid_coreid: 21, cpucycle_coreid: 17)
Running on thread 19 (glibc_coreid: 18, cpuid_coreid: 23, cpucycle_coreid: 18)
Running on thread 20 (glibc_coreid: 19, cpuid_coreid: 25, cpucycle_coreid: 19)

If you mask off the result:

cpucycle_coreid() & 0x00000fff

Then all those results will match glibc_coreid

(although cpuid_coreid will still be wrong, it seems like cpucycle_coreid should work)
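
To illustrate the masking step on the values reported earlier in this thread, here is a small sketch. It assumes the Linux convention, where the kernel stores (node << 12) | cpu in the TSC_AUX register that rdtscp returns. Other operating systems may store something else, so the 12-bit split should be treated as an assumption outside Linux. The name split_tsc_aux is hypothetical and not part of CpuId.

```julia
# Hypothetical helper: split a raw TSC_AUX value (second element of cpucycle_id())
# into CPU and node, ASSUMING the Linux layout (node << 12) | cpu.
function split_tsc_aux(aux::Integer)
    cpu  = Int(aux & 0x00000fff)   # low 12 bits
    node = Int(aux >> 12)          # remaining high bits
    return (cpu = cpu, node = node)
end

# Raw cpucycle_coreid values taken from the 10-thread run earlier in the thread
for raw in (2, 4099, 4101, 10)
    println(raw, " => ", split_tsc_aux(raw))
end
```

Under that assumption, a value such as 4099 is 4096 + 3, i.e. CPU 3 with a non-zero value in the high bits, which is consistent with glibc_coreid reporting 3 for that thread.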

May be worth checking for more architectures whether 0x00000fff is really an appropriate mask. Could calculate a mask based on the number of cores:

julia> ((1 << (64 - leading_zeros(CpuId.cputhreads()-1))) - 1) % UInt32 # 20 threads on this system
0x0000001f

This rounds up to the next power of 2, subtracts 1, and truncates to a 32 bit integer.
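
Spelled out step by step for the 20-thread case, the computation looks roughly like this. The thread count is hard-coded here so the snippet runs without CpuId; on a real system it would come from CpuId.cputhreads().

```julia
n    = 20                             # thread count, hard-coded for illustration
bits = 64 - leading_zeros(n - 1)      # bits needed to represent ids 0..n-1
pow2 = 1 << bits                      # smallest power of two >= n
mask = (pow2 - 1) % UInt32            # all ones below that power of two
println((bits = bits, pow2 = pow2, mask = mask))
```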

Would probably be better to look at the CPUID instruction to figure out what mask to apply.

chriselrod avatar Mar 30 '21 02:03 chriselrod

Check e.g. this AMD specification, page 27: https://www.amd.com/system/files/TechDocs/25481.pdf

CPUID Fn8000_0008_ECX APIC ID Size and Core Count

The width of the APIC ID is variable across architectures. The above page should give the bit width. However, there is also a legacy method mentioned.
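
As a rough sketch of how that field could be decoded: in the AMD specification, bits 7:0 of ECX hold NC (number of cores minus one) and bits 15:12 hold ApicIdCoreIdSize, with a value of zero meaning the legacy method (NC + 1) applies. The helper name and the register value below are made up for illustration, and the function takes the raw ECX value as an argument so that it runs on any machine.

```julia
# Hypothetical helper: decode the ECX register of CPUID Fn8000_0008
# (bit layout as described in the AMD CPUID specification).
function decode_fn8000_0008_ecx(ecx::UInt32)
    nc        = Int(ecx & 0xff)          # bits 7:0, number of cores minus one
    size_bits = Int((ecx >> 12) & 0xf)   # bits 15:12, ApicIdCoreIdSize
    # A size of zero means the legacy method applies: use NC + 1 directly.
    max_cores = size_bits == 0 ? nc + 1 : 1 << size_bits
    return (cores = nc + 1, apic_id_bits = size_bits, max_cores = max_cores)
end

# Made-up register value for illustration: ApicIdCoreIdSize = 4, NC = 7
println(decode_fn8000_0008_ecx(0x00004007))
```

On an x86 machine the input would presumably be the third element of CpuId.cpuid(0x80000008, 0), in the same way CpuId.cpuid(1, 0) is used earlier in this thread. How this width relates to the value returned by rdtscp is not established here.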

m-j-w avatar Mar 30 '21 08:03 m-j-w

If you mask off the result:

cpucycle_coreid() & 0x00000fff

Then all those results will match glibc_coreid

(although cpuid_coreid will still be wrong, it seems like cpucycle_coreid should work)

May be worth checking for more architectures whether 0x00000fff is really an appropriate mask. Could calculate a mask based on the number of cores:

julia> ((1 << (64 - leading_zeros(CpuId.cputhreads()-1))) - 1) % UInt32 # 20 threads on this system
0x0000001f

This rounds up to the next power of 2, subtracts 1, and truncates to a 32 bit integer.

Would probably be better to look at the CPUID instruction to figure out what mask to apply.

For the record, I find that the following masks work:

  • ((1 << (64 - leading_zeros(CpuId.cputhreads()))) - 1) % UInt32
  • 0x00000fff

For example:

const cpucycle_mask = ((1 << (64 - leading_zeros(CpuId.cputhreads()))) - 1) % UInt32
cpucycle_coreid() = Int(cpucycle_id()[2] & cpucycle_mask)

But ((1 << (64 - leading_zeros(CpuId.cputhreads()-1))) - 1) doesn't work. I don't know why yet, though.
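
One arithmetic difference between the two expressions may be worth keeping in mind when debugging this: they yield the same mask unless the thread count is an exact power of two, in which case the variant without -1 keeps one extra bit. Whether that explains the observation is not established here; the sketch below only shows the arithmetic. The helper names are hypothetical and the thread counts are arbitrary examples.

```julia
# Hypothetical helpers reproducing the two mask expressions for a given thread count.
mask_with_minus1(n::Integer)    = ((1 << (64 - leading_zeros(Int(n) - 1))) - 1) % UInt32
mask_without_minus1(n::Integer) = ((1 << (64 - leading_zeros(Int(n)))) - 1) % UInt32

# Compare both variants for a few arbitrary thread counts
for n in (16, 20, 128, 256)
    println(n, ": ", mask_with_minus1(n), " vs ", mask_without_minus1(n))
end
```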

JBlaschke avatar Jul 22 '23 00:07 JBlaschke

Quick update: when I wrote my last comment (above), I was running this on Perlmutter's login nodes (AMD Milan). On my Intel laptop, the mask 0x00000fff doesn't work, but ((1 << (64 - leading_zeros(CpuId.cputhreads()))) - 1) % UInt32 still works.

JBlaschke avatar Jul 23 '23 03:07 JBlaschke