Information about CPU features

Open bgoglin opened this issue 9 years ago • 5 comments

AVX, SSE, Altivec, ...

bgoglin avatar Apr 08 '16 20:04 bgoglin

:+1:

jbenden avatar Jul 22 '18 20:07 jbenden

Any plans to support this?

And/or abstracted information such as "number of vector registers" and SIMD width for different types? E.g., to know that AVX512 has 32 registers vs 16 for other x86_64, or that SandyBridge (AVX but no AVX2) has 256-bit vectors usable for floating-point operations, but only has SSE available (128-bit registers) for integer operations. And similar for other arches like ARM, which I'm much less familiar with and thus have a harder time implementing checks for reliably on my own.
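To make the request concrete, here is a rough sketch of the kind of abstraction I mean, x86_64 only. The flag names (`avx512f`, `avx2`, `avx`) are assumed to follow the Linux /proc/cpuinfo spelling, and `SimdSummary` and `summarize_x86_64_simd` are hypothetical names for illustration, not anything hwloc provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimdSummary:
    """Hypothetical high-level view of the vector hardware (not an hwloc type)."""
    registers: int   # number of architectural vector registers
    float_bits: int  # widest vector usable for floating-point operations
    int_bits: int    # widest vector usable for integer operations


def summarize_x86_64_simd(flags):
    """Map a collection of x86 feature flags to a SimdSummary.

    Assumes 64-bit mode and Linux-style flag names. Only the cases
    mentioned in this thread are distinguished.
    """
    flags = set(flags)
    if "avx512f" in flags:
        return SimdSummary(registers=32, float_bits=512, int_bits=512)
    if "avx2" in flags:
        return SimdSummary(registers=16, float_bits=256, int_bits=256)
    if "avx" in flags:
        # AVX without AVX2 (e.g. SandyBridge): 256-bit float, 128-bit integer.
        return SimdSummary(registers=16, float_bits=256, int_bits=128)
    # Plain x86_64 baseline: SSE/SSE2 with 128-bit registers.
    return SimdSummary(registers=16, float_bits=128, int_bits=128)
```

For example, `summarize_x86_64_simd({"sse2", "avx"})` reports 16 registers with 256-bit float and 128-bit integer vectors, which is the SandyBridge case above.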

chriselrod avatar Aug 19 '20 16:08 chriselrod

I am open to discussion about this, but defining a generic API for this is hard. Some people just want x86 features as a bitmask (like what you get from cpuid) or a giant string (flags from /proc/cpuinfo). Some people want more details, such as the AVX FPU being shared between dual-cores in old Opteron (horrible and hopefully uncommon). Some people also want "features" to distinguish little and big cores in heterogeneous processors. These could be the aforementioned features, but they could also be internal things such as out-of-order execution, higher frequency, etc. Those are harder to describe besides saying "big" or "little" core.
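For reference, the "giant string" variant is roughly what the following sketch extracts. It assumes the Linux x86 layout of /proc/cpuinfo, where each processor block has a `flags : ...` line (other architectures use different keys, which this does not handle). `parse_cpuinfo_flags` is a hypothetical helper, not an hwloc function.

```python
def parse_cpuinfo_flags(cpuinfo_text):
    """Return the feature flags of the first processor as a set of strings.

    Assumes Linux x86 /proc/cpuinfo formatting: "flags<tabs>: f1 f2 f3 ...".
    Returns an empty set if no "flags" line is present.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    # Only meaningful on Linux; other systems have no /proc/cpuinfo.
    with open("/proc/cpuinfo") as f:
        print(sorted(parse_cpuinfo_flags(f.read())))
```

Returning a set rather than a bitmask avoids having to pick bit positions, which is exactly the part a generic API would need to standardize.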

bgoglin avatar Aug 22 '20 06:08 bgoglin

I'd prefer a higher level interface, if possible. For my purposes, the 32 KiB of L1 cache on Cascade Lake-X CPUs is only the second closest memory to the execution units -- the closest and fastest is the 2 KiB of vector registers (32 registers, 64 bytes each).

Ready access to this (number and size), which you could place topologically at the closest level, would be the most important addition to me.
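As a quick check of the arithmetic, using only the Cascade Lake-X figures quoted above (the function name is made up for illustration):

```python
def register_file_bytes(register_count, register_bytes):
    """Total capacity of the architectural vector registers, in bytes."""
    return register_count * register_bytes


# Figures from this comment: 32 registers of 64 bytes each, and 32 KiB of L1.
levels = [
    ("vector registers", register_file_bytes(32, 64)),
    ("L1 cache", 32 * 1024),
]

for name, size in levels:
    print(f"{name}: {size} bytes ({size / 1024:g} KiB)")
```

Listing the registers as the first entry mirrors the idea of placing them topologically below L1.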

For x86, as I'm using Julia, I'm using LLVM's GetHostCPUFeatures and interpreting these flags for much of the info I need.

But it'd take a lot of work to read manuals for every architecture and fill out the checklist including vector register size and count, while also looking for all the flags I need to parse to find out things such as:

> Double-precision support is optional, with its presence being indicated by the variant letter D. So the VFPv1D variant has both single-precision and double-precision, while VFPv1xD supports single-precision only.

From page 852 of the ARM architecture reference. Life would be much easier for me if I could rely on a library doing this for me.
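To illustrate the kind of per-architecture parsing I'd rather not maintain myself, here is a minimal sketch of just that one rule. It only understands the two name shapes in the quoted passage (`VFPv<N>D` and `VFPv<N>xD`); `vfp_has_double` is a hypothetical helper and certainly not complete coverage of ARM floating-point variants.

```python
import re

# Matches only the two forms quoted above: "VFPv1D" and "VFPv1xD".
_VFP_VARIANT = re.compile(r"VFPv(\d+)(xD|D)")


def vfp_has_double(variant):
    """Return True if the VFP variant name indicates double-precision support.

    "D" means single and double precision; "xD" means single precision only.
    Raises ValueError for names outside the two quoted forms.
    """
    match = _VFP_VARIANT.fullmatch(variant)
    if match is None:
        raise ValueError(f"unrecognized VFP variant name: {variant!r}")
    return match.group(2) == "D"
```

So `vfp_has_double("VFPv1D")` is true and `vfp_has_double("VFPv1xD")` is false, and that is one flag on one architecture.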

> AVX FPU being shared between dual-cores in old Opteron

I'd find details like this (and less severe things, like 0/1/2 fma units) useful, but a lower priority.

One of my libraries uses this information for code generation. Anything abstracted that it can use for cost modeling to make better decisions is helpful, but if some kinds of information are wrong, such as the number and size of registers, or whether SIMD operations are even supported for the data type being operated on, performance is likely to take a substantial hit. I generate code largely through LLVM vector intrinsics, which LLVM is quite good at lowering in a target-dependent manner as long as these high-level details are right.

Obviously I wouldn't object to things like bit masks or giant strings also being available.

chriselrod avatar Aug 22 '20 08:08 chriselrod

FYI, as of hwloc 2.4 (release coming very soon), there's a new "CPU kinds" API for exposing hardware info about sets of cores. For now, it'll be key/value pairs for frequency and Intel core type. We may easily extend it. /proc/cpuinfo x86 flags are easy to add in there. Other things may require new cpukinds query functions; I am open to suggestions about what these functions would look like in practice.

bgoglin avatar Nov 23 '20 09:11 bgoglin