Ignore disabled hyperthreads
Only consider processors with an APIC_ID as valid.
For processors with Hyperthreading but SMT disabled, the APIC_ID is never set by CPUInfo and stays zero for the "disabled" threads. Among other inconsistencies, this causes the first active thread/processor to be considered "the same" as all the disabled ones, since they all share APIC_ID=0.
It also doesn't make sense to base any decisions on the APIC_ID if we don't have any information about it.
So this patch adds a check for CPUINFO_LINUX_FLAG_APIC_ID.
The flag CPUINFO_LINUX_FLAG_APIC_ID is set if and only if the APIC_ID has been set. If it has not, CPUInfo doesn't have any information about the APIC_ID, hence the default value of zero. See https://github.com/pytorch/cpuinfo/blob/39ea79a3c132f4e678695c579ea9353d2bd29968/src/x86/linux/cpuinfo.c#L68-L69
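To illustrate the guard, here is a small standalone sketch. The flag value and the struct below are simplified stand-ins, not cpuinfo's real definitions (the real flag lives in src/linux/api.h and the real comparison is spread across the init logic):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpuinfo's internal flag; the real value differs. */
#define CPUINFO_LINUX_FLAG_APIC_ID 0x00001000u

/* Simplified stand-in for cpuinfo's per-processor bookkeeping. */
struct linux_processor {
	uint32_t flags;
	uint32_t apic_id;
};

/* Sketch of the patched check: two processors are grouped by APIC ID only
 * if BOTH actually have a recorded APIC ID and those IDs match. Without
 * the flag check, every disabled thread keeps apic_id == 0 and would
 * "match" the first active processor. */
static bool same_apic_core(const struct linux_processor* a,
                           const struct linux_processor* b) {
	if (!(a->flags & CPUINFO_LINUX_FLAG_APIC_ID) ||
	    !(b->flags & CPUINFO_LINUX_FLAG_APIC_ID)) {
		return false;
	}
	return a->apic_id == b->apic_id;
}
```

With this rule, a disabled thread (flag unset, apic_id left at zero) never aliases the active processor that legitimately has APIC_ID 0.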
On a dual-socket AMD EPYC 9334 system (2×32 cores, 128 threads total, 2 per core), the current output of cache-info is:
Max cache size (upper bound): 33554432 bytes
L1 instruction cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 65 processors
L1 data cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 65 processors
L2 unified cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 65 processors
L3 unified cache: 8 x 32 MB (exclusive), 16-way set associative (32768 sets), 64 byte lines, shared by 72 processors
Note the "65 processors" counts, which are off by one.
With this patch:
Max cache size (upper bound): 33554432 bytes
L1 instruction cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 1 processors
L1 data cache: 64 x 32 KB, 8-way set associative (64 sets), 64 byte lines, shared by 1 processors
L2 unified cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 1 processors
L3 unified cache: 8 x 32 MB (exclusive), 16-way set associative (32768 sets), 64 byte lines, shared by 8 processors
This matches the Zen 4 architecture: 32 KB data and 32 KB instruction L1 caches per core, 1 MB of L2 cache per core, and 256 MB of L3 cache in total.
Fixes #238
@Flamefire can you please fix lint? (namely run clang-format on your PR) Also, do you mind adding a few more references in the doc to the PR description that will help with reviewing this change
Fixed the clang format issue and squashed it into the initial commit
I'm not sure what to put into the description in addition to what I have. I added a paragraph linking to the logic of CPUInfo itself. I.e. the issue is that while CPUInfo "knows" that it doesn't have information on the APIC ID, it still uses it as if it were valid.
As for WHY the information is not available, I wasn't able to find anything reliable. But the change at least makes sense from the perspective of the existing logic and the observed behavior.
@pytorchbot rebase
Rebased manually. Not sure why the bot didn't do it.
https://github.com/pytorch/cpuinfo/issues/238 is for nnpack? Did you mean xnnpack? The change looks okay.
I've seen similar issues on ARM, where cores are unavailable for different reasons. On low-end devices such as the Nexus 5, overheating would turn off CPUs, starting with the big cores. On high-end devices running 32-bit, only the medium cores are available; the little and big/prime cores are disabled. The cores are there, but not available, so we would prefer to count the number of cores actually available when deciding how many threads to use, and make architectural choices based on the best available core.
NNPACK is correct, that's where I ran into a crash due to this. See the issue there: https://github.com/Maratyszcza/NNPACK/issues/218
This PR caused crashes on Android and Linux platforms https://github.com/pytorch/cpuinfo/issues/339
So I've prepared a PR to rollback to the previous version https://github.com/pytorch/cpuinfo/pull/340
Are you sure this is caused by this PR? I can't see how.
Stacktrace:
#00 pc 0910812b cpuinfo_compute_max_cache_size+37
#01 pc 091069a7 cpuinfo_x86_linux_init+4504
This leads to this call: https://github.com/Flamefire/cpuinfo/blob/b9558e923a0f915ddc2357116abc0d6aaaaea4e8/src/x86/linux/init.c#L638 and then this function: https://github.com/pytorch/cpuinfo/blob/735b2ac12f69aaf408784ed251b9bd6319cfc62d/src/cache.c#L6
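For reference, the reading side boils down to something like the following sketch. The struct definitions are simplified stand-ins, not cpuinfo's real types; the point is that cpuinfo_compute_max_cache_size just dereferences the processor's cache-level pointers, so a crash there means one of those pointers is non-NULL but invalid:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for cpuinfo's cache structs. */
struct cache_level {
	uint32_t size; /* total size of this cache in bytes */
};
struct processor_caches {
	const struct cache_level* l1d;
	const struct cache_level* l2;
	const struct cache_level* l3;
	const struct cache_level* l4;
};

/* Sketch of cpuinfo_compute_max_cache_size: return the size of the
 * largest cache level present. Dereferencing a stale or garbage level
 * pointer here would produce exactly the observed segfault. */
static uint32_t max_cache_size(const struct processor_caches* c) {
	if (c->l4 != NULL) return c->l4->size;
	if (c->l3 != NULL) return c->l3->size;
	if (c->l2 != NULL) return c->l2->size;
	if (c->l1d != NULL) return c->l1d->size;
	return 0;
}
```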
There can only be a segfault if processor is invalid, which it clearly is not in this case, or if one of the cache pointers points to a wrong address.
Those pointers are set to members of an array whose sizes are calculated by cpuinfo_x86_count_objects using the same logic as the code that sets the pointers.
Additionally, those pointers are already written to in cpuinfo_x86_linux_init, so if writing to them doesn't crash, how could reading them?
Any idea what actually causes the crash on Android?
Because this code works for me on multiple (non-Android) systems.