
Set OpenMP CPU affinity by default

Open solardiz opened this issue 2 years ago • 7 comments

In some of my tests (especially of the memory-hard formats on multi-socket/NUMA systems but not only there and not only of those), setting GOMP_CPU_AFFINITY to cover the full range of logical CPUs improves performance (sometimes a lot, e.g. by 77% in an scrypt benchmark I ran the other day). Maybe we should have john itself do that (early enough that threads are not started yet), except when that env var is already set (even if to an empty string), OMP_NUM_THREADS is set, or/and --fork is used.
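A minimal sketch of the policy described above, with hypothetical helper names (`should_set_default_affinity`, `build_affinity_range` are illustrative, not existing john functions). The key details are that `getenv()` distinguishes an unset variable (NULL) from one set to an empty string, and that `setenv()` must run before the first OpenMP parallel region so that libgomp reads the variable at thread startup:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the "0-N" range string covering all logical
 * CPUs, to be used as a default value for GOMP_CPU_AFFINITY. */
static void build_affinity_range(int ncpu, char *buf, size_t size)
{
	snprintf(buf, size, "0-%d", ncpu - 1);
}

/* Hypothetical policy check per the conditions above: only set a default
 * affinity when the user has not already expressed a preference (env var
 * set, even to "") and --fork is not in use. */
static int should_set_default_affinity(int fork_in_use)
{
	if (fork_in_use)
		return 0;
	if (getenv("GOMP_CPU_AFFINITY"))	/* set, possibly to "" */
		return 0;
	if (getenv("OMP_NUM_THREADS"))
		return 0;
	return 1;
}
```

Usage would be something like: obtain the CPU count (e.g. via `sysconf(_SC_NPROCESSORS_ONLN)` on Linux), then `setenv("GOMP_CPU_AFFINITY", buf, 0)` early in startup, before threads exist.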

For --fork in combination with OpenMP (where we already reduce the per-process thread count accordingly), we would need to use different CPU number ranges for the different processes. We may, but let's start with the simpler change first. (In fact, this need for special handling of --fork is a reason why I don't just configure GOMP_CPU_AFFINITY on my systems globally, so it's also a reason to have it built into john, where we can have it conditional.)
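For the eventual --fork handling, the per-process ranges could be computed with something like the following sketch (hypothetical function, assuming contiguous CPU numbering; a real implementation would have to respect topology as discussed below). CPUs are split into contiguous ranges, with the first `ncpu % nproc` processes getting one extra CPU:

```c
#include <assert.h>

/* Hypothetical sketch: give fork'ed process proc_idx (0-based, of nproc
 * processes) its contiguous share of ncpu logical CPUs, for use in a
 * per-process GOMP_CPU_AFFINITY="first-last" setting. */
static void fork_cpu_range(int ncpu, int nproc, int proc_idx,
                           int *first, int *last)
{
	int base = ncpu / nproc, extra = ncpu % nproc;

	/* Processes below 'extra' get base+1 CPUs, the rest get base. */
	*first = proc_idx * base + (proc_idx < extra ? proc_idx : extra);
	*last = *first + base - 1 + (proc_idx < extra ? 1 : 0);
}
```

For example, 10 CPUs over 4 processes would yield the ranges 0-2, 3-5, 6-7, 8-9.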

A drawback of all/any of these changes is that performance could suffer when the system is under significant load by something else running on it (tasks, kernel threads, interrupt handlers, etc.) that also uses strict CPU affinity. In such cases, performance would have been suboptimal anyway, but it could become worse still. A workaround, desirable in such cases anyway, would be for the user to set GOMP_CPU_AFFINITY differently or/and to set OMP_NUM_THREADS such that competition for the CPUs used by the other load is avoided or reduced.

solardiz avatar Apr 30 '23 18:04 solardiz

The case of running efficiently on less than all cores is often interesting (e.g. you have lots of cores and want to run several john processes) - but it's harder to get "right". First of all because the very meaning of "right" varies: Perhaps you do want two threads on each core, perhaps you do not. And while some CPUs (the Intels I've looked at) have the second set of threads up high, some others have them at the odd numbers. So to only use one thread per core you'd need to use e.g. GOMP_CPU_AFFINITY=0,2,4,6 rather than GOMP_CPU_AFFINITY=0-3 on a quad-core with HT (eight logical CPUs).
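The two enumeration schemes described above can be sketched as follows (a hypothetical illustration; real topology should be read from the OS, as discussed next):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the two common SMT enumerations:
 * "blocked"     - all first siblings come first (0..ncores-1), with the
 *                 second set of threads "up high" (Intel-style);
 * "interleaved" - the two siblings of each core are adjacent, so the
 *                 first siblings are the even CPU numbers. */
static void one_thread_per_core(int ncores, int interleaved,
                                char *buf, size_t size)
{
	size_t off = 0;
	int c;

	for (c = 0; c < ncores; c++) {
		int cpu = interleaved ? 2 * c : c;
		off += snprintf(buf + off, size - off, "%s%d",
		    c ? "," : "", cpu);
	}
}
```

On a quad-core with HT this produces `0,1,2,3` for the blocked scheme (equivalent to `0-3`) and `0,2,4,6` for the interleaved one.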

I only found out today you get a good view of it with lscpu --all --extended (or even grep -E '^(processor|core id)' /proc/cpuinfo but the latter uses a confusing nomenclature). And BTW I also found out that in Linux you can turn HT on or off on the fly using sudo tee /sys/devices/system/cpu/smt/control <<< on (or off), which was terrific for proving some of my performance tests did what I expected.

Anyway, these cases are probably infeasible within john.

magnumripper avatar Oct 17 '24 19:10 magnumripper

On another note, I think supporting this (as described in OP) is a good idea, but there would obviously need to be some conf setting for turning it off (or even on, depending on which default we decide on).

magnumripper avatar Oct 17 '24 19:10 magnumripper

BTW it also appeared today that GOMP_CPU_AFFINITY sets a stronger affinity than I thought - I suspected it could be more of a loose hint. However, an already running process will behave fine if affined cores are turned off: I saw kernel messages such as "process 22507 (john) no longer affine to cpu4" and then there was only the expected performance drop.

magnumripper avatar Oct 17 '24 19:10 magnumripper

I think supporting this (as described in OP) is a good idea but there would obviously need to be some conf setting for turning it off

My suggested ways/conditions to turn it off are "when that env var is already set (even if to an empty string), OMP_NUM_THREADS is set, or/and --fork is used." These would fit your examples with "running efficiently on less than all cores", where like you correctly say the corresponding affinity settings would vary by CPU topology and user preference.

you get a good view of it with lscpu --all --extended (or even grep -E '^(processor|core id)' /proc/cpuinfo but the latter use a confusing nomenclature).

Yes, and I previously wrote a parser for the latter, attached here - https://www.openwall.com/lists/oss-security/2018/06/21/7

This info is also available via sysfs (differently), which may be a more official way for programs to obtain it. But I chose /proc/cpuinfo back then not to rely on sysfs even being mounted.
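A minimal sketch of the sysfs approach (hypothetical helper names): each CPU's SMT siblings appear in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` in the kernel's cpulist format, e.g. "0,8" or "0-1", with members in ascending order. Treating a CPU as a "primary" hardware thread iff it is the lowest-numbered member of its own sibling list gives a topology-independent one-thread-per-core selection:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: the lowest-numbered CPU in a sysfs cpulist string
 * such as "0,8" or "0-1" is simply its leading integer, since the kernel
 * emits cpulists in ascending order. */
static int first_sibling(const char *list)
{
	return (int)strtol(list, NULL, 10);
}

/* A CPU is the "primary" thread of its core iff it is the lowest-numbered
 * member of its own thread_siblings_list. */
static int is_primary_thread(int cpu, const char *siblings_list)
{
	return first_sibling(siblings_list) == cpu;
}
```

This check works the same whether the siblings are enumerated blocked ("0,8") or interleaved ("0-1"), which is what makes the sysfs route portable across arches.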

Anyway, these cases are probably unfeasible within john.

With the above parser, on Linux we can actually have high-level e.g. john.conf or command-line settings like "[don't] use SMT". We could also use this to optimize Argon2 and (ye)scrypt performance on some CPUs at some sizes by reducing L3 cache thrashing (don't let two SMT threads into the second half of memory filling at the same time - an approach I had successfully tested in defensive password hashing setups).

in Linux you can turn HT on or off on the fly using sudo tee /sys/devices/system/cpu/smt/control <<< on (or off)

Yeah, I've been using it on some servers to switch the balance between performance and side-channel security. When server load is low anyway, I can mitigate more risks.

I also noticed this doesn't appear to work anymore when SMT was disabled at boot time, as is the Qubes OS default.

solardiz avatar Oct 17 '24 19:10 solardiz

This info is also available via sysfs (differently), which may be a more official way for programs to obtain it.

... and this is discussed further in the oss-security thread I referenced, in particular that my parser won't work right on POWER, but a sysfs approach could work across arches.

solardiz avatar Oct 17 '24 19:10 solardiz

Another data point is "efficiency cores" vs. "performance cores", which most(?) new CPUs have. This might make it harder for us when deciding which cores we want to pin to.

magnumripper avatar Nov 29 '24 02:11 magnumripper

Optimal affinity settings may also need to consider (even) distribution across physical sockets, as seen in https://github.com/openwall/john/issues/5668#issuecomment-2790064384 - obviously, relevant only when using fewer than max hw threads.
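One simple way to get even distribution across sockets is round-robin assignment (a hypothetical sketch, assuming a blocked enumeration of `ncpu_per_socket` contiguous CPU numbers per socket; real package IDs should come from sysfs topology):

```c
#include <assert.h>

/* Hypothetical sketch: pick the CPU for the n-th thread (0-based) by
 * cycling over the sockets, so that k threads are spread as evenly as
 * possible across packages rather than filling one socket first. */
static int nth_cpu_round_robin(int n, int nsockets, int ncpu_per_socket)
{
	int socket = n % nsockets;		/* which package */
	int within = n / nsockets;		/* offset inside it */

	return socket * ncpu_per_socket + within;
}
```

With 2 sockets of 8 CPUs each, four threads land on CPUs 0, 8, 1, 9 - two per socket - instead of 0-3 all on socket 0.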

solardiz avatar Apr 20 '25 02:04 solardiz