cpuminer

On Intel CPUs, optionally default to one thread per core?

Open jepler opened this issue 10 years ago • 6 comments

I have noticed that on Intel CPUs with HT, running one thread per logical "processor" does not give higher performance than running one thread per core. (Tested systems: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz, Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz.) This is entirely expected, because these systems do not have enough L2 cache for two scrypt instances to fit.

On my two systems, I found that the i7 had the first thread of a given core at CPU affinity numbers 0, 2, 4, ..., 10; my i5 has the first thread of a given core at CPU affinity numbers 0 and 1, so there's not a simple "one size fits all" approach to scheduling on cores.

hwloc provides an API that allows enumeration of system resources, including enumerating cores distinctly from "PU"s. It reportedly works on a variety of Unix systems; Debian stable has version 1.4.1. It should allow the necessary translation from core IDs to CPU affinity numbers so that the best affinity can be chosen.
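The core-to-affinity translation described above can be sketched without hwloc. The following is a minimal Python sketch, not hwloc's API and not cpuminer code: it assumes a Linux system that exposes `thread_siblings_list` files under `/sys/devices/system/cpu`, and the function names (`parse_cpu_list`, `first_thread_per_core`, `read_sysfs_siblings`) are hypothetical. It picks the lowest-numbered logical CPU of each core, which reproduces both layouts mentioned above.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0,2" or "0-1" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def first_thread_per_core(siblings_by_cpu):
    """Given {logical_cpu: sibling cpulist string}, return the first logical CPU of each core."""
    firsts = {parse_cpu_list(s)[0] for s in siblings_by_cpu.values()}
    return sorted(firsts)


def read_sysfs_siblings(root="/sys/devices/system/cpu"):
    """Read sibling lists from sysfs (assumption: Linux with the topology files present)."""
    result = {}
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        # The directory two levels up is named "cpuN"; strip the "cpu" prefix.
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            result[int(cpu_dir[3:])] = f.read()
    return result


# Layout like the i7 described above: siblings are adjacent (0-1, 2-3, ...).
adjacent = {0: "0-1", 1: "0-1", 2: "2-3", 3: "2-3"}
# Layout like the i5 described above: siblings are split (0,2 and 1,3).
split = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}

print(first_thread_per_core(adjacent))
print(first_thread_per_core(split))
```

On a real Linux machine, `first_thread_per_core(read_sysfs_siblings())` would give the list of affinity numbers to pin one thread per core; on other systems a library such as hwloc would have to supply the same mapping.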

All that said, using all the threads isn't actually a performance hit (I haven't measured power consumption), so this is at best a low priority item.

jepler avatar Mar 18 '14 14:03 jepler

FWIW, I actually get the best performance when running a number of threads equal to the number of physical cores plus the number of CPUs. It does seem to be able to take some advantage of HT, but not to extreme levels.

So for example on my 2 x xeon X5560, I run 10 threads for optimal performance.

jentfoo avatar Mar 18 '14 14:03 jentfoo

On Intel CPUs with hyper-threading it is not always obvious what the optimal number of miner threads is. It probably depends on the cache sizes as well as on other factors. That said, in my experience running one thread per logical core (which is the current default setting) usually gives the best results. This is, for instance, what I get on my dual-core Haswell with HT:

2 threads: 34.0 kH/s on scrypt, 19.3 MH/s on sha256d
3 threads: 36.4 kH/s on scrypt, 19.9 MH/s on sha256d
4 threads: 38.0 kH/s on scrypt, 20.5 MH/s on sha256d
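To put those figures in relative terms, here is a small Python sketch that computes the percentage gain over the 2-thread baseline. The numbers are the ones quoted above; the dictionary layout and the function name `gain_over_baseline` are purely illustrative.

```python
# Hash rates quoted above for the dual-core Haswell with HT.
rates = {
    2: {"scrypt_khs": 34.0, "sha256d_mhs": 19.3},
    3: {"scrypt_khs": 36.4, "sha256d_mhs": 19.9},
    4: {"scrypt_khs": 38.0, "sha256d_mhs": 20.5},
}


def gain_over_baseline(rates, algo, baseline_threads=2):
    """Return {threads: percent gain vs. the baseline thread count}, rounded to 0.1%."""
    base = rates[baseline_threads][algo]
    return {n: round((r[algo] / base - 1) * 100, 1) for n, r in rates.items()}


print(gain_over_baseline(rates, "scrypt_khs"))   # {2: 0.0, 3: 7.1, 4: 11.8}
print(gain_over_baseline(rates, "sha256d_mhs"))  # {2: 0.0, 3: 3.1, 4: 6.2}
```

So on this machine the two extra HT threads add roughly 12% on scrypt and 6% on sha256d.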

Even on CPUs on which that is not the case, running one thread per logical core does not normally result in a performance hit (as jepler already mentioned).

pooler avatar Mar 18 '14 15:03 pooler

On my machine I got more performance when I set the thread count to the number of cores plus one. E.g. on a dual-core machine, 3 threads; on a quad-core, 5.

a1batross avatar Mar 19 '14 13:03 a1batross

Hello. Looking at the performance of this on a Ryzen (16 cores if you include HT), I found the best performance (in hash/s) came from staggering threads onto every other core. Core 0 in Windows specifically is always used for network, so it may be best to avoid 0...

Anyway, I see higher hash/s using cores 1/3/5/7/9/11/13/15 (not a lot, maybe 1, but it adds up over time, I guess). Thanks for your code here, and take my thoughts with a grain of salt.
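For reference, the every-other-core pattern above can be written as an affinity bitmask. This is a minimal Python sketch; the helper name `affinity_mask` is hypothetical, and whether the miner itself accepts such a mask is not stated in this thread, so treat it as input for OS-level affinity tools.

```python
def affinity_mask(cpus):
    """Build an integer bitmask with one bit set per logical CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# Logical CPUs 1, 3, 5, ..., 15 on a 16-thread machine (skips CPU 0).
odd_cpus = list(range(1, 16, 2))
mask = affinity_mask(odd_cpus)

print(odd_cpus)
print(hex(mask))  # 0xaaaa
```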

jeffstokes72 avatar May 27 '17 18:05 jeffstokes72

It'd also be very cool to have a switch for priority level, in case we want to run this below normal but above idle.

jeffstokes72 avatar May 28 '17 17:05 jeffstokes72

If I have 4 logical cores, it seems running 4 threads is the default. Is there any advantage to running higher thread counts such as 16 or 32? I'm running on an i3-350m 2.26GHz with 32 threads. My Windows 10 Power Plan is on power saver, but I'm getting more accepted shares in a 10 minute period than when I was running 4 threads. Am I onto something here, or is this stats gone wild on a slow laptop?

fogoat avatar Aug 31 '17 03:08 fogoat