default_thread_allocations should not use std::thread::hardware_concurrency()
Hello,
In the default_thread_allocations function, C++ hardware_concurrency is used to determine the maximum number of threads.
https://github.com/nanoporetech/dorado/blob/9dc15a8564d0bbe6cff710770b13c14816a2e3a7/dorado/utils/parameters.cpp#L13
std::thread::hardware_concurrency() returns, when possible, the underlying hardware capability to run threads, which might not corresponds to the actual number of cores available to the process (through the use of taskset, batch system like slurm, etc...). The consequence is that the program might run in a non optimal way. For example, if I run taskset -c 1 on my 20 cores machines, it will run 20 workers on only one core.
Consider this code run through the slurm scheduler using srun, requesting 4 cores:
#include <iostream>
#include <sched.h>
#include <thread>
int main (void) {
cpu_set_t cpu_set;
int cores = std::thread::hardware_concurrency();
std::cout << "using std::thread::hardware_concurrency: " << cores << "\n";
sched_getaffinity(0, sizeof(cpu_set), &cpu_set);
cores = CPU_COUNT(&cpu_set);
std::cout << "using sched_getaffinity: " << cores << "\n";
return 0;
}
$ srun -c 4 ./get_cores
srun: job 13694693 queued and waiting for resources
srun: job 13694693 has been allocated resources
using std::thread::hardware_concurrency: 96
using sched_getaffinity: 4
I think a more sane behavior would be to use sched_getaffinity as in the folly library for example (folly/folly/system/HardwareConcurrency.cpp)
#include <folly/system/HardwareConcurrency.h>
#include <thread>
#include <folly/portability/Sched.h>
namespace folly {
unsigned int hardware_concurrency() noexcept {
#if defined(__linux__) && !defined(__ANDROID__)
cpu_set_t cpuset;
if (!sched_getaffinity(0, sizeof(cpuset), &cpuset)) {
auto count = CPU_COUNT(&cpuset);
if (count != 0) {
return count;
}
}
#endif
return std::thread::hardware_concurrency();
}
} // namespace folly
What do you think ?
Thank you for considering this change !
Hi @jbd,
Thanks for the info - this certainly looks like a worthwhile change for users who take more active control over their CPU resources. We'll look into it for a future release.