Find maximum number of CPUs given within a cgroup

Open tomgreen66 opened this issue 5 months ago • 0 comments

In an HPC cluster environment, job schedulers such as Slurm can allocate CPUs to a job by using cgroups to give it a cpuset. Currently AliceVision (and hence Meshroom) reports the total number of processors on the compute node, even though the job scheduler may have given the job access to only a limited set of CPUs. The user therefore has to remember to limit the number of CPUs manually to what Slurm has allocated.
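To make the mismatch concrete, here is a minimal diagnostic sketch for Linux with glibc. It compares the node-wide count from `sysconf(_SC_NPROCESSORS_ONLN)` with the size of the process's affinity mask. The function names are hypothetical and are not taken from AliceVision; the current AliceVision implementation may detect CPUs differently.

```cpp
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <sched.h>
#include <unistd.h>
#include <cstdio>

// CPUs currently online on the whole node (ignores the job's cpuset).
long node_cpu_count()
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

// CPUs this process is allowed to run on; -1 if the query fails.
int allowed_cpu_count()
{
    cpu_set_t cs;
    CPU_ZERO(&cs);
    if (sched_getaffinity(0, sizeof(cs), &cs) != 0)
        return -1;
    return CPU_COUNT(&cs);
}

// Hypothetical helper: prints both counts side by side for comparison.
void print_cpu_counts()
{
    std::printf("node: %ld, allowed: %d\n", node_cpu_count(), allowed_cpu_count());
}
```

Calling `print_cpu_counts()` inside a Slurm job that was given fewer CPUs than the node has would be expected to show a smaller "allowed" value, assuming the scheduler enforces the allocation through a cpuset or affinity mask.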

Would it be worth changing the code for get_total_cpus at:

https://github.com/alicevision/AliceVision/blob/3a0be0fef01d19a0bb1cb5054f23be6591de0301/src/aliceVision/system/cpu.cpp#L133-L143

to instead use:

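#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <sched.h>
// Number of CPUs in this process's affinity mask (respects cpusets).
int get_total_cpus()
{
 cpu_set_t cs;
 CPU_ZERO(&cs);
 // If the query fails, report one CPU rather than reading an unset mask
 // (the existing detection could be used here instead).
 if (sched_getaffinity(0, sizeof(cs), &cs) != 0)
  return 1;
 return CPU_COUNT(&cs);
}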

This should return the number of CPUs actually available to the software rather than the total on the node. It may need a CMake check for the existence of sched_getaffinity, so that the code can fall back to the current method where it is not available, something like:

include(CheckSymbolExists)
list(APPEND CMAKE_REQUIRED_DEFINITIONS -D_GNU_SOURCE)
check_symbol_exists(sched_getaffinity sched.h HAVE_SCHED_GETAFFINITY)
list(REMOVE_ITEM CMAKE_REQUIRED_DEFINITIONS -D_GNU_SOURCE)
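On the C++ side, the result of that check could guard the new code path roughly as follows. This is only a sketch. It assumes `HAVE_SCHED_GETAFFINITY` is passed to the compiler as a preprocessor definition (for example via a generated config header or a compile definition). `fallback_total_cpus` is a hypothetical stand-in for the existing detection and is not AliceVision's actual code.

```cpp
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <thread>
#ifdef HAVE_SCHED_GETAFFINITY
# include <sched.h>
#endif

// Stand-in for the existing node-wide detection (an assumption for illustration).
static int fallback_total_cpus()
{
    const unsigned n = std::thread::hardware_concurrency();
    return n > 0 ? static_cast<int>(n) : 1;
}

int get_total_cpus()
{
#ifdef HAVE_SCHED_GETAFFINITY
    // Prefer the affinity mask, which reflects any cpuset applied to the job.
    cpu_set_t cs;
    CPU_ZERO(&cs);
    if (sched_getaffinity(0, sizeof(cs), &cs) == 0)
    {
        const int n = CPU_COUNT(&cs);
        if (n > 0)
            return n;
    }
#endif
    // Symbol unavailable at build time, or the call failed at runtime.
    return fallback_total_cpus();
}
```

Falling through to the fallback at runtime as well as at compile time means a failed `sched_getaffinity` call still yields a usable count.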

tomgreen66 · Jan 16 '24 10:01