Use `os.sched_getaffinity` instead of `os.cpu_count` where possible

Open bryant1410 opened this issue 11 months ago • 9 comments

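When automatically choosing the number of parallel workers, `os.sched_getaffinity` is a better choice than the currently used `os.cpu_count`. The former returns the set of CPUs the process is actually allowed to run on, while the latter reports every CPU in the machine, even those the process can't use. See this Stack Overflow answer for an explanation.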

I changed this codebase to first check `os.sched_getaffinity`, then fall back to `os.cpu_count`, and finally fall back to 1 (since `os.cpu_count` can return `None`). As some form of validation, PyTorch uses `os.sched_getaffinity` for this as well.
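A minimal sketch of that fallback chain, for illustration. The helper name `available_cpu_count` is hypothetical and not necessarily what the codebase uses:

```python
import os


def available_cpu_count() -> int:
    """Best-effort count of CPUs usable by the current process.

    Hypothetical helper name, shown only to illustrate the fallback order.
    """
    if hasattr(os, "sched_getaffinity"):
        # Only available on some Unix platforms (e.g. Linux).
        # Passing 0 means "the current process".
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None when the count cannot be determined,
    # so fall back to a single worker in that case.
    return os.cpu_count() or 1
```

On platforms without `os.sched_getaffinity` (macOS and Windows, for instance), this simply behaves like the current `os.cpu_count` logic.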

In the (rare) case that `os.sched_getaffinity` isn't defined, I make it default to `os.cpu_count`. PyTorch's code behaves differently and uses the value 0, which I don't think makes sense. Still, this shouldn't happen in practice: from what I've read, on Linux a process must be assigned at least one CPU.

bryant1410 • Dec 04 '24 20:12