catkin_tools
Honor CPU affinity at parallel builds
System Info
- Operating System: Linux tiago-surface-ubuntu 4.15.0-45-generic #48~16.04.1-Ubuntu SMP Tue Jan 29 18:03:48 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Python Version: Python 2.7.12
- Version of catkin_tools:
  catkin_tools 0.4.4 (C) 2014-2019 Open Source Robotics Foundation
  catkin_tools is released under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
  ---
  Using Python 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609]
- ROS Distro: kinetic
Build / Run Issue
- [ ] Works with catkin_make
- [ ] Works with catkin_make_isolated --merge
- [ ] Works with catkin build
- [ ] Works with catkin build -p1
I'm building on CircleCI with 2 CPUs allocated for my container. By default, catkin build will use 32 processes instead of 2 and the compilation runs out of memory.
The problem is that catkin_tools is checking for CPUs available on the host and not checking the CPU affinity mask. When using containers and limiting the container to a subset of the CPUs, catkin_tools should only run on the subset available for it.
The issue is not exclusive to catkin_tools, as documented by CircleCI (https://circleci.com/docs/2.0/configuration-reference/#resource_class):
Java, Erlang and any other languages that introspect the /proc directory for information about CPU count may require additional configuration to prevent them from slowing down when using the CircleCI 2.0 resource class feature. Programs with this issue may request 32 CPU cores and run slower than they would when requesting one core.
Expected Behavior
On builds with a CPU affinity mask, respect the affinity mask when counting CPU cores.
Actual Behavior
Build runs on all CPUs.
Steps to Reproduce the Issue
- Take a docker image with ROS (I'm using automni/rhino at work)
- Run docker run --cpuset-cpus=0 --entrypoint bash -ti automni/rhino, which pins the container to the first CPU
- Building inside the container uses -jN -lN, where N is the number of CPUs on the host, instead of 1
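To see the mismatch directly, a small check like the one below can be run inside the pinned container. It is only a sketch and needs a Python 3 interpreter (3.3 or newer) on Linux, which is an assumption since the image above ships Python 2.7. The helper name `cpu_counts` is made up for illustration, and I have not confirmed that catkin_tools uses `multiprocessing.cpu_count()` internally; it stands in here for "count the CPUs of the host".

```python
import multiprocessing
import os


def cpu_counts():
    """Return (host_cpus, usable_cpus); usable_cpus is None if it cannot be queried."""
    # Counts the CPUs the OS reports, regardless of any affinity mask.
    host_cpus = multiprocessing.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # Size of the affinity mask of the current process (pid 0 means "this process").
        usable_cpus = len(os.sched_getaffinity(0))
    else:
        usable_cpus = None
    return host_cpus, usable_cpus


if __name__ == "__main__":
    host, usable = cpu_counts()
    print("CPUs reported for the host: %s" % host)
    print("CPUs in the affinity mask:  %s" % usable)
```

With `--cpuset-cpus=0` the second number should be 1 while the first still reflects the host.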
Solution
Python >= 3.3 has a native solution (https://docs.python.org/3.4/library/os.html#os.sched_getaffinity - related Python issue). After a quick search I couldn't find anything for 2.7, but it could be achieved with ctypes mimicking these C calls: https://github.com/coreutils/gnulib/blob/master/lib/nproc.c#L69
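A rough sketch of what that could look like, not a tested patch for catkin_tools: use `os.sched_getaffinity` where it exists, otherwise call libc's `sched_getaffinity` through ctypes, and fall back to the host count if neither works. The function names are hypothetical, the 1024-bit mask size is an assumption that matches glibc's default `cpu_set_t`, and the ctypes path is Linux only.

```python
import ctypes
import multiprocessing
import os


def affinity_count_ctypes():
    """Count the CPUs in the affinity mask via libc's sched_getaffinity (Linux only)."""
    # CDLL(None) exposes the symbols already loaded in this process, including libc.
    libc = ctypes.CDLL(None, use_errno=True)
    word_bits = 8 * ctypes.sizeof(ctypes.c_ulong)
    # Assumption: a 1024-bit mask, the default size of glibc's cpu_set_t.
    mask = (ctypes.c_ulong * (1024 // word_bits))()
    libc.sched_getaffinity.argtypes = [
        ctypes.c_int,                    # pid (0 means the calling process)
        ctypes.c_size_t,                 # size of the mask in bytes
        ctypes.POINTER(ctypes.c_ulong),  # the mask buffer
    ]
    libc.sched_getaffinity.restype = ctypes.c_int
    if libc.sched_getaffinity(0, ctypes.sizeof(mask), mask) != 0:
        raise OSError(ctypes.get_errno(), "sched_getaffinity failed")
    # Every set bit is one CPU this process is allowed to run on.
    return sum(bin(word).count("1") for word in mask)


def available_cpu_count():
    """Return the number of CPUs usable by this process, honoring the affinity mask."""
    if hasattr(os, "sched_getaffinity"):  # Python >= 3.3 on Linux
        return len(os.sched_getaffinity(0))
    try:
        return affinity_count_ctypes()    # e.g. Python 2.7 on Linux
    except (AttributeError, OSError):
        # No usable sched_getaffinity: keep the old behavior (host CPU count).
        return multiprocessing.cpu_count()
```

The result of `available_cpu_count()` would then replace the host CPU count when picking the default `-jN -lN` values.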
I will take another look for possible solutions after I leave work. In the meantime, let me know if anyone has thoughts on this.