catkin_tools

Honor CPU affinity at parallel builds

Open • tiagoshibata opened this issue 5 years ago • 1 comment

System Info

  • Operating System: Linux tiago-surface-ubuntu 4.15.0-45-generic #48~16.04.1-Ubuntu SMP Tue Jan 29 18:03:48 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Python Version: Python 2.7.12
  • Version of catkin_tools:
catkin_tools 0.4.4 (C) 2014-2019 Open Source Robotics Foundation
catkin_tools is released under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
---
Using Python 2.7.12 (default, Nov 12 2018, 14:36:49) [GCC 5.4.0 20160609]
  • ROS Distro: kinetic

Build / Run Issue

  • [ ] Works with catkin_make
  • [ ] Works with catkin_make_isolated --merge
  • [ ] Works with catkin build
  • [ ] Works with catkin build -p1

I'm building on CircleCI with 2 CPUs allocated to my container. By default, catkin build uses 32 processes instead of 2, and the compilation runs out of memory.

The problem is that catkin_tools counts the CPUs available on the host instead of checking the CPU affinity mask. When a container is limited to a subset of the CPUs, catkin_tools should only use the subset available to it.

The issue is not exclusive to catkin_tools, as documented by CircleCI (https://circleci.com/docs/2.0/configuration-reference/#resource_class):

Java, Erlang and any other languages that introspect the /proc directory for information about CPU count may require additional configuration to prevent them from slowing down when using the CircleCI 2.0 resource class feature. Programs with this issue may request 32 CPU cores and run slower than they would when requesting one core.

Expected Behavior

On builds with a CPU affinity mask, respect the affinity mask when counting CPU cores.

Actual Behavior

The build is parallelized for every CPU on the host, regardless of the affinity mask.

Steps to Reproduce the Issue

  • Take a Docker image with ROS (I'm using automni/rhino at work)
  • Run docker run --cpuset-cpus=0 --entrypoint bash -ti automni/rhino, which pins the container to the first CPU
  • Building inside the container uses -jN -lN, where N is the number of CPUs on the host, instead of 1
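
To confirm the mismatch from inside the pinned container, a small check along these lines may help. It is only a sketch: it assumes Python >= 3.3 on Linux is available in the image, and it assumes the oversized job count comes from a host-wide query such as multiprocessing.cpu_count(), which this issue does not verify against the catkin_tools source. The helper name cpu_counts is hypothetical.

```python
import multiprocessing
import os


def cpu_counts():
    """Return (reported, usable) CPU counts for the current process.

    reported: what a host-wide query returns (assumed to be what drives -jN).
    usable:   size of the affinity mask, i.e. CPUs this process may run on.
    """
    reported = multiprocessing.cpu_count()
    usable = len(os.sched_getaffinity(0))  # Linux only, Python >= 3.3
    return reported, usable


if __name__ == "__main__":
    reported, usable = cpu_counts()
    print("multiprocessing.cpu_count():  ", reported)
    print("len(os.sched_getaffinity(0)): ", usable)
```

With --cpuset-cpus=0, the second value should be 1 while the first would still reflect the host.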

Solution

Python >= 3.3 has a native solution (https://docs.python.org/3.4/library/os.html#os.sched_getaffinity - related Python issue). After a quick search I couldn't find anything for 2.7, but it could be achieved with ctypes mimicking these C calls: https://github.com/coreutils/gnulib/blob/master/lib/nproc.c#L69

tiagoshibata • Feb 26 '19 21:02

I will take another look for possible solutions after I leave work. In the meantime, let me know if anyone has thoughts on this.

tiagoshibata • Feb 26 '19 21:02