arduino-builder

Parallel build uses only 1 core when compiling regular libraries

Open PaulStoffregen opened this issue 6 years ago • 8 comments

Wow, the parallel build in arduino-PR-beta1.9-BUILD-22 makes an amazing speedup for compiling the core library.

But compiling normal libraries doesn't seem to do anything in parallel. Is that intentional? Hopefully parallel build can be applied to regular libraries. Some larger libraries have a tremendous amount of code which could really benefit.

PaulStoffregen avatar Oct 24 '17 16:10 PaulStoffregen

Hi Paul, this is not intentional :smile: I'll take a closer look at which part is preventing this from happening. Thanks for spotting this!

facchinm avatar Oct 26 '17 20:10 facchinm

I just tested with the WiFi101 examples and it seems to parallelize the library compilation as well. The logic is a bit convoluted: the workqueue is per library and per extension, so it will only parallelize the build of files of the same type in the same library. For larger libraries the speedup should be noticeable too. Did you perform any benchmark on a particular sketch/library combination?

facchinm avatar Oct 30 '17 17:10 facchinm
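To make the "per library and per extension" workqueue above concrete, here is a minimal Python sketch of the grouping it implies. It is an illustration only, not arduino-builder's code (which is written in Go); the function name `group_into_batches` and the file paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_into_batches(sources):
    """Group (library, path) pairs into batches keyed by (library, extension).

    Assumption for illustration: only files in the same batch are
    compiled concurrently; batches themselves run one after another.
    """
    batches = defaultdict(list)
    for library, path in sources:
        ext = PurePosixPath(path).suffix
        batches[(library, ext)].append(path)
    return dict(batches)


# Hypothetical file layout, not taken from the real libraries.
sources = [
    ("WiFi101", "src/WiFi.cpp"),
    ("WiFi101", "src/WiFiClient.cpp"),
    ("WiFi101", "src/driver/nm_bsp.c"),
    ("SPI", "src/SPI.cpp"),
]

batches = group_into_batches(sources)
for (library, ext), files in batches.items():
    print(f"{library} {ext}: {len(files)} file(s) can build concurrently")
```

Under this grouping, a library with many files of one type gets real parallelism, while a library with a single file forms a batch of one.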

I tested with my audio library, which has 56 .cpp files. Running on 64-bit Linux, the process monitor shows only ~1 CPU used. During the core library build, it shows about 10, but it's difficult to tell for sure because it goes by so quickly.

PaulStoffregen avatar Oct 30 '17 18:10 PaulStoffregen

I know this is an old bug, but it shows up on Google for parallel builds. What's the current status of parallel builds? Does it work with Arduino 1.8.9, or is 1.9.x required?

marcmerlin avatar Mar 03 '20 16:03 marcmerlin

As of today, all source files of a "submodule" are compiled in parallel with N threads (N = number of cores), while the submodules themselves are compiled in order. A submodule can be the core, a sketch or a library. Unless the library contains a lot of files, the speedup is not noticeable. We haven't yet invested any time in trying to "unlock" the fully parallelized build (out-of-order submodules, so N libraries can be compiled in parallel), mostly because of the added complexity in logging and error reporting.

facchinm avatar Mar 10 '20 09:03 facchinm
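The scheduling described above (submodules in order, files within a submodule in parallel) can be sketched in Python roughly as follows. This is a simplified model under stated assumptions, not the actual arduino-builder implementation: `compile_file` is a hypothetical stand-in that only derives an object file name instead of invoking a compiler.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def compile_file(path):
    """Hypothetical stand-in for running the compiler on one source file."""
    return path.rsplit(".", 1)[0] + ".o"


def build(submodules, jobs=None):
    """Compile submodules one after another, files within each in parallel."""
    jobs = jobs or os.cpu_count() or 1
    objects = []
    for _name, files in submodules:  # submodules are processed strictly in order
        # Leaving the 'with' block waits for every file of this submodule,
        # so the next submodule cannot start early.
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            objects.extend(pool.map(compile_file, files))
    return objects


objects = build([("core", ["main.cpp", "wiring.c"]), ("MyLib", ["MyLib.cpp"])], jobs=4)
print(objects)
```

The barrier at the end of each submodule is what keeps logs and errors grouped per submodule, and it is also why a one-file library leaves the other workers idle.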

@facchinm thanks for the update. I've indeed noticed build times of 1 minute or sometimes more on ESP32 code with lots of source files and libraries. It's vexing that my quad-core computer seemed to compile everything in such a serial fashion. Caching helps, but I often have to change compile options (PSRAM or not, or other such things), which blows the whole cache and forces a full and lengthy rebuild. Honestly, I'm not that interested in errors in libraries; they are rare. I really wish it would all build with -j8, given that my CPU has 8 threads. Maybe I should check if platformio does this better. Thanks for the answer, I appreciate it.

marcmerlin avatar Mar 10 '20 16:03 marcmerlin

@marcmerlin if you have a large core (like ESP32) and a lot of big libraries (like the ESP32 wifi ones), you should max out all 8 threads in almost every case, even with our approach. The problem arises if you have 100 libraries composed of 1 file each, which compile as if -j1 was specified.

facchinm avatar Mar 10 '20 16:03 facchinm
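A back-of-the-envelope model shows why the "100 libraries of 1 file each" case behaves like -j1. The sketch below assumes, purely for illustration, that every file takes one unit of time to compile; the function names and the `few_large` file counts are hypothetical.

```python
import math


def per_submodule_steps(file_counts, jobs):
    """Time steps when submodules run in order and only the files
    inside each submodule are compiled in parallel."""
    return sum(math.ceil(n / jobs) for n in file_counts)


def fully_parallel_steps(file_counts, jobs):
    """Time steps when all files from all submodules share one job pool."""
    return math.ceil(sum(file_counts) / jobs)


many_small = [1] * 100   # 100 libraries, 1 file each
few_large = [56, 40]     # two large libraries (hypothetical sizes)

print(per_submodule_steps(many_small, 8))   # 100
print(fully_parallel_steps(many_small, 8))  # 13
print(per_submodule_steps(few_large, 8))    # 12
print(fully_parallel_steps(few_large, 8))   # 12
```

In this model the two approaches tie when libraries are large, and diverge sharply only when there are many tiny ones. Real compile times vary per file, so the numbers are indicative at best.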

@facchinm I guess I have both then (some slightly bigger libraries, but also many libraries). Maybe it's already close to going as fast as it can; it's hard to compare given that I don't have a button to click for "fully parallelized across everything". I have no idea how easy or difficult it would be to add that option (more speed vs. better debug), but I would sure select it if it were there, to see whether the speedup is worth the slightly harder-to-read build output.

marcmerlin avatar Mar 10 '20 16:03 marcmerlin
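One possible way to get the "fully parallelized" option discussed above while keeping the output readable is to submit every file to a single pool and buffer each job's log, then report per submodule in the original order. The Python sketch below is only an assumption about how this could look, not a feature of arduino-builder; `compile_file` and `build_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_file(path):
    """Hypothetical compile step: returns (object file, captured log text)."""
    return path.rsplit(".", 1)[0] + ".o", f"compiled {path}"


def build_all(submodules, jobs):
    """Compile all files of all submodules in one pool, but collect the
    logs per submodule so they can be printed in the original order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Submit everything up front so workers never wait on a
        # submodule boundary.
        pending = [
            (name, [pool.submit(compile_file, f) for f in files])
            for name, files in submodules
        ]
        report = []
        for name, futures in pending:
            logs = [future.result()[1] for future in futures]
            report.append((name, logs))
    return report


report = build_all([("core", ["main.cpp"]), ("MyLib", ["a.cpp", "b.cpp"])], jobs=8)
for name, logs in report:
    print(name)
    for line in logs:
        print("  " + line)
```

The trade-off is that output for a submodule appears only once all of its files are done, and an error in an early submodule is reported after later work has already started.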