ocrd_all
Parallel build does not make full use of available CPU resources
make all -j6 starts with downloading all submodules, 6 at a time, but because of the semaphore required for git, all downloads happen sequentially. Builds will only start once fewer than 6 submodules remain to be downloaded.
The first build for a freshly cloned ocrd_all requires about 16 minutes of CPU time. With 6 CPUs, it should run in less than 3 minutes, but it takes more than twice that:
# time nohup make all -j6
real 6m24.406s
user 15m47.172s
sys 1m8.953s
In this test, the pip cache was filled from previous runs of the same user.
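To put a number on the gap, the figures reported by time can be turned into an average count of busy CPUs, (user + sys) / real, and an ideal wall time, (user + sys) / CPUs. The following is only a sketch; the helper names to_seconds and cpu_utilization are made up for this illustration and are not part of ocrd_all.

```shell
# to_seconds (hypothetical helper): convert a `time`-style duration
# such as 6m24.406s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# cpu_utilization REAL USER SYS NCPU (hypothetical helper):
# average number of busy CPUs and the wall time a perfectly parallel run would need.
cpu_utilization() {
    awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" \
        -v s="$(to_seconds "$3")" -v n="$4" \
        'BEGIN { printf "busy CPUs: %.2f of %d, ideal wall time: %.0fs\n", (u + s) / r, n, (u + s) / n }'
}

# Values taken from the measurement above.
cpu_utilization 6m24.406s 15m47.172s 1m8.953s 6
```

For the run above this gives about 2.64 busy CPUs out of 6, and an ideal wall time of about 169 seconds, which matches the "less than 3 minutes" estimate.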
make all -j6 starts with downloading all submodules, 6 at a time, but because of the semaphore required for git, all downloads happen sequentially.
Yes, that is unfortunate. But maybe we overreacted when fixing #123: there are lock files involved in both sync and update, but only the former is shared by all submodules (.git/config.lock), while the latter uses submodule-localized locks (.git/modules/MODULE/{index,HEAD,...}.lock). So we could actually remove the second SEM, which accounts for most of the download time!
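As a rough sketch of that idea, only the sync step would be serialized while update runs freely in parallel. This is not the actual ocrd_all Makefile rule: the function fetch_submodule, the RUN dry-run hook and the lock file name are hypothetical, and flock(1) merely stands in for the SEM wrapper, whose definition is not shown here. It also relies on the assumption stated above that update only takes per-submodule locks.

```shell
# Hypothetical sketch, not the ocrd_all implementation.
# RUN is a made-up hook: set RUN=echo to print the commands instead of running them.
RUN=${RUN:-}
# Made-up lock file name used to serialize the sync step.
LOCKFILE=${LOCKFILE:-.git/submodule-sync.lock}

fetch_submodule() {
    module=$1
    # sync writes to the shared .git/config (hence .git/config.lock),
    # so only one sync may run at a time.
    $RUN flock "$LOCKFILE" git submodule sync -- "$module"
    # Assumption from the discussion: update only locks files below
    # .git/modules/$module, so it is left unserialized.
    $RUN git submodule update --init -- "$module"
}
```

Running it with RUN=echo, for example RUN=echo fetch_submodule core, only prints the two git commands, which is a cheap way to check what would be serialized before trying it on a real clone.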
Builds will only start once fewer than 6 submodules remain to be downloaded.
If you don't restrict make to a fixed number of jobs, but to a maximum load level instead (e.g. -j -l 6), this behaviour will improve!
Using make all -j -l 6 seems to create a large overhead in exchange for a slightly faster build:
# time nohup make all -j -l6
real 6m3.837s
user 19m42.264s
sys 1m40.864s
I had removed the second git SEM for this test.
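The trade-off can be quantified by comparing this run with the earlier -j6 run. The sketch below uses a made-up helper, percent_change, and the durations converted to seconds by hand from the two time reports.

```shell
# percent_change OLD NEW (hypothetical helper): relative change in percent.
percent_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

# Seconds converted from the two `time` reports:
#   -j6:     real 384.406, user + sys =  947.172 +  68.953 = 1016.125
#   -j -l6:  real 363.837, user + sys = 1182.264 + 100.864 = 1283.128
echo "wall time: $(percent_change 384.406 363.837)"
echo "CPU time:  $(percent_change 1016.125 1283.128)"
```

This works out to roughly 5% less wall time for roughly 26% more CPU time.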
My recent experience is that after make all -j 8 not everything has been built, because a subsequent make all still builds a lot.