[make] Fix parallel builds (by only building libc in parallel, for now)
I like using the cores of my laptop's processor. Unfortunately, the kernel Makefiles are not parallel-safe, and I don't trust elkscmd/Makefile to parallelize well in all situations.
This patch makes ELKS build correctly with export MAKEFLAGS="-jN" for N > 1. It also allows building the libcs in parallel, which already speeds up the build a little for multi-core CPU users and will translate to slight speedups for GitHub CI as well.
Future work would include validating elkscmd/Makefile for proper parallel operation and patching the kernel Makefiles for the same.
I'm in agreement with you that I don't want to trust the elkscmd/ makes to work in parallel, and the kernel makes don't work in parallel either. IIRC, this was last changed with a contribution by @hexadec1mal, but I don't actually run parallel makes on macOS so I wasn't able to test it.
Isn't there another (better) way of doing this rather than having to change so many Makefiles to use make ... -j1 ...? I'm all for speeding things up, but why not just change the make in libc/Makefile rather than everywhere else, until we figure a better way to do this for everything?
Change the make in libc/Makefile to what? It's the user who passes the number of concurrent jobs, typically in the environment variable MAKEFLAGS which is read by make, and then command line arguments can override it. Hardcoding it to something like -j2 or -j $(nproc) is not a good idea, as some people may deliberately want to use a higher (slow I/O) or lower (heavily multi-tasking machine) number of jobs, and I think introducing a custom variable like PARALLEL goes against common practice.
Unfortunate as it may be, I think passing -j1 to every make invocation that doesn't work in parallel is the common practice - at least, that's what I've seen used in things like package manager build scripts. The idea I had in mind was to fix parallel makes in general first (even if most of the build is still serial), then slowly remove -j1 from various other components as they are tested to build correctly in parallel (either by fixing the issues, or by moving -j1 further down the stream of make invocations).
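To make the per-invocation -j1 idea concrete, here is a minimal sketch. The directory layout and the one-line sub-Makefiles are throwaway stand-ins created in a temp directory, not the real ELKS trees, and GNU make is assumed. Only the libc sub-make inherits the caller's job count; the other two are pinned to one job.

```shell
# Hypothetical stand-in tree; none of this is the real ELKS build.
demo_j1=$(mktemp -d)
mkdir "$demo_j1/libc" "$demo_j1/elks" "$demo_j1/elkscmd"

# Each sub-Makefile only reports that it ran.
for d in libc elks elkscmd; do
    printf 'all: ; @echo built %s\n' "$d" > "$demo_j1/$d/Makefile"
done

# Top level: libc inherits the job count from the caller,
# while elks and elkscmd are forced to run serially with -j1.
{
    printf 'all:\n'
    printf '\t$(MAKE) -C libc all\n'
    printf '\t$(MAKE) -j1 -C elks all\n'
    printf '\t$(MAKE) -j1 -C elkscmd all\n'
} > "$demo_j1/Makefile"

# The user still picks the job count; -s keeps the output to the echo lines.
out_j1=$(make -s -C "$demo_j1" -j4)
echo "$out_j1"
rm -rf "$demo_j1"
```

GNU make may print a warning on stderr when a sub-make is given an explicit -j1 under a parallel parent; that is expected with this approach and does not fail the build.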
Change the make in libc/Makefile to what?
I'm not sure - I haven't looked at it much lately, and it's kinda complicated. You may know that cd libc; make doesn't actually work - you have to cd $(TOPDIR); make if any code is changed in libc/, else the gcc-specific sublevel libc's don't update. So it's messy when updating libc code.
I was thinking of something simpler than -j1 everywhere... how about perhaps storing the -j1 in the $(MAKE) variable, so that for almost everywhere it automatically defaults to -j1, and then (somehow) overriding MAKE= in libc/ only. It seems that in elks-root/, elks/ and elkscmd/, MAKE could/should be set to -j1 (as part of Makefile-rules?) and then change it all later when we finally get parallel makes working. (Or the reverse, assume non-parallel everywhere except libc).
which already speeds up the build a little for multi-core CPU users, and will translate to slight speed ups for GitHub CI
How much faster does this actually make libc and/or other parts of the build that actually use parallel?
I'm not against speeding things up, just trying to find/engineer something that isn't so ugly, and doesn't default all make's everywhere to have a -j1 added. If the speedups aren't significant, perhaps better to just fix the parallel make "problem" by undoing it. I'm not even sure libc parallel make is proven correct, TBH.
How much faster does this actually make libc and/or other parts of the build that actually use parallel?
$ time make -j1
[...]
real 0m55.452s
user 0m42.314s
sys 0m12.953s
$ time make -j12
[...]
real 0m35.097s
user 0m55.712s
sys 0m16.501s
This is with just libc parallelized; it really does take up a lot of time to build all its variants, and because each variant's code is self-contained in its own build directory, I expect it to not cause too many issues. (Of course, more testing is welcome)
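For reference, the ratios implied by the two runs above can be worked out directly from the posted numbers. This is just arithmetic on those timings (one run each, so treat the result as indicative rather than a benchmark):

```shell
# Wall-clock speedup: real time at -j1 divided by real time at -j12.
speedup=$(awk 'BEGIN { printf "%.2f", 55.452 / 35.097 }')

# Total CPU time (user + sys) at -j12 relative to -j1,
# i.e. how much extra work the parallel run spent overall.
cpu_ratio=$(awk 'BEGIN { printf "%.2f", (55.712 + 16.501) / (42.314 + 12.953) }')

echo "wall-clock speedup: ${speedup}x"
echo "CPU time ratio:     ${cpu_ratio}x"
```

So the build finishes in roughly two thirds of the time, at the cost of somewhat more total CPU time.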
just trying to find/engineer something that isn't so ugly, and doesn't default all make's everywhere to have a -j1 added.
I suppose one could MAKEP=$(MAKE) and MAKE=$(MAKEP) -j1, and then use $(MAKEP) for the parallel parts? Something in that direction.
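A rough sketch of that MAKEP idea, again using a throwaway temp directory with stand-in sub-Makefiles and assuming GNU make. Two details are my assumptions rather than anything decided in this thread: the assignments use := (with plain =, MAKE and MAKEP would refer to each other recursively), and the $(MAKEP) recipe line is prefixed with + so make treats it as a recursive make invocation, since it only recognizes the literal $(MAKE) on its own.

```shell
# Hypothetical stand-in tree; the real change would live in a shared include.
demo_makep=$(mktemp -d)
mkdir "$demo_makep/libc" "$demo_makep/elks"
for d in libc elks; do
    printf 'all: ; @echo built %s\n' "$d" > "$demo_makep/$d/Makefile"
done

{
    # Save the unmodified make command for the parallel-safe parts.
    printf 'MAKEP := $(MAKE)\n'
    # Everything that keeps using $(MAKE) is now serial by default.
    printf 'MAKE := $(MAKEP) -j1\n'
    printf 'all:\n'
    # '+' marks this line as a recursive make call for job sharing.
    printf '\t+$(MAKEP) -C libc all\n'
    printf '\t$(MAKE) -C elks all\n'
} > "$demo_makep/Makefile"

out_makep=$(make -s -C "$demo_makep" -j4)
echo "$out_makep"
rm -rf "$demo_makep"
```

With this shape, removing -j1 from a component later means switching its invocation from $(MAKE) to $(MAKEP) rather than editing every recipe.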
After reading all this three times, I still can't figure out what the problem is at all. It compiles fine no matter the order and amount of parallel processes.
time make -j --shuffle=reverse
real 0m28.155s
user 1m48.414s
sys 0m20.090s
./build.sh auto allimages
Build script has completed successfully.
Maybe that's a problem with a particular config? Could you post the config here?
Try swan.config in the repository.
Sometimes, I would instead get errors about compiler-generated.h being missing.
diff --git a/elks/Makefile b/elks/Makefile
index 8f0c2ff6..5c41cb1c 100644
--- a/elks/Makefile
+++ b/elks/Makefile
@@ -119,8 +119,6 @@ net/net.a:
tools:
${MAKE} -C tools all
- -rm -f include/linuxmt/compiler-generated.h
- -rm -f kernel/version.o
#########################################################################
# Compiler-generated definitions not given as command arguments.
No idea why this was here. The kernel/kernel.a target creates compiler-generated.h, and then the tools target, running as a parallel job, deletes it, so version.c fails to compile with an error about compiler-generated.h being missing. Or it deletes version.o after it was compiled, giving ia16-elf-ar: version.o: No such file or directory, as in build.log.
diff --git a/image/Make.image b/image/Make.image
index 01352bef..8d76e0e5 100644
--- a/image/Make.image
+++ b/image/Make.image
@@ -150,7 +150,7 @@ romfs: template
mkromfs -d romfs.devices $(DESTDIR)
-rm -f romfs.devices
-swanrom:
+swanrom: romfs
rm -f $(TARGET_FILE)
cat $(IMG_DIR)/romfs.bin > $(TARGET_FILE)
truncate -s $(TARGET_ROMFS_BYTES) $(TARGET_FILE)
romfs.bin is generated by the romfs target, so swanrom has to depend on it.
Also, why copy with the cat command?
Also, why is rm prefixed with '-'?
Also, rom.wsc is not in .gitignore? (image/rom.wsc)
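To illustrate why the swanrom: romfs prerequisite in the diff above matters under -j, here is a small stand-in Makefile (hypothetical recipes that only echo, not the real image/Make.image; GNU make assumed). Once the prerequisite is declared, make finishes romfs before starting swanrom, however many jobs are allowed.

```shell
# Hypothetical two-target Makefile mirroring the dependency from the diff.
demo_rom=$(mktemp -d)
{
    printf 'romfs:\n'
    printf '\t@echo generate romfs.bin\n'
    # Declaring romfs as a prerequisite orders the two steps even with -j.
    printf 'swanrom: romfs\n'
    printf '\t@echo pack rom from romfs.bin\n'
} > "$demo_rom/Makefile"

# Ask only for swanrom; romfs is pulled in and completed first.
out_rom=$(make -s -C "$demo_rom" -j4 swanrom)
echo "$out_rom"
rm -rf "$demo_rom"
```

Without the prerequisite, make is free to start both targets at once when they are requested together, which is where the missing romfs.bin comes from.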
@asiekierka
Sometimes, I would instead get errors about compiler-generated.h being missing.
Yes, that file as well as asm-offsets.h are prone to races since they're generated anew every build. I'm not a big fan of it. And elks/tools needs to be built before everything else, but might be easily parallelized.
Future work would include validating elkscmd/Makefile for proper parallel operation and patching the kernel Makefiles for the same.
Perhaps we should adjust the kernel build for parallel operation now, in order to move this PR forward? The C library takes the longest to build by quite a bit - I almost never rebuild it and use make kclean; make kimage, which speeds things up a lot.
As you mention, elkscmd/ is risky since someone's got to look through each of the app Makefiles. That should be fairly straightforward to clean up using a few -j1's, with a quick look at some of the more complex Makefiles. (Ash comes to mind).
Agreed that something needs to be done so that the Swan build actually works without error.
I suppose one could MAKEP=$(MAKE) and MAKE=$(MAKEP) -j1, and then use $(MAKEP) for the parallel parts? Something in that direction.
I like that direction rather than using -j1 everywhere. If the parallel problems are cleaned up, then perhaps we don't need many -j1's after all.