LibreELEC.tv
ccache: update to 4.6.3 and now builds with cmake
Release notes:
- https://ccache.dev/releasenotes.html#_ccache_4_0
ccache now uses zstd.
This PR is complete, but still in test and needs testing across a variety of situations.
DONE:
- review / change the zstd:host build to use cmake or make, as the dependencies for meson and ninja have not been built at this stage of make image
- ~~The PR currently depends on sudo apt install -y libzstd-dev on the build_host.~~ The zstd:host package now builds without ccache.
Performance
Disk utilisation and performance testing for make image is below.
ccache - 4.6.3
First build: 3.9GB
Total Build Time: 02:09:15.821 (wall clock)
Accum Build Time: 17:14:06.469 (8 slots)
Second build: 5.0GB
Total Build Time: 00:24:31.422 (wall clock)
Accum Build Time: 03:16:11.276 (8 slots)
Third build: 6.1GB
Total Build Time: 00:24:17.381 (wall clock)
Accum Build Time: 03:14:18.950 (8 slots)
ccache - 3.7.12
First build: 7.66GB
Total Build Time: 02:01:01.230 (wall clock)
Accum Build Time: 16:08:09.743 (8 slots)
Second build: 9.10GB
Total Build Time: 00:22:33.545 (wall clock)
Accum Build Time: 03:00:28.263 (8 slots)
Third build: 8.65GB
Total Build Time: 00:24:21.433 (wall clock)
Accum Build Time: 03:14:51.361 (8 slots)
Fourth build: 8.55GB
Total Build Time: 00:23:51.413 (wall clock)
Accum Build Time: 03:10:51.208 (8 slots)
Other updates
With the build dependency on cmake, an update has been made to the cmake dependencies (namely gettext and pkg-config) to build using the LOCAL_CC compiler and without the dependency on ccache:host.
- the zlib build needed the addition of ccache:host with the reordering of dependencies
- the elfutils build needed the addition of m4:host with the reordering of dependencies
Package changes:
- elfutils: host build depends on m4:host
- pkg-config: remove host dependency on ccache:host
- gettext: remove host dependency on ccache:host
- cmake: remove host dependency on ccache:host
- zlib: host build depends on ccache:host
- ccache: update to 4.6.3 and now builds with cmake
- ccache: add zstd:host dependency
- zstd: build with cmake-make and remove host dependency on ccache:host (sketched below)
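For context, the toolchain switch in LibreELEC packaging is driven by PKG_TOOLCHAIN; a rough sketch of what the zstd change amounts to (an illustrative fragment with assumed paths, not the exact package.mk):

```
# packages/.../zstd/package.mk (illustrative fragment, paths assumed)
PKG_NAME="zstd"
PKG_DEPENDS_HOST="cmake:host make:host"  # ccache:host dependency removed
PKG_TOOLCHAIN="cmake-make"               # configure with cmake, build with make
# upstream zstd keeps its CMake project under build/cmake
PKG_CMAKE_SCRIPT="${PKG_BUILD}/build/cmake/CMakeLists.txt"
```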
Using -3 the stats are:
Total Build Time: 02:14:46.500 (wall clock)
Accum Build Time: 17:58:11.889 (8 slots)
Total Build Time: 00:25:53.479 (wall clock)
Accum Build Time: 03:27:07.730 (8 slots)
4.24GB
5.36GB
$ bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -s
Summary:
Hits: 73580 / 139824 (52.62 %)
Direct: 67132 / 148969 (45.06 %)
Preprocessed: 6448 / 78074 (8.26 %)
Misses: 66244
Direct: 81837
Preprocessed: 71626
Errors: 82
Uncacheable: 54451
Primary storage:
Hits: 142915 / 294149 (48.59 %)
Misses: 151234
Cache size (GB): 5.48 / 10.00 (54.79 %)
Use the -v/--verbose option for more details.
$ bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -p | grep build
(environment) cache_dir = build.LibreELEC-Generic.x86_64-11.0-devel/.ccache
(build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/ccache.conf) compression_level = -3
(build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/ccache.conf) max_size = 10.0G
(default) temporary_dir = build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/tmp
Using -5 the stats are:
Total Build Time: 02:12:08.136 (wall clock)
Accum Build Time: 17:37:04.987 (8 slots)
Total Build Time: 00:25:04.302 (wall clock)
Accum Build Time: 03:20:34.285 (8 slots)
4.48GB
5.62GB
$ bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -s
Summary:
Hits: 73542 / 139824 (52.60 %)
Direct: 67008 / 148969 (44.98 %)
Preprocessed: 6534 / 78198 (8.36 %)
Misses: 66282
Direct: 81961
Preprocessed: 71664
Errors: 82
Uncacheable: 54451
Primary storage:
Hits: 142710 / 294149 (48.52 %)
Misses: 151439
Cache size (GB): 5.74 / 10.00 (57.44 %)
Use the -v/--verbose option for more details.
$ bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -p | grep build
(environment) cache_dir = build.LibreELEC-Generic.x86_64-11.0-devel/.ccache
(build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/ccache.conf) compression_level = -5
(build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/ccache.conf) max_size = 10.0G
(default) temporary_dir = build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/tmp
I think we should try to lower the compression a bit (less cpu heavy), I guess this explains the rather big difference at the build times. Maybe to "-3" or "-5". https://ccache.dev/manual/4.6.3.html#config_compression_level
Will test. As a point on reading the performance data: the first row in each set is with no ccache,
so a full build (no ccache) is 2h1m versus 2h9m wall clock.
2/ I have just done the changes to zstd, so there is no need for libzstd-dev now.
3/ I have tested the other package updates (without the ccache update) and they are fine as-is, so a revert of ccache 4 to 3 and back to 4 can be done in isolation.
4/ I’ll kick off the -3 and -5 tests.
Tests don't show any significant performance improvement using zstd -3 or -5 versus 1, so I have defaulted CCACHE_COMPRESSLEVEL="0", which defaults zstd to level 1.
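For reference, the compression level can be set either per build via the environment or persisted in the cache's ccache.conf; a minimal sketch using the paths from this thread (and assuming the variable reaches the ccache invocations):

```
# one-off override for a single build
CCACHE_COMPRESSLEVEL=-3 make image

# or persist it in the cache's ccache.conf (0 = use zstd's default level)
bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache \
           -o compression_level=0
```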
Results from building without compression were 9.0GB and:
Total Build Time: 02:13:14.015 (wall clock)
Accum Build Time: 17:45:52.007 (8 slots)
so the differences between 3.7.12 and 4.6.3 do not look to be compression related.
Interesting stats from recompressing the .ccache (it only took 30 seconds to go from 9GB to 3.7GB) - these size statistics match the previous test results.
$ time bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -X 1
Recompressing... 100.0% [=======================================================================================]
Original data: 9.0 GB
Old compressed data: 9.0 GB (100.0% of original size)
- Compression ratio: 1.000 x (0.0% space savings)
New compressed data: 3.7 GB (41.5% of original size)
- Compression ratio: 2.410 x (58.5% space savings)
Size change: -5.3 GB
real 0m30.584s
user 0m19.726s
sys 0m39.353s
Retested on an idle NUC11 Intel Tiger Lake x86_64 (NUC11 i7-1165G7)
- 2 full runs through, with rm -rf build* in between.
CCACHE_DISABLE=1 make image > testh.1.out 2>&1
Total Build Time: 01:56:09.895 (wall clock)
Accum Build Time: 15:29:19.060 (8 slots)
Total Build Time: 01:54:26.785 (wall clock)
Accum Build Time: 15:15:34.181 (8 slots)
ccache 4.6.3
make image > testh.2.out 2>&1
Total Build Time: 02:09:33.691 (wall clock)
Accum Build Time: 17:16:29.427 (8 slots)
Total Build Time: 02:10:00.651 (wall clock)
Accum Build Time: 17:20:05.104 (8 slots)
ccache 3.7.12
git revert 36f00e572a # this reverts the ccache package from 4.6.3 to 3.7.12 (but all other packages stayed the same)
make image > testh.3.out 2>&1
Total Build Time: 02:05:52.168 (wall clock)
Accum Build Time: 16:46:57.241 (8 slots)
Total Build Time: 02:05:58.246 (wall clock)
Accum Build Time: 16:47:45.865 (8 slots)
no ccache
Breakdown by status (all slots):
Status Usage ( Pct ) Count State
ACTIVE 06:50:23.173 (44.2%) 796 busy
FAILED : :00.000 (00.0%) 0
GETPKG : :00.000 (00.0%) 0 busy
IDLE 08:34:17.287 (55.3%) 296
LOCKED :02:58.865 (00.3%) 902
MUTEX : :11.447 (00.0%) 7 busy
MUTEX/W : :07.068 (00.0%) 3 stall
STALLED : :10.569 (00.0%) 3 stall
UNLOCK :01:10.652 (00.1%) 902
-------------------------------------
TOTAL 15:29:19.060 ( 100%) 2909
Peak concurrency: 8 out of 8 slots
6 job slots were held in a "stall" state for 00:00:17.637
Slot usage (time in a "busy" state): | Concurrency breakdown ("busy"):
|
#Rank Slot Usage ( Pct ) | # of Slots Usage ( Pct )
#01 05 01:29:44.493 (09.7%) | 01 01:56:06.144 (12.5%)
#02 03 01:00:41.961 (06.5%) | 02 01:26:38.384 (09.3%)
#03 04 :59:01.686 (06.4%) | 03 :52:30.535 (05.7%)
#04 08 :46:42.757 (05.0%) | 04 :38:04.984 (04.1%)
#05 06 :43:21.396 (04.7%) | 05 :35:45.280 (03.8%)
#06 01 :40:33.473 (04.4%) | 06 :29:50.615 (03.2%)
#07 02 :38:42.676 (04.2%) | 07 :28:58.415 (03.1%)
#08 07 :31:46.177 (03.4%) | 08 :22:40.264 (02.4%)
-----------------------------------------+---------------------------------
TOTALS 06:50:34.620 (44.2%) 06:50:34.620 (44.2%)
Breakdown by status (all slots):
Status Usage ( Pct ) Count State
ACTIVE 06:35:19.420 (43.2%) 796 busy
FAILED : :00.000 (00.0%) 0
GETPKG : :00.000 (00.0%) 0 busy
IDLE 08:36:25.468 (56.4%) 296
LOCKED :02:09.894 (00.2%) 902
MUTEX : :09.186 (00.0%) 7 busy
MUTEX/W : :06.030 (00.0%) 3 stall
STALLED : :27.863 (00.1%) 3 stall
UNLOCK : :56.321 (00.1%) 902
-------------------------------------
TOTAL 15:15:34.181 ( 100%) 2909
Peak concurrency: 8 out of 8 slots
6 job slots were held in a "stall" state for 00:00:33.892
Slot usage (time in a "busy" state): | Concurrency breakdown ("busy"):
|
#Rank Slot Usage ( Pct ) | # of Slots Usage ( Pct )
#01 04 01:06:33.104 (07.3%) | 01 01:54:22.764 (12.5%)
#02 02 :59:42.543 (06.5%) | 02 01:25:06.992 (09.3%)
#03 01 :54:36.542 (06.0%) | 03 :50:43.530 (05.5%)
#04 07 :49:48.823 (05.4%) | 04 :36:15.407 (04.0%)
#05 08 :47:22.254 (05.2%) | 05 :34:08.238 (03.7%)
#06 03 :41:27.223 (04.5%) | 06 :27:23.990 (03.0%)
#07 06 :38:35.375 (04.2%) | 07 :26:25.938 (02.9%)
#08 05 :37:22.742 (04.1%) | 08 :21:01.748 (02.3%)
-----------------------------------------+---------------------------------
TOTALS 06:35:28.606 (43.2%) 06:35:28.606 (43.2%)
ccache 4.6.3
Breakdown by status (all slots):
Status Usage ( Pct ) Count State
ACTIVE 08:12:32.232 (47.5%) 796 busy
FAILED : :00.000 (00.0%) 0
GETPKG : :00.000 (00.0%) 0 busy
IDLE 08:59:43.013 (52.1%) 296
LOCKED :02:49.028 (00.3%) 902
MUTEX : :09.900 (00.0%) 7 busy
MUTEX/W : :07.454 (00.0%) 4 stall
STALLED : :01.303 (00.0%) 2 stall
UNLOCK :01:06.496 (00.1%) 902
-------------------------------------
TOTAL 17:16:29.427 ( 100%) 2909
Peak concurrency: 8 out of 8 slots
6 job slots were held in a "stall" state for 00:00:08.757
Slot usage (time in a "busy" state): | Concurrency breakdown ("busy"):
|
#Rank Slot Usage ( Pct ) | # of Slots Usage ( Pct )
#01 01 01:19:38.095 (07.7%) | 01 02:09:30.073 (12.5%)
#02 08 01:11:28.086 (06.9%) | 02 01:35:38.925 (09.2%)
#03 03 01:10:33.295 (06.8%) | 03 01:09:18.299 (06.7%)
#04 04 01:05:23.589 (06.3%) | 04 :47:44.777 (04.6%)
#05 07 :54:42.070 (05.3%) | 05 :46:05.190 (04.4%)
#06 02 :52:31.970 (05.1%) | 06 :37:59.698 (03.7%)
#07 06 :51:50.128 (05.0%) | 07 :36:31.404 (03.5%)
#08 05 :46:34.899 (04.5%) | 08 :29:53.767 (02.9%)
-----------------------------------------+---------------------------------
TOTALS 08:12:42.132 (47.5%) 08:12:42.132 (47.5%)
Breakdown by status (all slots):
Status Usage ( Pct ) Count State
ACTIVE 08:02:58.632 (46.4%) 796 busy
FAILED : :00.000 (00.0%) 0
GETPKG : :00.000 (00.0%) 0 busy
IDLE 09:12:04.915 (53.1%) 296
LOCKED :03:10.034 (00.3%) 902
MUTEX : :10.772 (00.0%) 7 busy
MUTEX/W : :06.128 (00.0%) 4 stall
STALLED : :23.802 (00.0%) 3 stall
UNLOCK :01:10.820 (00.1%) 902
-------------------------------------
TOTAL 17:20:05.104 ( 100%) 2910
Peak concurrency: 8 out of 8 slots
7 job slots were held in a "stall" state for 00:00:29.931
Slot usage (time in a "busy" state): | Concurrency breakdown ("busy"):
|
#Rank Slot Usage ( Pct ) | # of Slots Usage ( Pct )
#01 08 01:16:55.041 (07.4%) | 01 02:09:57.205 (12.5%)
#02 04 01:14:24.611 (07.2%) | 02 01:37:33.846 (09.4%)
#03 05 01:13:51.295 (07.1%) | 03 01:05:53.251 (06.3%)
#04 01 01:06:07.534 (06.4%) | 04 :45:08.217 (04.3%)
#05 06 :52:48.358 (05.1%) | 05 :43:38.575 (04.2%)
#06 03 :51:55.788 (05.0%) | 06 :36:30.984 (03.5%)
#07 07 :49:48.045 (04.8%) | 07 :35:32.742 (03.4%)
#08 02 :37:18.732 (03.6%) | 08 :28:54.585 (02.8%)
-----------------------------------------+---------------------------------
TOTALS 08:03:09.404 (46.5%) 08:03:09.404 (46.5%)
ccache 3.7.12
Breakdown by status (all slots):
Status Usage ( Pct ) Count State
ACTIVE 07:48:04.779 (46.5%) 796 busy
FAILED : :00.000 (00.0%) 0
GETPKG : :00.000 (00.0%) 0 busy
IDLE 08:55:06.520 (53.1%) 296
LOCKED :02:28.376 (00.2%) 902
MUTEX : :11.410 (00.0%) 7 busy
MUTEX/W : :05.176 (00.0%) 4 stall
STALLED : :01.108 (00.0%) 3 stall
UNLOCK : :59.872 (00.1%) 902
-------------------------------------
TOTAL 16:46:57.241 ( 100%) 2910
Peak concurrency: 8 out of 8 slots
7 job slots were held in a "stall" state for 00:00:06.284
Slot usage (time in a "busy" state): | Concurrency breakdown ("busy"):
|
#Rank Slot Usage ( Pct ) | # of Slots Usage ( Pct )
#01 07 01:22:08.980 (08.2%) | 01 02:05:48.348 (12.5%)
#02 08 01:13:51.542 (07.3%) | 02 01:33:17.611 (09.3%)
#03 03 01:13:14.792 (07.3%) | 03 01:05:24.108 (06.5%)
#04 01 01:08:57.305 (06.8%) | 04 :43:41.665 (04.3%)
#05 05 :50:10.910 (05.0%) | 05 :42:17.022 (04.2%)
#06 04 :45:42.455 (04.5%) | 06 :35:21.562 (03.5%)
#07 06 :39:23.660 (03.9%) | 07 :34:27.249 (03.4%)
#08 02 :34:46.545 (03.5%) | 08 :27:58.624 (02.8%)
-----------------------------------------+---------------------------------
TOTALS 07:48:16.189 (46.5%) 07:48:16.189 (46.5%)
Breakdown by status (all slots):
Status Usage ( Pct ) Count State
ACTIVE 07:46:04.403 (46.2%) 796 busy
FAILED : :00.000 (00.0%) 0
GETPKG : :00.000 (00.0%) 0 busy
IDLE 08:57:24.578 (53.3%) 296
LOCKED :02:48.172 (00.3%) 902
MUTEX : :06.622 (00.0%) 7 busy
MUTEX/W : :04.593 (00.0%) 4 stall
STALLED : :15.906 (00.0%) 3 stall
UNLOCK :01:01.591 (00.1%) 902
-------------------------------------
TOTAL 16:47:45.865 ( 100%) 2910
Peak concurrency: 8 out of 8 slots
7 job slots were held in a "stall" state for 00:00:20.499
Slot usage (time in a "busy" state): | Concurrency breakdown ("busy"):
|
#Rank Slot Usage ( Pct ) | # of Slots Usage ( Pct )
#01 05 01:38:42.620 (09.8%) | 01 02:05:54.497 (12.5%)
#02 07 01:18:16.659 (07.8%) | 02 01:37:18.775 (09.7%)
#03 01 01:05:36.977 (06.5%) | 03 01:04:39.683 (06.4%)
#04 03 :52:31.162 (05.2%) | 04 :42:01.857 (04.2%)
#05 08 :50:37.515 (05.0%) | 05 :40:56.579 (04.1%)
#06 06 :47:40.107 (04.7%) | 06 :34:19.575 (03.4%)
#07 04 :38:41.654 (03.8%) | 07 :33:23.191 (03.3%)
#08 02 :34:04.331 (03.4%) | 08 :27:36.869 (02.7%)
-----------------------------------------+---------------------------------
TOTALS 07:46:11.025 (46.3%) 07:46:11.025 (46.3%)
Summary of performance results:
- build times are 8% longer going from no ccache to ccache 3.7.12
- build times are 12% longer going from no ccache to ccache 4.6.3
- build times are 4% longer going from ccache 3.7.12 to ccache 4.6.3
Opened a support discussion in ccache:
- https://github.com/ccache/ccache/discussions/1160
Different results on an ADL
Intel(R) Client Systems NUC12WSKi7/NUC12WSBi7
BIOS WSADL357.0085.2022.0718.1739 07/18/2022
CPU0: 12th Gen Intel(R) Core(TM) i7-1260P
ccache 4.6.3
Total Build Time: 01:18:41.884 (wall clock)
Accum Build Time: 20:36:42.191 (16 slots)
Total Build Time: 00:18:21.047 (wall clock)
Accum Build Time: 04:31:04.287 (16 slots)
ccache 3.7.12
Total Build Time: 01:17:41.557 (wall clock)
Accum Build Time: 20:21:23.283 (16 slots)
Total Build Time: 00:19:22.917 (wall clock)
Accum Build Time: 04:45:24.514 (16 slots)
I have had this on hold for some time and now is a perfect time to dust it off. When a package is bumped, ccache is not used anymore because the absolute path changes. So I was thinking about a solution like the one below. It sets the CCACHE_BASEDIR environment variable to the package build folder, which means that ccache is used even across package bumps. What do you think? On a quick reading I don't see any drawbacks.
https://ccache.dev/manual/4.3.html#config_base_dir https://ccache.dev/manual/4.3.html#_compiling_in_different_directories
diff --git a/scripts/build b/scripts/build
index edc47ff20d..3c7b1272a1 100755
--- a/scripts/build
+++ b/scripts/build
@@ -221,6 +221,9 @@ mkdir -p "${PKG_BUILD}"
cd "${PKG_BUILD}"
+# share ccache between package bumps
+export CCACHE_BASEDIR=${PKG_BUILD}
+
if [ -f "${PKG_CONFIGURE_SCRIPT}" -o -f "${PKG_CMAKE_SCRIPT}" -o -f "${PKG_MESON_SCRIPT}" ]; then
case "${TARGET}" in
"target") PKG_REAL_BUILD="${PKG_BUILD}/.${TARGET_NAME}" ;;
Build on ADL - Single THREADCOUNT - I have not yet analysed the logs
ccache 3.7.12 + THREADCOUNT=1 make image
Total Build Time: 01:45:40.668 (wall clock)
Accum Build Time: 01:45:40.668 (1 slots)
ccache 4.6.3 + THREADCOUNT=1 make image
Total Build Time: 01:47:25.674 (wall clock)
Accum Build Time: 01:47:25.674 (1 slots)
The history and job logs from these runs are at:
- https://github.com/LibreELEC/LibreELEC.tv/commit/2c6b0a40d1507f584b1a5a37f0b2d97bfa93a5d7
Testing RPi4 builds I noticed that clean rebuilds (with ccache intact) are now significantly slower: rebuild time went up from about 22-25 minutes to about 40-45 minutes (ballpark numbers; at times during testing I had CCACHE_DEBUG enabled).
ccache stats also reported a lot more uncacheable files and a lot fewer cache hits - here are the stats from the second rebuild:
Summary:
Cache directory: /home/hias/ccache-master
Primary config: /home/hias/ccache-master/ccache.conf
Secondary config: /home/hias/private/libreelec/libreelec-master/build.LibreELEC-RPi4.arm-11.0-devel/toolchain/etc/ccache.conf
Stats updated: Sun Sep 18 18:00:10 2022
Hits: 58018 / 58220 (99.65 %)
Direct: 56123 / 61933 (90.62 %)
Preprocessed: 1895 / 4604 (41.16 %)
Misses: 202
Direct: 5810
Preprocessed: 2709
Errors: 19
Uncacheable: 30426
Primary storage:
Hits: 114899 / 122641 (93.69 %)
Misses: 7742
Cache size (GB): 5.25 / 40.00 (13.12 %)
Files: 104992
Errors:
Internal error: 19
Uncacheable:
Autoconf compile/link: 9822
Bad compiler arguments: 719
Called for linking: 5435
Called for preprocessing: 1392
Compilation failed: 2379
Compiler produced empty output: 106
Compiler produced no output: 3
Could not use precompiled header: 3
Multiple source files: 3
No input file: 4305
Output to stdout: 7
Preprocessing failed: 1328
Unsupported code directive: 3
Unsupported compiler option: 4890
Unsupported source language: 31
Digging into that, the main culprit is that the linux kernel build isn't cached anymore: ccache wrongly detected "-Wa,-march=armv8.5-a" as being "-Wa,-a=..." and refused to cache it (I'll report that upstream):
Assembler listing file (-Wa,...=file) is currently not supported: -Wa,-march=armv8.5-a
...
Failed; falling back to running the real compiler
...
Result: unsupported_compiler_option
That accounts for about 4.5k of the "Unsupported compiler options".
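A minimal way to reproduce the misparse in isolation would be something like the following (an assumption, not from the thread: any ccache 4.6.x plus a gcc whose assembler accepts the flag, e.g. an aarch64 cross compiler):

```
$ touch empty.c
$ CCACHE_DEBUG=1 ccache aarch64-linux-gnu-gcc -Wa,-march=armv8.5-a -c empty.c
# the per-object debug log should then show the same "Assembler listing
# file ... not supported" message and Result: unsupported_compiler_option
```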
Reverting the incorrect -Wa,-a check commit https://github.com/ccache/ccache/commit/c532b79d60c8f22e4a66d8f93b927f79e88b94dd brought the stats into the ballpark of the 3.7 builds, but rebuild time was still significantly longer. Now without CCACHE_DEBUG:
$ time ../make-rpi4.sh
...
real 30m39.441s
user 151m24.418s
sys 21m4.793s
ccache stats:
Summary:
Cache directory: /home/hias/ccache-master
Primary config: /home/hias/ccache-master/ccache.conf
Secondary config: /home/hias/private/libreelec/libreelec-master/build.LibreELEC-RPi4.arm-11.0-devel/toolchain/etc/ccache.conf
Stats updated: Sun Sep 18 21:17:11 2022
Hits: 62581 / 62999 (99.34 %)
Direct: 60787 / 66724 (91.10 %)
Preprocessed: 1794 / 4720 (38.01 %)
Misses: 418
Direct: 5937
Preprocessed: 2926
Errors: 19
Uncacheable: 25647
Primary storage:
Hits: 124081 / 132211 (93.85 %)
Misses: 8130
Cache size (GB): 5.86 / 40.00 (14.65 %)
Files: 115471
Errors:
Internal error: 19
Uncacheable:
Autoconf compile/link: 9822
Bad compiler arguments: 719
Called for linking: 5435
Called for preprocessing: 1393
Compilation failed: 2380
Compiler produced empty output: 106
Compiler produced no output: 3
Could not use precompiled header: 3
Multiple source files: 3
No input file: 4304
Output to stdout: 7
Preprocessing failed: 1338
Unsupported code directive: 4
Unsupported compiler option: 99
Unsupported source language: 31
Digging a bit further, it seems that it takes almost 9 minutes just to build ccache with deps at THREADCOUNT=1:
$ export THREADCOUNT=1
$ time ../run-rpi4.sh scripts/build ccache:host
...
real 8m55.466s
user 42m36.606s
sys 2m43.052s
I'll try to dig a bit further into it but my hunch is that the ccache (+deps) build time is probably responsible for the majority of the time increase on top of the non-cached linux build issue.
I'm adding merge blocked for now, especially the ccache bug needs to be fixed/resolved/reverted before this can go in.
BTW: I tested on my E6430 laptop with i7-3740QM, Samsung EVO860 SSD with LUKS+LVM, and I also built with CCACHE_NOCOMPRESS=1 to get comparable results to 3.7.
I did some more tests, again with CCACHE_NOCOMPRESS=1. First a complete clean build (ccache nuked) and then 2 clean rebuilds (with ccache intact), both with master and this PR plus the -Wa revert.
Then I did another test, using this PR but switched ccache back to 3.7, used the package.mk from master but kept PKG_DEPENDS_HOST as in this PR to measure the impact of the added dependencies. I didn't do a full clean/rebuild/rebuild test here but simply did 2 clean rebuilds using the pre-populated 3.7 cache.
Here are the "time" outputs of make from the full clean build and the second rebuild:
ccache 3.7.12:
initial:
real 121m32.320s
user 822m35.934s
sys 78m44.177s
rebuild2:
real 22m28.057s
user 107m52.215s
sys 17m45.735s
ccache 4.6.3:
initial:
real 125m50.153s
user 840m37.276s
sys 84m11.576s
rebuild2:
real 30m9.614s
user 149m38.462s
sys 20m31.692s
this PR/deps with ccache 3.7:
rebuild4-with-4.6-deps:
real 29m31.610s
user 145m22.913s
sys 20m1.017s
So the initial build took about 4.5 minutes (~4%) longer with 4.6. The second rebuild took about 7.5 minutes (~33%) longer compared to 3.7.
The 3.7-with-4.6-deps rebuild was only 30 seconds faster than the 4.6 rebuild result - which indicates that libzstd and especially cmake are the hefty dependencies that kill rebuild performance.
I'm somewhat inclined to simply make cmake a build host dependency, so zstd:host and ccache:host can be built right at the beginning, using cmake from the build host, so that we have ccache available for the cmake, gettext, pkg-config etc. host builds.
I haven't tested yet but I guess that might improve rebuild performance a lot.
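A rough sketch of that idea (hypothetical; the message and placement are invented for illustration): instead of a cmake:host package sitting on the critical chain, fail early if the build host has no usable cmake:

```
# e.g. in the build-host dependency checks
if ! command -v cmake >/dev/null 2>&1; then
  echo "error: cmake is required on the build host" >&2
  exit 1
fi
```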
I did another test analyzing the critical chain at image build start.
The dependency plan up to ccache looks like this:
build gettext:host (wants: )
build pkg-config:host (wants: gettext:host)
build cmake:host (wants: pkg-config:host)
build make:host (wants: )
build zstd:host (wants: cmake:host, make:host)
build ccache:host (wants: cmake:host, make:host, zstd:host)
so the critical chain is gettext/make -> pkg-config -> cmake -> zstd -> ccache
Here are results from separately building each package:
time ../run-rpi4.sh scripts/build make:host
real 0m8.829s
user 0m12.833s
sys 0m2.314s
time ../run-rpi4.sh scripts/build gettext:host
real 2m23.918s
user 3m27.409s
sys 0m30.713s
time ../run-rpi4.sh scripts/build pkg-config:host
real 0m17.252s
user 0m38.762s
sys 0m5.393s
time ../run-rpi4.sh scripts/build cmake:host
real 4m54.428s
user 32m41.799s
sys 1m40.192s
time ../run-rpi4.sh scripts/build zstd:host
real 0m10.973s
user 1m0.238s
sys 0m1.864s
time ../run-rpi4.sh scripts/build ccache:host
real 0m40.429s
user 3m20.995s
sys 0m9.115s
cmake is by far the largest contributor; gettext needs quite a lot of time, too (mainly in the configure stage). The zstd build is quick (but needs cmake) and ccache is rather quick to build, too.
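For a rough cross-check, summing the real times along that chain gives 2m24s (gettext) + 17s (pkg-config) + 4m54s (cmake) + 11s (zstd) + 40s (ccache), which comes to about 8m27s (about 8m36s with make:host included) - in line with the 8m55s measured above for building ccache:host with deps at THREADCOUNT=1.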
I've added the CCACHE_BASEDIR change to my dev tree. I haven't done any formal testing, but I haven't seen any badness yet. I'll put something (a test harness) together to start capturing the performance data and testing the improvements this brings for the bumps.
The CCACHE_BASEDIR change looks quite dangerous to me.
I use an external CCACHE_DIR, shared by multiple projects/devices of an LE branch, and if I understood the docs this might lead to eg an AW build picking up a cached object of an RPi build.
If we want such functionality it's best to make it optional (this one would probably require a pre-build hook where user options are sourced).
In general I'd like the LE default configuration to be as stock and conservative as possible - aggressive optimizations should only be applied by users/devs at their own risk :-)
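A minimal sketch of making it opt-in (the option name is invented for illustration):

```
# hypothetical opt-in, e.g. in scripts/build after user options are sourced
if [ "${CCACHE_SHARE_ACROSS_BUMPS:-no}" = "yes" ]; then
  # share ccache between package bumps
  export CCACHE_BASEDIR="${PKG_BUILD}"
fi
```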
I'm interested in getting a better understanding of how the "magic" works - especially your shared ccache model across the different targets, and exactly how the ccache logic handles the change of compiler. I would like to see whether the performance difference is there. Also, whether there is an option to use the full path as BASEDIR (but without the version suffix) so that ABC/main.c is not DEF/main.c.
Another option to think of is CCACHE_NOHASHDIR="1". OpenEmbedded has it set:
https://github.com/openembedded/openembedded-core/blob/master/meta/classes/ccache.bbclass#L50
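For reference, that maps to the hash_dir setting; a one-line sketch (per the ccache manual this mainly affects builds using -g):

```
export CCACHE_NOHASHDIR=1   # same effect as hash_dir = false in ccache.conf
```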
The point of this proposed change is what I already wrote: when you bump a package, the ccache for it is not used anymore because the absolute path has changed. Even if the new package sources have only one file changed and everything else is the same, everything is built without ccache. This looks wrong to me.
But I'm not saying anything like this needs to be implemented. If it works, though, why not? We need to be conservative with resources these days :)
There's no magic here, I simply set CCACHE_DIR=~/ccache-master, CCACHE_DIR=~/ccache-le10 or something like that.
That's basically the default ccache config (where it's using ~/.ccache or ~/.cache/ccache) - ccache will associate cache entries with the dir, so eg ~/libreelec-10 and ~/libreelec-master won't re-use each other's cache entries - they only share a single cache.
Cache hit rate when updating linux, using the stock ccache config (so no basedir etc), was actually quite good.
An RPi4 kernel update from 5.15.67-3157603 to 5.15.68-ac7d1fd needed a full rebuild, probably because a new config option was added (CONFIG_ARM64_ERRATUM_2457168) - in that case I'd expect (almost) all of the source to be rebuilt though.
Then updating from 5.15.68-ac7d1fd to 5.15.68-b11af53 (which changed the vc4 driver and a header) was rather quick: "make" took about 3 minutes and the cache hit rate was about 90%.
real 2m56.479s
user 16m10.769s
sys 2m7.951s
Summary:
Cache directory: /home/hias/ccache-master
Primary config: /home/hias/ccache-master/ccache.conf
Secondary config: /home/hias/private/libreelec/libreelec-master/build.LibreELEC-RPi4.arm-11.0-devel/toolchain/etc/ccache.conf
Stats updated: Tue Sep 20 18:06:54 2022
Hits: 4768 / 5291 (90.12 %)
Direct: 4760 / 5334 (89.24 %)
Preprocessed: 8 / 535 (1.50 %)
Misses: 523
Direct: 574
Preprocessed: 527
Uncacheable: 563
Primary storage:
Hits: 9533 / 10628 (89.70 %)
Misses: 1095
Cache size (GB): 6.59 / 40.00 (16.46 %)
Files: 126196
Uncacheable:
Called for linking: 85
Called for preprocessing: 378
Compilation failed: 2
Compiler produced empty output: 2
No input file: 57
Preprocessing failed: 35
Unsupported code directive: 4
Then cleaning the image and running make again took about 1min20 (about 30 seconds of that is squashfs though) and the cache hit rate was almost 100%:
Total Build Time: 00:00:51.675 (wall clock)
real 1m17.366s
user 5m2.212s
sys 1m8.107s
Summary:
Cache directory: /home/hias/ccache-master
Primary config: /home/hias/ccache-master/ccache.conf
Secondary config: /home/hias/private/libreelec/libreelec-master/build.LibreELEC-RPi4.arm-11.0-devel/toolchain/etc/ccache.conf
Stats updated: Tue Sep 20 18:13:27 2022
Hits: 5019 / 5022 (99.94 %)
Direct: 5012 / 5057 (99.11 %)
Preprocessed: 7 / 14 (50.00 %)
Misses: 3
Direct: 45
Preprocessed: 7
Uncacheable: 546
Primary storage:
Hits: 10034 / 10083 (99.51 %)
Misses: 49
Cache size (GB): 6.59 / 40.00 (16.46 %)
Files: 126201
Uncacheable:
Called for linking: 84
Called for preprocessing: 378
Compilation failed: 2
Compiler produced empty output: 2
No input file: 49
Preprocessing failed: 27
Unsupported code directive: 4
Then it looks like something is wrong with how I tested.
Updated to ccache-4.7, erroring with:
- https://github.com/ccache/ccache/issues/1189
Build can be completed by running:
- make image
- CCACHE_DISABLE=1 scripts/build nettle
- make image
Closing as this is superseded by
- #7202