ccache: update to 4.6.3 and now builds with cmake

heitbaum opened this issue 2 years ago • 20 comments

Release notes:

  • https://ccache.dev/releasenotes.html#_ccache_4_0

ccache now uses zstd. This PR is complete, but it is still in test and needs testing across a variety of situations.

DONE:

  • review / change zstd:host build to use cmake or make, as the dependencies for meson and ninja have not been built at this stage of the make image
  • ~The PR currently depends on sudo apt install -y libzstd-dev on the build_host.~
    • zstd:host package now builds without ccache.

Performance

Disk utilisation and performance testing of make image are below.

ccache - 4.6.3

First build 3.9GB
Second build 5.0GB
Third build 6.1GB

Total Build Time: 02:09:15.821 (wall clock)
Accum Build Time: 17:14:06.469 (8 slots)

Total Build Time: 00:24:31.422 (wall clock)
Accum Build Time: 03:16:11.276 (8 slots)

Total Build Time: 00:24:17.381 (wall clock)
Accum Build Time: 03:14:18.950 (8 slots)

ccache - 3.7.12

First build 7.66GB
Second build 9.10GB
Third build 8.65GB
Fourth build 8.55GB

Total Build Time: 02:01:01.230 (wall clock)
Accum Build Time: 16:08:09.743 (8 slots)

Total Build Time: 00:22:33.545 (wall clock)
Accum Build Time: 03:00:28.263 (8 slots)

Total Build Time: 00:24:21.433 (wall clock)
Accum Build Time: 03:14:51.361 (8 slots)

Total Build Time: 00:23:51.413 (wall clock)
Accum Build Time: 03:10:51.208 (8 slots)

Other updates

With the build dependency on cmake, an update has been made to the cmake dependencies (gettext and pkg-config) so they build using the LOCAL_CC compiler and without a dependency on ccache:host.

  • zlib build needed the addition of ccache:host with the reordering of dependencies
  • elfutils build needed the addition of m4:host with the reordering of dependencies

Package changes:

  • elfutils: host build depends on m4:host
  • pkg-config: remove host dependency on ccache:host
  • gettext: remove host dependency on ccache:host
  • cmake: remove host dependency on ccache:host
  • zlib: host build depends on ccache:host
  • ccache: update to 4.6.3 and now builds with cmake
    • ccache: add zstd:host dependency
  • zstd: build with cmake-make and remove host dependency on ccache:host (a sketch of these package.mk edits is below)
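
A sketch of the kind of package.mk edits this implies - the variable names follow LibreELEC conventions, but the exact lines are illustrative rather than the PR's actual diff:

# ccache: gains zstd:host and moves to the cmake toolchain (sketch)
PKG_DEPENDS_HOST="cmake:host make:host zstd:host"
PKG_TOOLCHAIN="cmake-make"

# pkg-config, gettext and cmake drop ccache:host from PKG_DEPENDS_HOST;
# zlib adds ccache:host and elfutils adds m4:host, per the list above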

Using -3 the stats are:

Total Build Time: 02:14:46.500 (wall clock)
Accum Build Time: 17:58:11.889 (8 slots)

Total Build Time: 00:25:53.479 (wall clock)
Accum Build Time: 03:27:07.730 (8 slots)

4.24GB
5.36GB

bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -s
Summary:
  Hits:             73580 / 139824 (52.62 %)
    Direct:         67132 / 148969 (45.06 %)
    Preprocessed:    6448 /  78074 (8.26 %)
  Misses:           66244
    Direct:         81837
    Preprocessed:   71626
  Errors:              82
  Uncacheable:      54451
Primary storage:
  Hits:            142915 / 294149 (48.59 %)
  Misses:          151234
  Cache size (GB):   5.48 /  10.00 (54.79 %)

Use the -v/--verbose option for more details.

$ bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -p | grep build
(environment) cache_dir = build.LibreELEC-Generic.x86_64-11.0-devel/.ccache
(build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/ccache.conf) compression_level = -3
(build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/ccache.conf) max_size = 10.0G
(default) temporary_dir = build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/tmp

Using -5 the stats are:

Total Build Time: 02:12:08.136 (wall clock)
Accum Build Time: 17:37:04.987 (8 slots)

Total Build Time: 00:25:04.302 (wall clock)
Accum Build Time: 03:20:34.285 (8 slots)

4.48GB
5.62GB

$ bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -s
Summary:
  Hits:             73542 / 139824 (52.60 %)
    Direct:         67008 / 148969 (44.98 %)
    Preprocessed:    6534 /  78198 (8.36 %)
  Misses:           66282
    Direct:         81961
    Preprocessed:   71664
  Errors:              82
  Uncacheable:      54451
Primary storage:
  Hits:            142710 / 294149 (48.52 %)
  Misses:          151439
  Cache size (GB):   5.74 /  10.00 (57.44 %)

Use the -v/--verbose option for more details.

$ bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -p | grep build
(environment) cache_dir = build.LibreELEC-Generic.x86_64-11.0-devel/.ccache
(build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/ccache.conf) compression_level = -5
(build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/ccache.conf) max_size = 10.0G
(default) temporary_dir = build.LibreELEC-Generic.x86_64-11.0-devel/.ccache/tmp

heitbaum avatar Sep 12 '22 12:09 heitbaum

I think we should try to lower the compression a bit (less cpu heavy), I guess this explains the rather big difference in the build times. Maybe to "-3" or "-5". https://ccache.dev/manual/4.6.3.html#config_compression_level

CvH avatar Sep 13 '22 10:09 CvH

I think we should try to lower the compression a bit (less cpu heavy), I guess this explains the rather big difference in the build times. Maybe to "-3" or "-5". https://ccache.dev/manual/4.6.3.html#config_compression_level

Will test. A note on reading the performance data: the first row in each set is with no ccache.

So a full build (no ccache) is 2h1m versus 2h9m, wall clock.

2/ I have just done the changes to zstd, so libzstd-dev is no longer needed.

3/ I have tested the other package updates (without the ccache update) and they are fine as-is, so a revert of ccache 4 to 3 and back to 4 can be done in isolation.

4/ I’ll kick off the -3 and -5 tests.

heitbaum avatar Sep 13 '22 11:09 heitbaum

Tests don't show any significant performance improvement using zstd -3 or -5 versus 1, so I have defaulted CCACHE_COMPRESSLEVEL="0", which lets zstd default to level 1.
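
For reference, the compression level can be pinned in several equivalent ways (values illustrative; compression_level, -o/--set-config and the CCACHE_* variables are documented ccache 4.x settings):

export CCACHE_COMPRESSLEVEL=-3   # per-environment override
# persist the setting in a specific cache's ccache.conf:
bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -o compression_level=-3
export CCACHE_NOCOMPRESS=1       # disable compression entirely, e.g. for A/B tests against 3.7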

heitbaum avatar Sep 14 '22 12:09 heitbaum

Results from building without compression were 9.0GB and:

Total Build Time: 02:13:14.015 (wall clock)
Accum Build Time: 17:45:52.007 (8 slots)

So the differences between 3.7.12 and 4.6.3 do not look to be compression related.

Interesting stats from recompressing the .ccache (it only took 30 seconds to go from 9GB to 3.7GB) - these size statistics match the previous test results.

$ time bin/ccache -d build.LibreELEC-Generic.x86_64-11.0-devel/.ccache -X 1
Recompressing... 100.0% [=======================================================================================]
Original data:           9.0 GB
Old compressed data:     9.0 GB (100.0% of original size)
  - Compression ratio: 1.000 x  (0.0% space savings)
New compressed data:     3.7 GB (41.5% of original size)
  - Compression ratio: 2.410 x  (58.5% space savings)
Size change:            -5.3 GB

real    0m30.584s
user    0m19.726s
sys     0m39.353s

heitbaum avatar Sep 14 '22 16:09 heitbaum

Retested on an idle NUC11 Intel Tiger Lake x86_64 (i7-1165G7) - 2 full runs through, with rm -rf build* in between.

CCACHE_DISABLE=1 make image > testh.1.out 2>&1
  Total Build Time: 01:56:09.895 (wall clock)  
  Accum Build Time: 15:29:19.060 (8 slots)

  Total Build Time: 01:54:26.785 (wall clock)
  Accum Build Time: 15:15:34.181 (8 slots)

ccache 4.6.3
make image > testh.2.out 2>&1
  Total Build Time: 02:09:33.691 (wall clock)
  Accum Build Time: 17:16:29.427 (8 slots)

  Total Build Time: 02:10:00.651 (wall clock)
  Accum Build Time: 17:20:05.104 (8 slots)

ccache 3.7.12
git revert 36f00e572a # this reverts the ccache package from 4.6.3 to 3.7.12 (all other packages stay the same)
make image > testh.3.out 2>&1
  Total Build Time: 02:05:52.168 (wall clock)
  Accum Build Time: 16:46:57.241 (8 slots)

  Total Build Time: 02:05:58.246 (wall clock)
  Accum Build Time: 16:47:45.865 (8 slots)

no ccache

Breakdown by status (all slots):

  Status   Usage         ( Pct )  Count  State
  ACTIVE   06:50:23.173  (44.2%)  796    busy
  FAILED     :  :00.000  (00.0%)  0
  GETPKG     :  :00.000  (00.0%)  0      busy
  IDLE     08:34:17.287  (55.3%)  296
  LOCKED     :02:58.865  (00.3%)  902
  MUTEX      :  :11.447  (00.0%)  7      busy
  MUTEX/W    :  :07.068  (00.0%)  3      stall
  STALLED    :  :10.569  (00.0%)  3      stall
  UNLOCK     :01:10.652  (00.1%)  902
  -------------------------------------
  TOTAL    15:29:19.060  ( 100%)  2909

Peak concurrency: 8 out of 8 slots

6 job slots were held in a "stall" state for 00:00:17.637

Slot usage (time in a "busy" state):     | Concurrency breakdown ("busy"):
                                         |
#Rank  Slot  Usage        ( Pct )        | # of Slots  Usage        ( Pct )
 #01    05   01:29:44.493 (09.7%)        |     01      01:56:06.144 (12.5%)
 #02    03   01:00:41.961 (06.5%)        |     02      01:26:38.384 (09.3%)
 #03    04     :59:01.686 (06.4%)        |     03        :52:30.535 (05.7%)
 #04    08     :46:42.757 (05.0%)        |     04        :38:04.984 (04.1%)
 #05    06     :43:21.396 (04.7%)        |     05        :35:45.280 (03.8%)
 #06    01     :40:33.473 (04.4%)        |     06        :29:50.615 (03.2%)
 #07    02     :38:42.676 (04.2%)        |     07        :28:58.415 (03.1%)
 #08    07     :31:46.177 (03.4%)        |     08        :22:40.264 (02.4%)
-----------------------------------------+---------------------------------
 TOTALS      06:50:34.620 (44.2%)                      06:50:34.620 (44.2%)

Breakdown by status (all slots):

  Status   Usage         ( Pct )  Count  State
  ACTIVE   06:35:19.420  (43.2%)  796    busy
  FAILED     :  :00.000  (00.0%)  0
  GETPKG     :  :00.000  (00.0%)  0      busy
  IDLE     08:36:25.468  (56.4%)  296
  LOCKED     :02:09.894  (00.2%)  902
  MUTEX      :  :09.186  (00.0%)  7      busy
  MUTEX/W    :  :06.030  (00.0%)  3      stall
  STALLED    :  :27.863  (00.1%)  3      stall
  UNLOCK     :  :56.321  (00.1%)  902
  -------------------------------------
  TOTAL    15:15:34.181  ( 100%)  2909

Peak concurrency: 8 out of 8 slots

6 job slots were held in a "stall" state for 00:00:33.892

Slot usage (time in a "busy" state):     | Concurrency breakdown ("busy"):
                                         |
#Rank  Slot  Usage        ( Pct )        | # of Slots  Usage        ( Pct )
 #01    04   01:06:33.104 (07.3%)        |     01      01:54:22.764 (12.5%)
 #02    02     :59:42.543 (06.5%)        |     02      01:25:06.992 (09.3%)
 #03    01     :54:36.542 (06.0%)        |     03        :50:43.530 (05.5%)
 #04    07     :49:48.823 (05.4%)        |     04        :36:15.407 (04.0%)
 #05    08     :47:22.254 (05.2%)        |     05        :34:08.238 (03.7%)
 #06    03     :41:27.223 (04.5%)        |     06        :27:23.990 (03.0%)
 #07    06     :38:35.375 (04.2%)        |     07        :26:25.938 (02.9%)
 #08    05     :37:22.742 (04.1%)        |     08        :21:01.748 (02.3%)
-----------------------------------------+---------------------------------
 TOTALS      06:35:28.606 (43.2%)                      06:35:28.606 (43.2%)

ccache 4.6.3

Breakdown by status (all slots):

  Status   Usage         ( Pct )  Count  State
  ACTIVE   08:12:32.232  (47.5%)  796    busy
  FAILED     :  :00.000  (00.0%)  0
  GETPKG     :  :00.000  (00.0%)  0      busy
  IDLE     08:59:43.013  (52.1%)  296
  LOCKED     :02:49.028  (00.3%)  902
  MUTEX      :  :09.900  (00.0%)  7      busy
  MUTEX/W    :  :07.454  (00.0%)  4      stall
  STALLED    :  :01.303  (00.0%)  2      stall
  UNLOCK     :01:06.496  (00.1%)  902
  -------------------------------------
  TOTAL    17:16:29.427  ( 100%)  2909

Peak concurrency: 8 out of 8 slots

6 job slots were held in a "stall" state for 00:00:08.757

Slot usage (time in a "busy" state):     | Concurrency breakdown ("busy"):
                                         |
#Rank  Slot  Usage        ( Pct )        | # of Slots  Usage        ( Pct )
 #01    01   01:19:38.095 (07.7%)        |     01      02:09:30.073 (12.5%)
 #02    08   01:11:28.086 (06.9%)        |     02      01:35:38.925 (09.2%)
 #03    03   01:10:33.295 (06.8%)        |     03      01:09:18.299 (06.7%)
 #04    04   01:05:23.589 (06.3%)        |     04        :47:44.777 (04.6%)
 #05    07     :54:42.070 (05.3%)        |     05        :46:05.190 (04.4%)
 #06    02     :52:31.970 (05.1%)        |     06        :37:59.698 (03.7%)
 #07    06     :51:50.128 (05.0%)        |     07        :36:31.404 (03.5%)
 #08    05     :46:34.899 (04.5%)        |     08        :29:53.767 (02.9%)
-----------------------------------------+---------------------------------
 TOTALS      08:12:42.132 (47.5%)                      08:12:42.132 (47.5%)

Breakdown by status (all slots):

  Status   Usage         ( Pct )  Count  State
  ACTIVE   08:02:58.632  (46.4%)  796    busy
  FAILED     :  :00.000  (00.0%)  0
  GETPKG     :  :00.000  (00.0%)  0      busy
  IDLE     09:12:04.915  (53.1%)  296
  LOCKED     :03:10.034  (00.3%)  902
  MUTEX      :  :10.772  (00.0%)  7      busy
  MUTEX/W    :  :06.128  (00.0%)  4      stall
  STALLED    :  :23.802  (00.0%)  3      stall
  UNLOCK     :01:10.820  (00.1%)  902
  -------------------------------------
  TOTAL    17:20:05.104  ( 100%)  2910

Peak concurrency: 8 out of 8 slots

7 job slots were held in a "stall" state for 00:00:29.931

Slot usage (time in a "busy" state):     | Concurrency breakdown ("busy"):
                                         |
#Rank  Slot  Usage        ( Pct )        | # of Slots  Usage        ( Pct )
 #01    08   01:16:55.041 (07.4%)        |     01      02:09:57.205 (12.5%)
 #02    04   01:14:24.611 (07.2%)        |     02      01:37:33.846 (09.4%)
 #03    05   01:13:51.295 (07.1%)        |     03      01:05:53.251 (06.3%)
 #04    01   01:06:07.534 (06.4%)        |     04        :45:08.217 (04.3%)
 #05    06     :52:48.358 (05.1%)        |     05        :43:38.575 (04.2%)
 #06    03     :51:55.788 (05.0%)        |     06        :36:30.984 (03.5%)
 #07    07     :49:48.045 (04.8%)        |     07        :35:32.742 (03.4%)
 #08    02     :37:18.732 (03.6%)        |     08        :28:54.585 (02.8%)
-----------------------------------------+---------------------------------
 TOTALS      08:03:09.404 (46.5%)                      08:03:09.404 (46.5%)

ccache 3.7.12

Breakdown by status (all slots):

  Status   Usage         ( Pct )  Count  State
  ACTIVE   07:48:04.779  (46.5%)  796    busy
  FAILED     :  :00.000  (00.0%)  0
  GETPKG     :  :00.000  (00.0%)  0      busy
  IDLE     08:55:06.520  (53.1%)  296
  LOCKED     :02:28.376  (00.2%)  902
  MUTEX      :  :11.410  (00.0%)  7      busy
  MUTEX/W    :  :05.176  (00.0%)  4      stall
  STALLED    :  :01.108  (00.0%)  3      stall
  UNLOCK     :  :59.872  (00.1%)  902
  -------------------------------------
  TOTAL    16:46:57.241  ( 100%)  2910

Peak concurrency: 8 out of 8 slots

7 job slots were held in a "stall" state for 00:00:06.284

Slot usage (time in a "busy" state):     | Concurrency breakdown ("busy"):
                                         |
#Rank  Slot  Usage        ( Pct )        | # of Slots  Usage        ( Pct )
 #01    07   01:22:08.980 (08.2%)        |     01      02:05:48.348 (12.5%)
 #02    08   01:13:51.542 (07.3%)        |     02      01:33:17.611 (09.3%)
 #03    03   01:13:14.792 (07.3%)        |     03      01:05:24.108 (06.5%)
 #04    01   01:08:57.305 (06.8%)        |     04        :43:41.665 (04.3%)
 #05    05     :50:10.910 (05.0%)        |     05        :42:17.022 (04.2%)
 #06    04     :45:42.455 (04.5%)        |     06        :35:21.562 (03.5%)
 #07    06     :39:23.660 (03.9%)        |     07        :34:27.249 (03.4%)
 #08    02     :34:46.545 (03.5%)        |     08        :27:58.624 (02.8%)
-----------------------------------------+---------------------------------
 TOTALS      07:48:16.189 (46.5%)                      07:48:16.189 (46.5%)

Breakdown by status (all slots):

  Status   Usage         ( Pct )  Count  State
  ACTIVE   07:46:04.403  (46.2%)  796    busy
  FAILED     :  :00.000  (00.0%)  0
  GETPKG     :  :00.000  (00.0%)  0      busy
  IDLE     08:57:24.578  (53.3%)  296
  LOCKED     :02:48.172  (00.3%)  902
  MUTEX      :  :06.622  (00.0%)  7      busy
  MUTEX/W    :  :04.593  (00.0%)  4      stall
  STALLED    :  :15.906  (00.0%)  3      stall
  UNLOCK     :01:01.591  (00.1%)  902
  -------------------------------------
  TOTAL    16:47:45.865  ( 100%)  2910

Peak concurrency: 8 out of 8 slots

7 job slots were held in a "stall" state for 00:00:20.499

Slot usage (time in a "busy" state):     | Concurrency breakdown ("busy"):
                                         |
#Rank  Slot  Usage        ( Pct )        | # of Slots  Usage        ( Pct )
 #01    05   01:38:42.620 (09.8%)        |     01      02:05:54.497 (12.5%)
 #02    07   01:18:16.659 (07.8%)        |     02      01:37:18.775 (09.7%)
 #03    01   01:05:36.977 (06.5%)        |     03      01:04:39.683 (06.4%)
 #04    03     :52:31.162 (05.2%)        |     04        :42:01.857 (04.2%)
 #05    08     :50:37.515 (05.0%)        |     05        :40:56.579 (04.1%)
 #06    06     :47:40.107 (04.7%)        |     06        :34:19.575 (03.4%)
 #07    04     :38:41.654 (03.8%)        |     07        :33:23.191 (03.3%)
 #08    02     :34:04.331 (03.4%)        |     08        :27:36.869 (02.7%)
-----------------------------------------+---------------------------------
 TOTALS      07:46:11.025 (46.3%)                      07:46:11.025 (46.3%)

heitbaum avatar Sep 16 '22 11:09 heitbaum

Summary of performance results:

  • build times are 8% longer going from no ccache to ccache 3.7.12
  • build times are 12% longer going from no ccache to ccache 4.6.3
  • build times are 4% longer going from ccache 3.7.12 to ccache 4.6.3

heitbaum avatar Sep 16 '22 11:09 heitbaum

Added support discussion in ccache:

  • https://github.com/ccache/ccache/discussions/1160

heitbaum avatar Sep 16 '22 13:09 heitbaum

Different results on an ADL

Intel(R) Client Systems NUC12WSKi7/NUC12WSBi7
BIOS WSADL357.0085.2022.0718.1739 07/18/2022
CPU0: 12th Gen Intel(R) Core(TM) i7-1260P

ccache 4.6.3

Total Build Time: 01:18:41.884 (wall clock)
Accum Build Time: 20:36:42.191 (16 slots)

Total Build Time: 00:18:21.047 (wall clock)
Accum Build Time: 04:31:04.287 (16 slots)

ccache 3.7.12

Total Build Time: 01:17:41.557 (wall clock)
Accum Build Time: 20:21:23.283 (16 slots)

Total Build Time: 00:19:22.917 (wall clock)
Accum Build Time: 04:45:24.514 (16 slots)

heitbaum avatar Sep 17 '22 10:09 heitbaum

I have had this on hold for some time and it is the perfect time to dust it off. When a package is bumped, ccache is not used anymore because the absolute path changes. So I was thinking about a solution like the one below. It sets the CCACHE_BASEDIR environment variable to the package build folder, which means that ccache is used even across package bumps. What do you think? On a quick reading I don't see any drawbacks.

https://ccache.dev/manual/4.3.html#config_base_dir https://ccache.dev/manual/4.3.html#_compiling_in_different_directories


diff --git a/scripts/build b/scripts/build
index edc47ff20d..3c7b1272a1 100755
--- a/scripts/build
+++ b/scripts/build
@@ -221,6 +221,9 @@ mkdir -p "${PKG_BUILD}"
 
 cd "${PKG_BUILD}"
 
+# share ccache between package bumps
+export CCACHE_BASEDIR=${PKG_BUILD}
+
 if [ -f "${PKG_CONFIGURE_SCRIPT}" -o -f "${PKG_CMAKE_SCRIPT}" -o -f "${PKG_MESON_SCRIPT}" ]; then
   case "${TARGET}" in
     "target")    PKG_REAL_BUILD="${PKG_BUILD}/.${TARGET_NAME}" ;;

vpeter4 avatar Sep 17 '22 13:09 vpeter4

Build on ADL - Single THREADCOUNT - I have not yet analysed the logs

ccache 3.7.12 + THREADCOUNT=1 make image

Total Build Time: 01:45:40.668 (wall clock)
Accum Build Time: 01:45:40.668 (1 slots)

ccache 4.6.3 + THREADCOUNT=1 make image

Total Build Time: 01:47:25.674 (wall clock)
Accum Build Time: 01:47:25.674 (1 slots)

the history and job logs from these runs are at:

  • https://github.com/LibreELEC/LibreELEC.tv/commit/2c6b0a40d1507f584b1a5a37f0b2d97bfa93a5d7

heitbaum avatar Sep 18 '22 02:09 heitbaum

Testing RPi4 builds I noticed that clean rebuilds (with ccache intact) are now significantly slower: rebuild time went up from about 22-25 minutes to about 40-45 minutes (ballpark numbers, as sometimes during testing I had CCACHE_DEBUG enabled).

ccache stats also reported a lot more uncacheable files and a lot fewer cache hits - here are the stats from the second rebuild:

Summary:
  Cache directory:                  /home/hias/ccache-master
  Primary config:                   /home/hias/ccache-master/ccache.conf
  Secondary config:                 /home/hias/private/libreelec/libreelec-master/build.LibreELEC-RPi4.arm-11.0-devel/toolchain/etc/ccache.conf
  Stats updated:                    Sun Sep 18 18:00:10 2022
  Hits:                              58018 /  58220 (99.65 %)
    Direct:                          56123 /  61933 (90.62 %)
    Preprocessed:                     1895 /   4604 (41.16 %)
  Misses:                              202
    Direct:                           5810
    Preprocessed:                     2709
  Errors:                               19
  Uncacheable:                       30426
Primary storage:
  Hits:                             114899 / 122641 (93.69 %)
  Misses:                             7742
  Cache size (GB):                    5.25 /  40.00 (13.12 %)
  Files:                            104992
Errors:
  Internal error:                       19
Uncacheable:
  Autoconf compile/link:              9822
  Bad compiler arguments:              719
  Called for linking:                 5435
  Called for preprocessing:           1392
  Compilation failed:                 2379
  Compiler produced empty output:      106
  Compiler produced no output:           3
  Could not use precompiled header:      3
  Multiple source files:                 3
  No input file:                      4305
  Output to stdout:                      7
  Preprocessing failed:               1328
  Unsupported code directive:            3
  Unsupported compiler option:        4890
  Unsupported source language:          31

Digging into that, the main culprit is that the linux kernel build isn't cached anymore: ccache wrongly detected "-Wa,-march=armv8.5-a" as being "-Wa,-a=..." and refused to cache it (I'll report that upstream):

Assembler listing file (-Wa,...=file) is currently not supported: -Wa,-march=armv8.5-a
...
Failed; falling back to running the real compiler
...
Result: unsupported_compiler_option

That accounts for about 4.5k of the "Unsupported compiler options".

Reverting the incorrect -Wa,-a check commit https://github.com/ccache/ccache/commit/c532b79d60c8f22e4a66d8f93b927f79e88b94dd brought the stats into the ballpark of 3.7 builds, but rebuild time was still significantly longer. Now without CCACHE_DEBUG:

$ time ../make-rpi4.sh
...
real    30m39.441s
user    151m24.418s
sys     21m4.793s

ccache stats:

Summary:
  Cache directory:                  /home/hias/ccache-master
  Primary config:                   /home/hias/ccache-master/ccache.conf
  Secondary config:                 /home/hias/private/libreelec/libreelec-master/build.LibreELEC-RPi4.arm-11.0-devel/toolchain/etc/ccache.conf
  Stats updated:                    Sun Sep 18 21:17:11 2022
  Hits:                              62581 /  62999 (99.34 %)
    Direct:                          60787 /  66724 (91.10 %)
    Preprocessed:                     1794 /   4720 (38.01 %)
  Misses:                              418
    Direct:                           5937
    Preprocessed:                     2926
  Errors:                               19
  Uncacheable:                       25647
Primary storage:
  Hits:                             124081 / 132211 (93.85 %)
  Misses:                             8130
  Cache size (GB):                    5.86 /  40.00 (14.65 %)
  Files:                            115471
Errors:
  Internal error:                       19
Uncacheable:
  Autoconf compile/link:              9822
  Bad compiler arguments:              719
  Called for linking:                 5435
  Called for preprocessing:           1393
  Compilation failed:                 2380
  Compiler produced empty output:      106
  Compiler produced no output:           3
  Could not use precompiled header:      3
  Multiple source files:                 3
  No input file:                      4304
  Output to stdout:                      7
  Preprocessing failed:               1338
  Unsupported code directive:            4
  Unsupported compiler option:          99
  Unsupported source language:          31

Digging a bit further, it seems that it takes almost 9 minutes just to build ccache with its deps at THREADCOUNT=1:

$ export THREADCOUNT=1
$ time ../run-rpi4.sh scripts/build ccache:host
...
real    8m55.466s
user    42m36.606s
sys     2m43.052s

I'll try to dig a bit further into it, but my hunch is that the ccache (+deps) build time is probably responsible for the majority of the time increase, on top of the non-cached linux build issue.

I'm adding merge blocked for now; the ccache bug especially needs to be fixed/resolved/reverted before this can go in.

BTW: I tested on my E6430 laptop with an i7-3740QM and a Samsung EVO860 SSD with LUKS+LVM, and I also built with CCACHE_NOCOMPRESS=1 to get comparable results to 3.7.

HiassofT avatar Sep 18 '22 21:09 HiassofT

I did some more tests, again with CCACHE_NOCOMPRESS=1. First a complete clean build (ccache nuked) and then 2 clean rebuilds (with ccache intact), both with master and with this PR plus the -Wa revert.

Then I did another test using this PR but with ccache switched back to 3.7: I used the package.mk from master but kept PKG_DEPENDS_HOST as in this PR, to measure the impact of the added dependencies. I didn't do a full clean/rebuild/rebuild test here but simply did 2 clean rebuilds using the pre-populated 3.7 cache.

Here are the "time" outputs of make from full-clean and second rebuild

ccache 3.7.12:

initial:
real    121m32.320s
user    822m35.934s
sys     78m44.177s

rebuild2:
real    22m28.057s
user    107m52.215s
sys     17m45.735s

ccache 4.6.3

initial:
real    125m50.153s
user    840m37.276s
sys     84m11.576s

rebuild2:
real    30m9.614s
user    149m38.462s
sys     20m31.692s

this PR/deps with ccache 3.7:

rebuild4-with-4.6-deps:
real    29m31.610s
user    145m22.913s
sys     20m1.017s

So the initial build took about 4.5 minutes (~4%) longer with 4.6. The second rebuild took about 7.5 minutes (~33%) longer compared to 3.7.

The 3.7-with-4.6-deps rebuild was only 30 seconds faster than the 4.6 rebuild result, which indicates that libzstd and especially cmake are the hefty dependencies that kill rebuild performance.

I'm somewhat inclined to simply make cmake a build host dependency, so zstd:host and ccache:host can be built right at the beginning using cmake from the build host; we then have ccache available for the cmake, gettext, pkg-config etc host builds.

I haven't tested yet but I guess that might improve rebuild performance a lot.
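
A minimal sketch of that idea, assuming the build-host tool check lives in something like scripts/checkdeps (the variable name below is illustrative):

# require cmake on the build host, alongside the tools already checked for
deps="${deps} cmake"
# zstd:host and ccache:host can then configure with the system cmake,
# before the toolchain's own cmake:host has been built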

HiassofT avatar Sep 19 '22 21:09 HiassofT

I did another test analyzing the critical chain at image build start.

dependency plan up to ccache looks like this:

build   gettext:host              (wants: )
build   pkg-config:host           (wants: gettext:host)
build   cmake:host                (wants: pkg-config:host)
build   make:host                 (wants: )
build   zstd:host                 (wants: cmake:host, make:host)
build   ccache:host               (wants: cmake:host, make:host, zstd:host)

so the critical chain is gettext -> pkg-config -> cmake -> zstd -> ccache, with make:host off the critical path

Here are results from separately building each package:

time ../run-rpi4.sh scripts/build make:host
real    0m8.829s
user    0m12.833s
sys     0m2.314s

time ../run-rpi4.sh scripts/build gettext:host
real    2m23.918s
user    3m27.409s
sys     0m30.713s

time ../run-rpi4.sh scripts/build pkg-config:host
real    0m17.252s
user    0m38.762s
sys     0m5.393s

time ../run-rpi4.sh scripts/build cmake:host
real    4m54.428s
user    32m41.799s
sys     1m40.192s

time ../run-rpi4.sh scripts/build zstd:host
real    0m10.973s
user    1m0.238s
sys     0m1.864s

time ../run-rpi4.sh scripts/build ccache:host
real    0m40.429s
user    3m20.995s
sys     0m9.115s

cmake is by far the largest contributor; gettext needs quite a lot of time too (mainly in the configure stage). The zstd build is quick (but needs cmake), and ccache is rather quick to build, too.

HiassofT avatar Sep 20 '22 08:09 HiassofT

I have had this on hold for some time and it is the perfect time to dust it off. When a package is bumped, ccache is not used anymore because the absolute path changes. So I was thinking about a solution like the one below. It sets the CCACHE_BASEDIR environment variable to the package build folder, which means that ccache is used even across package bumps. What do you think? On a quick reading I don't see any drawbacks.

https://ccache.dev/manual/4.3.html#config_base_dir https://ccache.dev/manual/4.3.html#_compiling_in_different_directories


diff --git a/scripts/build b/scripts/build
index edc47ff20d..3c7b1272a1 100755
--- a/scripts/build
+++ b/scripts/build
@@ -221,6 +221,9 @@ mkdir -p "${PKG_BUILD}"
 
 cd "${PKG_BUILD}"
 
+# share ccache between package bumps
+export CCACHE_BASEDIR=${PKG_BUILD}
+
 if [ -f "${PKG_CONFIGURE_SCRIPT}" -o -f "${PKG_CMAKE_SCRIPT}" -o -f "${PKG_MESON_SCRIPT}" ]; then
   case "${TARGET}" in
     "target")    PKG_REAL_BUILD="${PKG_BUILD}/.${TARGET_NAME}" ;;

I’ve added this to my dev tree. I haven’t done any formal testing, but I haven’t seen any badness yet. I’ll put something (a test harness) together to start capturing the performance data and testing the improvements from using this for bumps.

heitbaum avatar Sep 20 '22 11:09 heitbaum

The CCACHE_BASEDIR change looks quite dangerous to me.

I use an external CCACHE_DIR, shared by multiple projects/devices of an LE branch and if I understood the docs this might lead to eg AW build picking up a cached object of an RPi build.

If we want such functionality it's best to make it optional (this one probably would require a pre-build hook where user options are sourced).

In general I'd like to be LE default configuration to be as much stock and conservative as possible - aggressive optimizations should only be applied by users/devs at their own risk :-)

HiassofT avatar Sep 20 '22 12:09 HiassofT

The CCACHE_BASEDIR change looks quite dangerous to me.

I use an external CCACHE_DIR, shared by multiple projects/devices of an LE branch and if I understood the docs this might lead to eg AW build picking up a cached object of an RPi build.

If we want such functionality it's best to make it optional (this one probably would require a pre-build hook where user options are sourced).

In general I'd like to be LE default configuration to be as much stock and conservative as possible - aggressive optimizations should only be applied by users/devs at their own risk :-)

I’m interested in getting a better understanding of how the “magic” works - especially your shared ccache model across the different targets, and exactly how the ccache logic handles the change of compiler. I would like to see that the performance difference is there. Also, whether there is an option to use the full path as BASEDIR (but without the -version) so that ABC/main.c is not treated as DEF/main.c.

heitbaum avatar Sep 20 '22 12:09 heitbaum

Another option to consider is CCACHE_NOHASHDIR="1". OpenEmbedded has it set: https://github.com/openembedded/openembedded-core/blob/master/meta/classes/ccache.bbclass#L50

The point of these proposed changes is what I already wrote: when you bump a package, the ccache for it is not used anymore because the absolute path has changed. Even if the new package source has only 1 file changed and everything else is the same, everything is built without ccache. This looks wrong to me.

But I'm not saying anything like this needs to be implemented. If it works, why not? We need to be conservative with resources these days :)
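
For reference, both knobs are documented ccache settings; a minimal sketch of setting them in scripts/build, mirroring the diff above (whether to enable one or both is exactly the open question here):

# share ccache across package bumps: hash paths relative to the build dir
export CCACHE_BASEDIR="${PKG_BUILD}"
# optionally also skip hashing the working directory for debug (-g) builds, as OpenEmbedded does
export CCACHE_NOHASHDIR=1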

vpeter4 avatar Sep 20 '22 14:09 vpeter4

There's no magic here, I simply set CCACHE_DIR=~/ccache-master, CCACHE_DIR=~/ccache-le10 or something like that.

That's basically the default ccache config (where it's using ~/.ccache or ~/.cache/ccache) - ccache will associate cache entries with the source dir, so e.g. ~/libreelec-10 and ~/libreelec-master won't re-use each other's cache entries - they only share a single cache.
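
As a usage illustration (paths hypothetical; CCACHE_MAXSIZE is the documented environment override for max_size):

# one cache per LE branch, shared by every project/device built from that tree
export CCACHE_DIR=~/ccache-master   # when building from ~/libreelec-master
export CCACHE_MAXSIZE=40G           # matches the 40 GB max_size visible in the stats above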

HiassofT avatar Sep 20 '22 15:09 HiassofT

Cache hit rate when updating linux, using stock ccache config (so no basedir etc) was actually quite good.

RPi4 kernel update from 5.15.67-3157603 to 5.15.68-ac7d1fd needed a full rebuild, probably because a new config option was added (CONFIG_ARM64_ERRATUM_2457168) - in that case I'd expect (almost) all of the source to be rebuilt anyway.

Then updating from 5.15.68-ac7d1fd to 5.15.68-b11af53 (which changed the vc4 driver and a header) was rather quick: "make" took about 3 minutes and the cache hit rate was about 90%

real    2m56.479s
user    16m10.769s
sys     2m7.951s
Summary:
  Cache directory:                /home/hias/ccache-master
  Primary config:                 /home/hias/ccache-master/ccache.conf
  Secondary config:               /home/hias/private/libreelec/libreelec-master/build.LibreELEC-RPi4.arm-11.0-devel/toolchain/etc/ccache.conf
  Stats updated:                  Tue Sep 20 18:06:54 2022
  Hits:                             4768 /  5291 (90.12 %)
    Direct:                         4760 /  5334 (89.24 %)
    Preprocessed:                      8 /   535 (1.50 %)
  Misses:                            523
    Direct:                          574
    Preprocessed:                    527
  Uncacheable:                       563
Primary storage:
  Hits:                             9533 / 10628 (89.70 %)
  Misses:                           1095
  Cache size (GB):                  6.59 / 40.00 (16.46 %)
  Files:                          126196
Uncacheable:
  Called for linking:                 85
  Called for preprocessing:          378
  Compilation failed:                  2
  Compiler produced empty output:      2
  No input file:                      57
  Preprocessing failed:               35
  Unsupported code directive:          4

Then cleaning the image and running make again took about 1min 20s (about 30 seconds of which are squashfs, though) and the cache hit rate was almost 100%

Total Build Time: 00:00:51.675 (wall clock)
real    1m17.366s
user    5m2.212s
sys     1m8.107s
Summary:
  Cache directory:                /home/hias/ccache-master
  Primary config:                 /home/hias/ccache-master/ccache.conf
  Secondary config:               /home/hias/private/libreelec/libreelec-master/build.LibreELEC-RPi4.arm-11.0-devel/toolchain/etc/ccache.conf
  Stats updated:                  Tue Sep 20 18:13:27 2022
  Hits:                             5019 /  5022 (99.94 %)
    Direct:                         5012 /  5057 (99.11 %)
    Preprocessed:                      7 /    14 (50.00 %)
  Misses:                              3
    Direct:                           45
    Preprocessed:                      7
  Uncacheable:                       546
Primary storage:
  Hits:                            10034 / 10083 (99.51 %)
  Misses:                             49
  Cache size (GB):                  6.59 / 40.00 (16.46 %)
  Files:                          126201
Uncacheable:
  Called for linking:                 84
  Called for preprocessing:          378
  Compilation failed:                  2
  Compiler produced empty output:      2
  No input file:                      49
  Preprocessing failed:               27
  Unsupported code directive:          4

HiassofT avatar Sep 20 '22 16:09 HiassofT

Then it looks like something is wrong with how I tested.

vpeter4 avatar Sep 20 '22 17:09 vpeter4

Updated to ccache-4.7, which is erroring with:

  • https://github.com/ccache/ccache/issues/1189

Build can be completed by running:

  • make image
  • CCACHE_DISABLE=1 scripts/build nettle
  • make image

heitbaum avatar Oct 19 '22 09:10 heitbaum

Closing as this is superseded by

  • #7202

heitbaum avatar Dec 04 '22 12:12 heitbaum