Turbocharging your build with Mold?
https://github.com/rui314/mold
https://gcc.gnu.org/onlinedocs/gcc/Link-Options.html#Link-Options
Conclusions; TL;DR Outline:
- Mold is a fast and relatively new linker that claims to be about 30x faster than GNU gold https://github.com/rui314/mold/blob/main/docs/comparison.png
- How much impact you'll see in your actual build depends on how much of your build time is spent in the link step(s) and how much your build is bottlenecked by them
- (as an aside, I have spent a lot of time removing false dependencies from CMake builds so that large links can run in parallel with the rest of the build, which mitigates this impact)
- In my testing on a project I have spent time with, I see about a 3.4x improvement for the largest link specifically (20.77 s elapsed with gold vs 6.05 s with mold, debug build).
- Huge memory usage difference (31x): peak resident memory of 3770952 KB with gold vs 121340 KB with mold
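To make the "depends on how much of your build is linking" point concrete, here is a minimal Python sketch that applies Amdahl's law to the non-LTO elapsed times from the Raw Data section. It assumes the final link runs on its own at the end of the build and that the full-build and link-only timings are comparable runs; the function names are made up for illustration.

```python
def parse_elapsed(text):
    """Convert a GNU time elapsed field such as '14:47.13' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def overall_speedup(link_fraction, link_speedup):
    """Amdahl's law: only the link fraction of the build gets faster."""
    return 1.0 / ((1.0 - link_fraction) + link_fraction / link_speedup)


# Elapsed times copied from the Raw Data section (non-LTO builds).
gold_full = parse_elapsed("14:47.13")  # full build with gold
gold_link = parse_elapsed("0:26.45")   # link steps only, gold
mold_link = parse_elapsed("0:23.48")   # link steps only, mold

# Assumption: the link is serial and not overlapped with compilation.
link_fraction = gold_link / gold_full
link_speedup = gold_link / mold_link
estimate = overall_speedup(link_fraction, link_speedup)

print(f"link share of full build: {link_fraction:.1%}")
print(f"link-step speedup:        {link_speedup:.2f}x")
print(f"estimated build speedup:  {estimate:.3f}x")
```

When the link is only a few percent of a from-scratch build, even a much faster linker barely moves the total. The picture changes for incremental rebuilds, where the link can be most of the wait.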
Debug builds:
Full executable path is: /home/jason/EnergyPlus/build-gold/Products/libenergyplusapi.so.22.1.0
Setting up Python API, creating pyenergyplus package at /home/jason/EnergyPlus/build-gold/Products/pyenergyplus
18.88user 1.86system 0:20.77elapsed 99%CPU (avgtext+avgdata 3770952maxresident)k
0inputs+1469056outputs (178156major+529082minor)pagefaults 0swaps
Full executable path is: /home/jason/EnergyPlus/build-mold/Products/libenergyplusapi.so.22.1.0
Setting up Python API, creating pyenergyplus package at /home/jason/EnergyPlus/build-mold/Products/pyenergyplus
2.68user 0.52system 0:06.05elapsed 52%CPU (avgtext+avgdata 121340maxresident)k
0inputs+438000outputs (0major+68089minor)pagefaults 0swaps
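The speed and memory ratios can be reproduced from the two GNU time lines above. This is a small Python sketch under the assumption that the lines use GNU time's default output format; the regular expression and helper name are mine, not part of any tool.

```python
import re

# Matches the first line of GNU time's default output.
TIME_LINE = re.compile(
    r"(?P<user>[\d.]+)user\s+(?P<system>[\d.]+)system\s+"
    r"(?P<elapsed>[\d:.]+)elapsed\s+(?P<cpu>\d+)%CPU\s+"
    r"\(avgtext\+avgdata\s+(?P<maxrss>\d+)maxresident\)k"
)


def parse_time_line(line):
    """Extract CPU time, wall time (seconds) and peak RSS (KB) from one line."""
    match = TIME_LINE.search(line)
    if match is None:
        raise ValueError(f"not a GNU time line: {line!r}")
    elapsed = 0.0
    for part in match.group("elapsed").split(":"):
        elapsed = elapsed * 60 + float(part)
    return {
        "cpu_seconds": float(match.group("user")) + float(match.group("system")),
        "elapsed": elapsed,
        "maxrss_kb": int(match.group("maxrss")),
    }


gold = parse_time_line(
    "18.88user 1.86system 0:20.77elapsed 99%CPU (avgtext+avgdata 3770952maxresident)k"
)
mold = parse_time_line(
    "2.68user 0.52system 0:06.05elapsed 52%CPU (avgtext+avgdata 121340maxresident)k"
)

elapsed_ratio = gold["elapsed"] / mold["elapsed"]
memory_ratio = gold["maxrss_kb"] / mold["maxrss_kb"]

print(f"wall-clock: gold takes {elapsed_ratio:.1f}x as long as mold")
print(f"peak memory: gold uses {memory_ratio:.0f}x as much as mold")
```

Wall-clock time is the number that matters while you wait, which is why the comparison uses the elapsed field rather than user plus system CPU time.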
Interestingly, mold consistently produces larger binaries before stripping and smaller binaries after stripping. The bulk of the difference is in the symbol and string tables (.symtab and .strtab). My investigation suggests that mold is worse than gold at deciding which symbols and strings need to be included.
Mold Debug vs Gold Debug
FILE SIZE VM SIZE
-------------- -------------
+23% +3.80Mi +23% +3.80Mi .rodata
[NEW] +3.01Mi [NEW] +3.01Mi .data.rel.ro.local
+4.5% +161Ki +4.5% +161Ki .rela.plt
+10% +150Ki +10% +150Ki .gnu.hash
+4.5% +107Ki +4.5% +107Ki .plt
+1.4% +82.8Ki +1.4% +82.8Ki .eh_frame
+4.5% +53.7Ki +4.5% +53.7Ki .got.plt
[NEW] +669 [NEW] +669 [LOAD #1 [RX]]
+876% +438 [ = ] 0 [Unmapped]
[DEL] -160 [DEL] -96 .rodata.cst16
-0.0% -195 +0.0% +52 [19 Others]
[DEL] -698 [DEL] -698 [LOAD #2 [R]]
[DEL] -856 [DEL] -792 .plt.got
-0.1% -1.53Ki -0.1% -1.53Ki .gcc_except_table
[ = ] 0 [DEL] -3.18Ki .relro_padding
[DEL] -17.7Ki [DEL] -17.6Ki .rodata.cst8
[DEL] -170Ki [DEL] -170Ki .rodata.str1.1
-88.6% -3.01Mi -88.6% -3.01Mi .data.rel.ro
[DEL] -3.61Mi [DEL] -3.61Mi .rodata.str1.8
-50.2% -8.08Mi [ = ] 0 .symtab
-57.7% -37.2Mi [ = ] 0 .strtab
-6.0% -44.8Mi +0.4% +549Ki TOTAL
mold simply seems to put more unnecessary symbols into the .so file than gold does.
After stripping:
FILE SIZE VM SIZE
-------------- -------------
+23% +3.80Mi +23% +3.80Mi .rodata
[NEW] +3.01Mi [NEW] +3.01Mi .data.rel.ro.local
+4.5% +161Ki +4.5% +161Ki .rela.plt
+10% +150Ki +10% +150Ki .gnu.hash
+4.5% +107Ki +4.5% +107Ki .plt
+1.4% +82.8Ki +1.4% +82.8Ki .eh_frame
+4.5% +53.7Ki +4.5% +53.7Ki .got.plt
[NEW] +669 [NEW] +669 [LOAD #1 [RX]]
+102% +440 [ = ] 0 [Unmapped]
+0.0% +129 +0.0% +154 .text
-0.0% -102 -0.0% -102 [16 Others]
[DEL] -160 [DEL] -96 .rodata.cst16
[DEL] -212 [ = ] 0 .gnu_debuglink
[DEL] -698 [DEL] -698 [LOAD #2 [R]]
[DEL] -856 [DEL] -792 .plt.got
-0.1% -1.53Ki -0.1% -1.53Ki .gcc_except_table
[ = ] 0 [DEL] -3.18Ki .relro_padding
[DEL] -17.7Ki [DEL] -17.6Ki .rodata.cst8
[DEL] -170Ki [DEL] -170Ki .rodata.str1.1
-88.6% -3.01Mi -88.6% -3.01Mi .data.rel.ro
[DEL] -3.61Mi [DEL] -3.61Mi .rodata.str1.8
+0.4% +552Ki +0.4% +549Ki TOTAL
Final conclusion
- Yes, it's faster.
- Will it change your life? Depends on how much time you spend waiting for linking.
- It seems to consistently generate larger binaries before stripping (mostly symbol and string tables), if this matters to you
- It's easy to use with GCC (built-in linker support for it)
- It's easy to use with CMake if you use add_link_options to enable the -fuse-ld=mold flag
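As a rough illustration of the CMake point, a minimal CMakeLists.txt sketch might look like the following. The project name, the USE_MOLD option, and the demo target are hypothetical; it assumes mold is installed and that the compiler accepts -fuse-ld=mold (GCC 12.1 or later, or a recent Clang).

```cmake
cmake_minimum_required(VERSION 3.13)  # add_link_options was added in CMake 3.13
project(mold_demo CXX)                # hypothetical project name

# Hypothetical switch so the linker choice is easy to turn off.
option(USE_MOLD "Link with mold instead of the default linker" ON)

if(USE_MOLD)
  # Applies to targets created after this call in this directory and below.
  add_link_options(-fuse-ld=mold)
endif()

add_executable(demo main.cpp)         # hypothetical target and source file
```

Keeping the flag behind an option makes it easy to configure separate build trees for each linker and compare them, for example with -DUSE_MOLD=OFF for the baseline.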
Raw Data
Time for gold, lto-gold, mold, lto-mold plus info for binary sizes and performance timing comparisons
gold:
[630/630] Linking CXX static library Products/libenergypluslib.a
8000.18user 271.75system 14:25.63elapsed 955%CPU (avgtext+avgdata 2932752maxresident)k
760inputs+1527552outputs (49major+6614729minor)pagefaults 0swaps
[10/12] Creating library symlink Products/libenergyplusapi.so
Full executable path is: /home/jason/EnergyPlus/build-gold/Products/libenergyplusapi.so.22.1.0
Setting up Python API, creating pyenergyplus package at /home/jason/EnergyPlus/build-gold/Products/pyenergyplus
[12/12] Creating executable symlink Products/energyplus
65.48user 4.07system 0:26.45elapsed 262%CPU (avgtext+avgdata 1676608maxresident)k
392inputs+175072outputs (16114major+1340564minor)pagefaults 0swaps
[658/658] Creating executable symlink Products/energyplus
8259.89user 296.32system 14:47.13elapsed 964%CPU (avgtext+avgdata 2932904maxresident)k
12664inputs+1731048outputs (17836major+68047521minor)pagefaults 0swaps
Debug Link only
Full executable path is: /home/jason/EnergyPlus/build-gold/Products/libenergyplusapi.so.22.1.0
Setting up Python API, creating pyenergyplus package at /home/jason/EnergyPlus/build-gold/Products/pyenergyplus
18.88user 1.86system 0:20.77elapsed 99%CPU (avgtext+avgdata 3770952maxresident)k
0inputs+1469056outputs (178156major+529082minor)pagefaults 0swaps
gold-lto:
[630/630] Linking CXX static library Products/libenergypluslib.a
3826.38user 232.45system 7:01.62elapsed 962%CPU (avgtext+avgdata 2427000maxresident)k
0inputs+3859456outputs (50major+57780733minor)pagefaults 0swaps
[10/12] Creating library symlink Products/libenergyplusapi.so
Full executable path is: /home/jason/EnergyPlus/build-lto-gold/Products/libenergyplusapi.so.22.1.0
Setting up Python API, creating pyenergyplus package at /home/jason/EnergyPlus/build-lto-gold/Products/pyenergyplus
[12/12] Creating executable symlink Products/energyplus
2369.78user 41.09system 4:52.43elapsed 824%CPU (avgtext+avgdata 1691540maxresident)k
[658/658] Creating executable symlink Products/energyplus
6348.26user 286.18system 11:56.58elapsed 925%CPU (avgtext+avgdata 2426908maxresident)k
4528inputs+4086904outputs (16186major+69344158minor)pagefaults 0swaps
mold:
[630/630] Linking CXX static library Products/libenergypluslib.a
8071.03user 269.10system 14:30.77elapsed 957%CPU (avgtext+avgdata 2932880maxresident)k
8984inputs+1527384outputs (77major+66138307minor)pagefaults 0swaps
[10/12] Creating library symlink Products/libenergyplusapi.so
Full executable path is: /home/jason/EnergyPlus/build-mold/Products/libenergyplusapi.so.22.1.0
Setting up Python API, creating pyenergyplus package at /home/jason/EnergyPlus/build-mold/Products/pyenergyplus
[12/12] Creating executable symlink Products/energyplus
57.59user 3.45system 0:23.48elapsed 259%CPU (avgtext+avgdata 167572maxresident)k
392inputs+488880outputs (8major+1300878minor)pagefaults 0swaps
[658/658] Creating executable symlink Products/energyplus
8283.41user 283.24system 14:48.95elapsed 963%CPU (avgtext+avgdata 2933712maxresident)k
1424inputs+1589072outputs (17major+67988849minor)pagefaults 0swaps
Debug Link only
Full executable path is: /home/jason/EnergyPlus/build-mold/Products/libenergyplusapi.so.22.1.0
Setting up Python API, creating pyenergyplus package at /home/jason/EnergyPlus/build-mold/Products/pyenergyplus
2.68user 0.52system 0:06.05elapsed 52%CPU (avgtext+avgdata 121340maxresident)k
0inputs+438000outputs (0major+68089minor)pagefaults 0swaps
mold-lto:
[630/630] Linking CXX static library Products/libenergypluslib.a
3825.81user 232.21system 7:02.29elapsed 960%CPU (avgtext+avgdata 2427260maxresident)k
8inputs+3902728outputs (8major+57745549minor)pagefaults 0swaps
[10/12] Creating library symlink Products/libenergyplusapi.so
Full executable path is: /home/jason/EnergyPlus/build-lto-mold/Products/libenergyplusapi.so.22.1.0
Setting up Python API, creating pyenergyplus package at /home/jason/EnergyPlus/build-lto-mold/Products/pyenergyplus
[12/12] Creating executable symlink Products/energyplus
48.87user 3.45system 4:50.35elapsed 18%CPU (avgtext+avgdata 1691508maxresident)k
392inputs+515760outputs (1major+1287719minor)pagefaults 0swaps
full build:
[658/658] Creating executable symlink Products/energyplus
3936.56user 256.16system 11:51.28elapsed 589%CPU (avgtext+avgdata 2426960maxresident)k
0inputs+3914232outputs (17major+59543380minor)pagefaults 0swaps
LTO release comparison:
FILE SIZE VM SIZE
-------------- -------------
+19% +1.50Mi +19% +1.50Mi .rodata
[NEW] +178Ki [NEW] +178Ki .data.rel.ro.local
+4.1% +56.6Ki +4.1% +56.6Ki .eh_frame
+38% +40.0Ki +38% +40.0Ki .gnu.hash
+16% +31.1Ki +16% +31.1Ki .rela.plt
+16% +20.8Ki +16% +20.8Ki .plt
+16% +10.4Ki +16% +10.4Ki .got.plt
+123% +2.83Ki [ = ] 0 [Unmapped]
-0.0% -730 -0.0% -278 [22 Others]
-0.1% -1.01Ki -0.1% -1.01Ki .gcc_except_table
-1.4% -2.76Ki -1.4% -2.76Ki .eh_frame_hdr
[ = ] 0 [DEL] -2.76Ki .relro_padding
-0.6% -4.97Ki -0.6% -4.97Ki .rela.dyn
[DEL] -93.9Ki [DEL] -93.9Ki .rodata.cst8
-0.3% -96.1Ki -0.3% -96.1Ki .text
[DEL] -121Ki [DEL] -121Ki .rodata.cst16
-54.2% -180Ki -54.2% -180Ki .data.rel.ro
[DEL] -206Ki [DEL] -206Ki .rodata.str1.1
[DEL] -1.09Mi [DEL] -1.09Mi .rodata.str1.8
-26.3% -1.18Mi [ = ] 0 .strtab
-63.4% -1.66Mi [ = ] 0 .symtab
-4.7% -2.78Mi +0.1% +47.6Ki TOTAL
after stripping:
FILE SIZE VM SIZE
-------------- -------------
+19% +1.50Mi +19% +1.50Mi .rodata
[NEW] +178Ki [NEW] +178Ki .data.rel.ro.local
+4.1% +56.6Ki +4.1% +56.6Ki .eh_frame
+38% +40.0Ki +38% +40.0Ki .gnu.hash
+16% +31.1Ki +16% +31.1Ki .rela.plt
+16% +20.8Ki +16% +20.8Ki .plt
+16% +10.4Ki +16% +10.4Ki .got.plt
+213% +2.84Ki [ = ] 0 [Unmapped]
[NEW] +656 [NEW] +656 [LOAD #1 [RX]]
-0.0% -654 -0.0% -212 [20 Others]
[DEL] -722 [DEL] -722 [LOAD #2 [R]]
-0.1% -1.01Ki -0.1% -1.01Ki .gcc_except_table
-1.4% -2.76Ki -1.4% -2.76Ki .eh_frame_hdr
[ = ] 0 [DEL] -2.76Ki .relro_padding
-0.6% -4.97Ki -0.6% -4.97Ki .rela.dyn
[DEL] -93.9Ki [DEL] -93.9Ki .rodata.cst8
-0.3% -96.1Ki -0.3% -96.1Ki .text
[DEL] -121Ki [DEL] -121Ki .rodata.cst16
-54.2% -180Ki -54.2% -180Ki .data.rel.ro
[DEL] -206Ki [DEL] -206Ki .rodata.str1.1
[DEL] -1.09Mi [DEL] -1.09Mi .rodata.str1.8
-26.3% -1.18Mi [ = ] 0 .strtab
-63.4% -1.66Mi [ = ] 0 .symtab
+0.1% +52.5Ki +0.1% +47.6Ki TOTAL
Coming in Ep417