Are precompiled headers worthwhile?
I timed building the client and the cgame+sgame DLL targets with the following commands:
for x in 1 2 3 4 5; do make clean; time(make -j15 client > /dev/null 2> /dev/null); done
for x in 1 2 3 4 5; do make clean; time(make -j15 cgame-native-dll sgame-native-dll > /dev/null 2> /dev/null); done
| PCH status | Client min time (s) | Client max time (s) | Gamelogic min time (s) | Gamelogic max time (s) |
|---|---|---|---|---|
| PCH off | 16.6 | 17.0 | 32.0 | 32.4 |
| PCH on | 12.9 | 13.0 | 24.8 | 25.0 |
So a 20-25% speedup. Seems not really worth it?
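For reference, the percentage can be reproduced from the table above. This is a minimal sketch: the function name `time_reduction_percent` is only illustrative, and "speedup" is taken here to mean the share of build time saved, which is one possible reading of the figure.

```python
def time_reduction_percent(off_seconds: float, on_seconds: float) -> float:
    """Share of build time saved (in percent) when going from PCH off to PCH on."""
    return (off_seconds - on_seconds) / off_seconds * 100.0


# (PCH off, PCH on) timings in seconds, copied from the table above.
timings = {
    "client min": (16.6, 12.9),
    "client max": (17.0, 13.0),
    "gamelogic min": (32.0, 24.8),
    "gamelogic max": (32.4, 25.0),
}

reductions = {
    name: time_reduction_percent(off, on) for name, (off, on) in timings.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% less build time")
```

All four values land between 22% and 24%, consistent with the 20-25% estimate.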
I don't have a strong opinion on that. If it makes the build faster, it's welcome; to me 25% looks very good when the build machine isn't very fast.
Of course, with my Threadripper I have probably never felt the difference, but not everyone has such a beast. 15~25 minutes for 5 builds means 3~5 minutes per build, which is very fast. People who achieve 3~5 minutes per build (like I do) are likely not the target users of PCH.
Ah no, I misread the table: it's not 25 minutes for 5 builds but 25 seconds per build, 5 times… Well, 25 seconds looks very fast.
Also, I don't know why I wrongly remembered that a full build took 5 minutes on my end; maybe I counted all engines + NaCl games (which is not what is benchmarked here).
I did some more benchmarks on a wide variety of hardware, using these commands:
for x in {1..5}; do make clean; time (make -j"$(nproc)" client >/dev/null 2>&1); done
for x in {1..5}; do make clean; time (make -j"$(nproc)" cgame-native-dll sgame-native-dll >/dev/null 2>&1); done
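The same measurement loop could also be scripted so that the min and max columns are computed automatically. The following is only a sketch of that idea, not what was actually used for the numbers in this thread; `time_command` and its `prepare` parameter are hypothetical names introduced for illustration.

```python
import subprocess
import time


def time_command(command, runs=5, prepare=None):
    """Run `command` `runs` times and return (min, max) wall-clock seconds.

    `prepare`, if given, is executed before every timed run and is not
    included in the measurement (for example ["make", "clean"]).
    Output of both commands is discarded, as in the shell loops above.
    """
    durations = []
    for _ in range(runs):
        if prepare is not None:
            subprocess.run(prepare, check=True,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        start = time.perf_counter()
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return min(durations), max(durations)
```

Something like `time_command(["make", f"-j{os.cpu_count()}", "client"], prepare=["make", "clean"])` would roughly mirror the first loop. Note that `os.cpu_count()` may differ from `nproc` when CPU affinity is restricted, and that `check=True` aborts on a failed build instead of timing it.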
- AMD Ryzen Threadripper PRO 3955WX (amd64, 16c32t, Zen2):
| PCH status | Client min time (s) | Client max time (s) | Gamelogic min time (s) | Gamelogic max time (s) |
|---|---|---|---|---|
| PCH off | 19.9 | 21.3 | 32.9 | 34.4 |
| PCH on | 18.0 | 19.1 | 26.7 | 27.3 |
- AMD Ryzen Z1 Extreme (amd64, 8c16t, Zen4):
| PCH status | Client min time | Client max time | Gamelogic min time | Gamelogic max time |
|---|---|---|---|---|
| PCH off | 30.4 | 32.6 | 1m2.0 | 1m10.0 |
| PCH on | 25.7 | 30.5 | 48.2 | 1m2.3 |
- AMD Ryzen 3 3200G (amd64, 4c4t, Zen+):
| PCH status | Client min time | Client max time | Gamelogic min time | Gamelogic max time |
|---|---|---|---|---|
| PCH off | 1m24.7 | 1m35.7 | 2m47.7 | 2m56.0 |
| PCH on | 1m3.3 | 1m8.8 | 2m4.3 | 2m15.1 |
- Intel Core i7-6500U (amd64, 2c4t, Skylake):
| PCH status | Client min time | Client max time | Gamelogic min time | Gamelogic max time |
|---|---|---|---|---|
| PCH off | 2m33.0 | 2m39.9 | 5m7.0 | 5m26.2 |
| PCH on | 1m44.6 | 1m45.6 | 3m36.8 | 3m40.8 |
- Intel Core i7-4810MQ (amd64, 4c8t, Haswell):
| PCH status | Client min time | Client max time | Gamelogic min time | Gamelogic max time |
|---|---|---|---|---|
| PCH off | 1m43.8 | 1m56.1 | 3m43.2 | 3m59.7 |
| PCH on | 56.5 | 1m1.6 | 2m16.2 | 2m24.8 |
- Intel Core 2 Duo L7500 (amd64, 2c2t, Merom):
| PCH status | Client min time | Client max time | Gamelogic min time | Gamelogic max time |
|---|---|---|---|---|
| PCH off | 7m33.7 | 7m47.8 | 15m20.8 | 18m11.0 |
| PCH on | 5m9.9 | 5m31.8 | 10m49.9 | 11m9.8 |
- Amlogic S905X3 (arm64, 4c4t, Cortex-A55):
| PCH status | Client min time | Client max time | Gamelogic min time | Gamelogic max time |
|---|---|---|---|---|
| PCH off | 1m49.2 | 1m56.3 | 3m34.3 | 3m39.8 |
| PCH on | 1m17.4 | 1m19.8 | 2m33.2 | 2m40.8 |
So, on hardware that runs fewer than 16 parallel jobs, we start to see a significant difference between PCH disabled and enabled.
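To compare the machines more directly, the minimum gamelogic times from the tables above can be converted to seconds and turned into absolute and relative savings. This is a rough sketch: `parse_duration` is a hypothetical helper that only handles the `32.9` and `1m2.0` notations used in this thread, and looking at minimum times only is an arbitrary choice.

```python
import re


def parse_duration(text: str) -> float:
    """Convert '32.9' or '1m2.0' (the notations used in the tables) to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# Minimum gamelogic build times (PCH off, PCH on), copied from the tables above.
gamelogic_min = {
    "Threadripper PRO 3955WX": ("32.9", "26.7"),
    "Ryzen Z1 Extreme": ("1m2.0", "48.2"),
    "Ryzen 3 3200G": ("2m47.7", "2m4.3"),
    "Core i7-6500U": ("5m7.0", "3m36.8"),
    "Core i7-4810MQ": ("3m43.2", "2m16.2"),
    "Core 2 Duo L7500": ("15m20.8", "10m49.9"),
    "Amlogic S905X3": ("3m34.3", "2m33.2"),
}

# machine -> (seconds saved, percent of build time saved)
saved = {}
for machine, (off, on) in gamelogic_min.items():
    off_s, on_s = parse_duration(off), parse_duration(on)
    saved[machine] = (off_s - on_s, (off_s - on_s) / off_s * 100.0)

for machine, (seconds, percent) in saved.items():
    print(f"{machine}: {seconds:.1f} s saved ({percent:.0f}%)")
```

Read this way, the absolute saving grows from a few seconds on the Threadripper to several minutes on the Core 2 Duo, which is where the difference becomes noticeable in practice.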