dub
Speed up configuration resolution
This speeds up the getPackageConfigs call issued by getDefaultConfiguration from 14.8 ms to 3.6 ms for the particular project that I'm benchmarking. There is still a lot more to gain, but this is a good start.
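For anyone wanting to reproduce numbers of this kind, a minimal wall-clock harness could look like the sketch below. It is only illustrative: the measure helper and the placeholder workload are assumptions, not code from this PR, and in a real measurement the timed delegate would wrap the actual call into dub's configuration resolution.

```d
// Minimal timing sketch (illustrative only): times an arbitrary delegate,
// which in a real measurement would wrap the call being benchmarked.
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

void measure(string label, scope void delegate() fn, size_t runs = 50)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. runs)
        fn();
    sw.stop();
    writefln("%s: %.2f ms per run", label, sw.peek.total!"usecs" / 1000.0 / runs);
}

void main()
{
    // Placeholder workload; replace with the code path under test
    // (e.g. whatever ends up calling configuration resolution).
    long sum;
    measure("dummy workload", delegate { foreach (i; 0 .. 1_000_000) sum += i; });
    writefln("checksum: %s", sum); // keeps the loop from being optimized away
}
```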
✅ PR OK, no changes in deprecations or warnings
Total deprecations: 0
Total warnings: 0
Build statistics:
statistics (-before, +after)
-executable size=5364248 bin/dub
-rough build time=63s
+executable size=5376664 bin/dub
+rough build time=64s
Full build output
DUB version 1.36.0, built on Mar 3 2024
LDC - the LLVM D compiler (1.37.0):
based on DMD v2.107.1 and LLVM 17.0.6
built with LDC - the LLVM D compiler (1.37.0)
Default target: x86_64-unknown-linux-gnu
Host CPU: znver3
http://dlang.org - http://wiki.dlang.org/LDC
Registered Targets:
aarch64 - AArch64 (little endian)
aarch64_32 - AArch64 (little endian ILP32)
aarch64_be - AArch64 (big endian)
amdgcn - AMD GCN GPUs
arm - ARM
arm64 - ARM64 (little endian)
arm64_32 - ARM64 (little endian ILP32)
armeb - ARM (big endian)
avr - Atmel AVR Microcontroller
bpf - BPF (host endian)
bpfeb - BPF (big endian)
bpfel - BPF (little endian)
hexagon - Hexagon
lanai - Lanai
loongarch32 - 32-bit LoongArch
loongarch64 - 64-bit LoongArch
mips - MIPS (32-bit big endian)
mips64 - MIPS (64-bit big endian)
mips64el - MIPS (64-bit little endian)
mipsel - MIPS (32-bit little endian)
msp430 - MSP430 [experimental]
nvptx - NVIDIA PTX 32-bit
nvptx64 - NVIDIA PTX 64-bit
ppc32 - PowerPC 32
ppc32le - PowerPC 32 LE
ppc64 - PowerPC 64
ppc64le - PowerPC 64 LE
r600 - AMD GPUs HD2XXX-HD6XXX
riscv32 - 32-bit RISC-V
riscv64 - 64-bit RISC-V
sparc - Sparc
sparcel - Sparc LE
sparcv9 - Sparc V9
spirv32 - SPIR-V 32-bit
spirv64 - SPIR-V 64-bit
systemz - SystemZ
thumb - Thumb
thumbeb - Thumb (big endian)
ve - VE
wasm32 - WebAssembly 32-bit
wasm64 - WebAssembly 64-bit
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
xcore - XCore
Upgrading project in /home/runner/work/dub/dub/
Starting Performing "release" build using /opt/hostedtoolcache/dc/ldc2-1.37.0/x64/ldc2-1.37.0-linux-x86_64/bin/ldc2 for x86_64.
Building dub 1.37.0-rc.1+commit.41.gbfdf300a: building configuration [application]
Linking dub
STAT:statistics (-before, +after)
STAT:executable size=5376664 bin/dub
STAT:rough build time=64s
Added two more optimizations to get down to 3.1 ms.
Now down to 2.6 ms.
I have a concern about code complexity. I found the original function hard to understand; it took me a couple of hours of summarising it in English before I understood what it did well enough to be able to optimise it. This PR makes it even more complicated.
Having said that, I tried this PR on the pathological case that motivated my own PR and the runtime for getPackageConfigs went from ~290ms to ~90ms. The "problem" is that it's still death by a thousand cuts. Packages with fewer dependencies don't take that long to begin with so our total runtime went from 15s to 12s.
I have a concern about code complexity. I found the original function hard to understand; it took me a couple of hours of summarising it in English before I understood what it did well enough to be able to optimise it. This PR makes it even more complicated.
I agree with that. Part of it is based on the unfortunate design choice of making Package.recipe and Package.rawRecipe mutable references, which is something that would be nice to remedy at some point - and then ultimately to reduce the complexity here again. Other parts could be improved by introducing new container types (Set!T, ListSet!T), but in the end this is an algorithmic complexity issue and not a code style issue. The code can certainly also be refactored for more clarity, but that is a different topic, as it would target the high-level structure (which this PR does not change).
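To make the container idea a bit more concrete, a minimal Set!T could be little more than a thin wrapper around a built-in associative array. The sketch below only illustrates the shape such a type might take; it is not code from this PR or from dub:

```d
// Sketch of a minimal Set!T as mentioned above (illustrative only).
// It wraps a bool[T] associative array to get cheap membership tests,
// which is the operation configuration resolution keeps repeating.
struct Set(T)
{
    private bool[T] items;

    /// Returns true if the value was newly inserted.
    bool add(T value)
    {
        if (value in items)
            return false;
        items[value] = true;
        return true;
    }

    bool contains(T value) const
    {
        return (value in items) !is null;
    }

    size_t length() const
    {
        return items.length;
    }
}

unittest
{
    Set!string configs;
    assert(configs.add("application"));
    assert(!configs.add("application")); // duplicate insert is a no-op
    assert(configs.contains("application"));
    assert(configs.length == 1);
}
```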
Having said that, I tried this PR on the pathological case that motivated my own PR and the runtime for getPackageConfigs went from ~290ms to ~90ms. The "problem" is that it's still death by a thousand cuts. Packages with fewer dependencies don't take that long to begin with so our total runtime went from 15s to 12s.
Without being able to reproduce your pathological case, or without more precise information, I cannot help you there. You identified this particular piece of code to be responsible for a considerable slowdown, so I went ahead and optimized it. But apart from that, a 20% total improvement is not what I would associate with "death by a thousand cuts".
You identified this particular piece of code to be responsible for a considerable slowdown, so I went ahead and optimized it
Yes, and thanks!
a 20% total improvement is not what I would associate with "death by a thousand cuts".
Normally, no, but the majority of the runtime is still in getPackageConfigs. To make things worse, it gets called twice: once to add the test runner configuration and once more in the generator when getting the build information (so with different arguments). Our build has 80 dub packages, which means death by 160 cuts. If each "cut" takes ~50ms on average, that means an added runtime of 8s.
Normally, no, but the majority of the runtime is still in getPackageConfigs. To make things worse, it gets called twice: once to add the test runner configuration and once more in the generator when getting the build information (so with different arguments). Our build has 80 dub packages, which means death by 160 cuts. If each "cut" takes ~50ms on average, that means an added runtime of 8s.
But no matter how often getPackageConfigs gets called, a 2x improvement stays a 2x improvement. The death by a thousand cuts would IMO apply more to a situation where a thousand different places each contribute a thousandth of the runtime.
Just to quickly get the numbers straight here:
- The overall number for you improved by roughly 25% - assuming that getPackageConfigs takes up 50% of the total time after the optimization, that would mean a speedup for getPackageConfigs of only 1.5x
- Assuming the 160 calls per total runtime is true, that means with the optimized version a single call takes about 38 ms on average (15x more than for my benchmark project)
- Your pathological test case improved by roughly 3.2x
- In my test case, I'm getting an improvement of around 5.7x
- Applying the 3.2x factor above to estimate the overall runtime taken by getPackageConfigs would result in just 29% of the total runtime of 15 s (11% afterwards for the 12 s); the arithmetic is spelled out below
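To spell out the arithmetic behind that last bullet (assuming the whole 3 s difference between the 15 s and 12 s runs comes from getPackageConfigs and the 3.2x factor from the pathological case applies):

\[
t_{\text{before}} - \frac{t_{\text{before}}}{3.2} = 3\,\text{s}
\;\Rightarrow\;
t_{\text{before}} = 3\,\text{s} \cdot \frac{3.2}{2.2} \approx 4.4\,\text{s} \approx 29\%\ \text{of}\ 15\,\text{s},
\qquad
t_{\text{after}} = \frac{t_{\text{before}}}{3.2} \approx 1.4\,\text{s} \approx 11\%\ \text{of}\ 12\,\text{s}.
\]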
What seems odd here is that, for the pathological test case, you seem to have both a larger payload (assuming the single-thread performance of the machine is not much lower) and a considerably worse speedup at the same time. I would usually expect the opposite to be true, and it would be interesting to find out why that happens.
On the other hand, if I had to bet, I would say that the majority of the runtime is actually spent somewhere else. Taking the estimation from the last bullet point, this would mean that optimizing getPackageConfigs alone will never get the total below maybe 11 s. Did you measure absolute time, or is this a statistical measurement from a sampling profiler (which can be heavily skewed)?