mold
Slow linking when used on WSL2
While trying to move our distribution (https://github.com/LibreELEC/LibreELEC.tv/pull/6875) to mold, some speed regressions turned up. We investigated where they come from and did some basic tests. I tested on WSL2 and compared it to native Linux on the same machine (8-core/16-thread Ryzen 7). It turned out that linking with mold on WSL2 is a lot slower than on native Linux, while gold and bfd are in a similar ballpark on both.
I have been using WSL2 to compile the whole distribution for a while now and never saw any significant speed regressions (it runs on native ext4, not NTFS).
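Since the filesystem is called out above, here is a minimal sketch for confirming which filesystem backs a given directory and whether the kernel looks like a WSL one. It assumes GNU coreutils `df` (for `--output=fstype`) and relies on the heuristic that WSL kernels carry "microsoft" in their release string; the helper names `fs_type_of` and `is_wsl_release` are made up for this example. Note that the test script in this report writes to `${XDG_RUNTIME_DIR}`, which may sit on a different filesystem than the build tree.

```shell
#!/bin/sh
# Print the filesystem type backing a directory.
# Assumption: GNU coreutils df, which supports --output=fstype.
fs_type_of() {
    df --output=fstype "$1" | tail -n 1
}

# Heuristic only: succeed if a kernel release string contains "microsoft".
is_wsl_release() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *microsoft*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the directory the benchmark writes to (falls back to /tmp).
dir="${XDG_RUNTIME_DIR:-/tmp}"
printf '%s is on %s\n' "$dir" "$(fs_type_of "$dir")"

if is_wsl_release "$(uname -r)"; then
    echo "kernel release looks like WSL"
fi
```

Running it once for the build directory and once for the benchmark directory shows whether both measurements really exercise the same filesystem.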
cat do-test
#!/bin/sh

if [ "$#" -lt 1 ]; then
    echo "usage: $0 count"
    exit 1
fi

# number of link iterations per configuration
MAX=$1
shift

IN="${XDG_RUNTIME_DIR}/t.c"
OBJ="${XDG_RUNTIME_DIR}/t.o"
OUT="${XDG_RUNTIME_DIR}/t"

# write a trivial test program
cat << EOF > "${IN}"
#include <stdio.h>
int main(int argc, char** argv)
{
    printf("hello, world!\n");
    return 0;
}
EOF

# compile once; only the link step is timed below
gcc -c -o "${OBJ}" "$IN"

# args: linker-related gcc arguments, printed next to the elapsed time
do_link() {
    count=0
    start="$(date +%s.%3N)"
    while [ $count -lt ${MAX} ]; do
        gcc "$@" -o "${OUT}" "${OBJ}"
        count=$((count+1))
    done
    end="$(date +%s.%3N)"
    elapsed="$(echo "${end}-${start}" | bc)"
    printf '%10.3f %s\n' "${elapsed}" "$*"
}

COMMON_ARGS="-Wl,--as-needed"
MOLD_ARGS="-B/path/to/mold-1.4.2-x86_64-linux/libexec/mold"

do_link -fuse-ld=bfd ${COMMON_ARGS}
do_link -fuse-ld=gold ${COMMON_ARGS}
do_link ${MOLD_ARGS} ${COMMON_ARGS}
do_link ${MOLD_ARGS} ${COMMON_ARGS} -Wl,--no-threads
do_link ${MOLD_ARGS} ${COMMON_ARGS} -Wl,--thread-count=2
do_link ${MOLD_ARGS} ${COMMON_ARGS} -Wl,--thread-count=4
do_link ${MOLD_ARGS} ${COMMON_ARGS} -Wl,--thread-count=8
do_link ${MOLD_ARGS} ${COMMON_ARGS} -Wl,--thread-count=16
On Windows 11, WSL2, Ubuntu 22.04:
time ./do-test 1000 -fuse-ld=bfd
real 0m19.087s
user 0m16.510s
sys 0m2.869s
time ./do-test 1000 -fuse-ld=gold
real 0m14.950s
user 0m12.782s
sys 0m2.451s
time ./do-test 1000 -B/path/to/mold-1.4.2-x86_64-linux/libexec/mold/
real 0m17.845s
user 0m11.314s
sys 0m1.682s
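One way to read the three `time` results above is to compare wall-clock time (`real`) with accumulated CPU time (`user` + `sys`). The sketch below does that arithmetic with `awk` on the numbers quoted above; the helper name `cpu_gap` and the short labels are made up for this example. For bfd and gold the two figures are nearly equal, while for mold the wall-clock time exceeds the CPU time by several seconds over the 1000 links. The numbers alone do not say where that extra time goes.

```shell
#!/bin/sh
# Gap between wall-clock time and CPU time (user + sys) for each run.
# Input lines: "<label> <real> <user> <sys>", all in seconds.
cpu_gap() {
    awk '{
        cpu = $3 + $4
        printf "%-5s real=%.3f cpu=%.3f gap=%+.3f\n", $1, $2, cpu, $2 - cpu
    }'
}

# Values copied from the WSL2 `time` output above.
cpu_gap << 'EOF'
bfd  19.087 16.510 2.869
gold 14.950 12.782 2.451
mold 17.845 11.314 1.682
EOF
```

A positive gap means the run spent wall-clock time that `time` did not account for as CPU work; a gap near zero means the run was essentially CPU-bound.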
Native Linux, Ubuntu 22.04 (sorry, I nuked the exact numbers for Linux): on native Linux I get the expected gain over gold, and mold needs only ~50% of gold's time to finish. So I think this is the real performance to expect.
My first guess is that multi-threading performance on WSL2 is not as good as on native Linux. Do you mind if I ask you to redo the test with -Wl,-no-threads to disable multi-threading? Linking is usually slower with that flag, because multi-threading usually boosts mold's performance, but that may not be the case on WSL2.
I updated the script. Results on WSL2:
./do-test 1000
    10.339 -fuse-ld=bfd -Wl,--as-needed
     5.795 -fuse-ld=gold -Wl,--as-needed
     9.288 -B/home/le/mold-1.4.2-x86_64-linux/libexec/mold -Wl,--as-needed
     5.969 -B/home/le/mold-1.4.2-x86_64-linux/libexec/mold -Wl,--as-needed -Wl,--no-threads
     5.933 -B/home/le/mold-1.4.2-x86_64-linux/libexec/mold -Wl,--as-needed -Wl,--thread-count=2
     6.393 -B/home/le/mold-1.4.2-x86_64-linux/libexec/mold -Wl,--as-needed -Wl,--thread-count=4
     7.320 -B/home/le/mold-1.4.2-x86_64-linux/libexec/mold -Wl,--as-needed -Wl,--thread-count=8
     9.279 -B/home/le/mold-1.4.2-x86_64-linux/libexec/mold -Wl,--as-needed -Wl,--thread-count=16
OK, so it looks like something is wrong with WSL2. mold doesn't scale at all there; instead, it gets slower as the thread count goes up. You may want to report it to Microsoft.
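To make the scaling behaviour easier to see, the mold timings from the WSL2 results can be expressed relative to the `--no-threads` run. This is only a sketch of the arithmetic; the helper name `relative_to_first` and the short labels are made up for this example, and the numbers are the ones reported above.

```shell
#!/bin/sh
# Print each run's elapsed time relative to the first (baseline) line.
# Input lines: "<label> <elapsed seconds>".
relative_to_first() {
    awk 'NR == 1 { base = $2 }
         { printf "%-12s %6.3f s  x%.2f\n", $1, $2, $2 / base }'
}

# mold timings on WSL2 for 1000 links, baseline = --no-threads.
relative_to_first << 'EOF'
no-threads 5.969
threads=2  5.933
threads=4  6.393
threads=8  7.320
threads=16 9.279
default    9.288
EOF
```

With these inputs, two threads come out about level with the single-threaded run and 16 threads take roughly 1.55 times as long. The run without any thread flag lands almost exactly on the 16-thread figure, which would be consistent with the default using all 16 hardware threads of this machine.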