mold-2.40.4 fails to link >=btrfs-progs-6.14

Open zyxhere opened this issue 4 months ago • 3 comments

Steps to reproduce:

wget https://mirrors.edge.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/btrfs-progs-v6.16.tar.xz
tar xf btrfs-progs*
export CFLAGS="-O2 -flto=auto -fuse-ld=mold"
export LDFLAGS=$CFLAGS
cd btrfs-progs*
./configure --disable-lzo --disable-documentation --disable-libudev --disable-zstd
make

See errors:

gcc -o btrfs-corrupt-block kernel-lib/list_sort.o kernel-lib/raid56.o kernel-lib/rbtree.o kernel-lib/tables.o kernel-shared/accessors.o kernel-shared/async-thread.o kernel-shared/backref.o kernel-shared/ctree.o kernel-shared/delayed-ref.o kernel-shared/dir-item.o kernel-shared/disk-io.o kernel-shared/extent-io-tree.o kernel-shared/extent-tree.o kernel-shared/extent_io.o kernel-shared/file-item.o kernel-shared/file.o kernel-shared/free-space-cache.o kernel-shared/free-space-tree.o kernel-shared/inode-item.o kernel-shared/inode.o kernel-shared/locking.o kernel-shared/messages.o kernel-shared/print-tree.o kernel-shared/root-tree.o kernel-shared/transaction.o kernel-shared/tree-checker.o kernel-shared/ulist.o kernel-shared/uuid-tree.o kernel-shared/volumes.o kernel-shared/zoned.o common/array.o common/compat.o common/cpu-utils.o common/device-scan.o common/device-utils.o common/extent-cache.o common/extent-tree-utils.o common/root-tree-utils.o common/filesystem-utils.o common/format-output.o common/fsfeatures.o common/help.o common/inject-error.o common/messages.o common/open-utils.o common/parse-utils.o common/path-utils.o common/rbtree-utils.o common/send-stream.o common/send-utils.o common/sort-utils.o common/string-table.o common/string-utils.o common/sysfs-utils.o common/task-utils.o common/units.o common/utils.o check/qgroup-verify.o check/repair.o cmds/receive-dump.o crypto/crc32c.o crypto/hash.o crypto/xxhash.o crypto/sha224-256.o crypto/blake2b-ref.o crypto/blake2b-sse2.o crypto/blake2b-sse41.o crypto/blake2b-avx2.o crypto/sha256-x86.o crypto/crc32c-pcl-intel-asm_64.o libbtrfsutil/stubs.o libbtrfsutil/subvolume.o btrfs-corrupt-block.o \
	 \
	libbtrfsutil.a \
	-flto=auto -fuse-ld=mold -O2 -rdynamic -L.   -luuid -lblkid  -L. -pthread  
  LN       libbtrfsutil.so
ln -s -f libbtrfsutil.so.1.3.2 libbtrfsutil.so
  LN       libbtrfsutil.so.1
ln -s -f libbtrfsutil.so.1.3.2 libbtrfsutil.so.1
mold: error: /usr/lib64/libblkid.so: --no-allow-shlib-undefined: undefined symbol: reallocarray
collect2: error: ld returned 1 exit status
make: *** [Makefile:710: btrfs-map-logical] Error 1
make: *** Waiting for unfinished jobs....
mold: error: /usr/lib64/libblkid.so: --no-allow-shlib-undefined: undefined symbol: reallocarray
collect2: error: ld returned 1 exit status
make: *** [Makefile:710: btrfs-corrupt-block] Error 1
mold: error: /usr/lib64/libblkid.so: --no-allow-shlib-undefined: undefined symbol: reallocarray
collect2: error: ld returned 1 exit status
make: *** [Makefile:710: btrfs-find-root] Error 1
mold: error: /usr/lib64/libblkid.so: --no-allow-shlib-undefined: undefined symbol: reallocarray
collect2: error: ld returned 1 exit status
make: *** [Makefile:769: btrfs-image] Error 1
mold: error: /usr/lib64/libblkid.so: --no-allow-shlib-undefined: undefined symbol: reallocarray
collect2: error: ld returned 1 exit status
make: *** [Makefile:710: btrfs-select-super] Error 1
mold: error: /usr/lib64/libblkid.so: --no-allow-shlib-undefined: undefined symbol: reallocarray
collect2: error: ld returned 1 exit status
make: *** [Makefile:753: mkfs.btrfs] Error 1
mold: error: /usr/lib64/libblkid.so: --no-allow-shlib-undefined: undefined symbol: reallocarray
collect2: error: ld returned 1 exit status
make: *** [Makefile:761: btrfstune] Error 1
mold: error: /usr/lib64/libblkid.so: --no-allow-shlib-undefined: undefined symbol: reallocarray
collect2: error: ld returned 1 exit status
make: *** [Makefile:717: btrfs] Error 1

The bfd linker links this fine, but mold does not. Passing --enable-libudev to configure instead of --disable-libudev also gets rid of the errors. The gcc version is 14.3.

zyxhere, Aug 27 '25 10:08

Downstream bug: https://bugs.gentoo.org/961969

zyxhere, Aug 27 '25 10:08

Thank you for your report! This issue looks real and actually not limited to LTO. I'll try to address it later this week.

rui314, Aug 28 '25 11:08

I've been looking into this issue as a learning exercise and thought I'd post some experimental results, in case this information is helpful for others. Please excuse the noise if not!

I found that each of the following changes, applied individually, seemed to work around this issue (none of them is a real fix, though):

  1. delete the reallocarray() (re)definition from libbtrfsutil/stubs.* entirely, and allow libbtrfsutil/subvolume.c to use the C standard library version of reallocarray() instead
  2. annotate the reallocarray() definition in libbtrfsutil/stubs.h with __attribute__((visibility("default")))
  3. annotate the reallocarray() definition in libbtrfsutil/stubs.h with __attribute__((used))
  4. use a modified build of mold with the following lines commented out from src/passes.cc: https://github.com/rui314/mold/blob/91439beea516f1ce51dc504bf475c4d596a84e9a/src/passes.cc#L314-L315

(4) makes me think this could be similar to the issue described in https://github.com/rui314/mold/commit/0612ea41f1a4dad4439e898b1d2a675bfd1e85a2, whose commit message reads:

Assume both `foo.a` and `bar.so` define the same symbol `baz`.
If `baz`'s symbol visibility is hidden, it needs to be resolved within
the output file, i.e., from `foo.a`. However, previously, such symbol
was resolved to the one in `bar.so`.

To fix the problem, we'll lower the symbol priority for DSOs.
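
To make the quoted rule concrete, here is a toy model in Python. It is only a sketch and not mold's code. The `Candidate` type, the `rank()` function, the `dso_penalty` parameter and the rank numbers are all made up for illustration. It also assumes that ties are broken by command-line order, which is how `bar.so` can win before the fix.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    file: str     # input file that defines `baz`
    is_dso: bool  # True for shared objects such as bar.so


def rank(c: Candidate, dso_penalty: int) -> int:
    # Hypothetical ranks: a smaller number means a higher priority.
    # dso_penalty=0 models "DSOs rank like everything else",
    # dso_penalty=1 models "lower the symbol priority for DSOs".
    return 1 + (dso_penalty if c.is_dso else 0)


def pick(cands: List[Candidate], dso_penalty: int) -> str:
    # min() keeps the first minimal element, so ties fall back to
    # command-line order (an assumption of this sketch).
    return min(cands, key=lambda c: rank(c, dso_penalty)).file


# bar.so comes first on the (imaginary) command line.
cands = [Candidate("bar.so", True), Candidate("foo.a", False)]

before_fix = pick(cands, dso_penalty=0)  # tie, so the DSO wins
after_fix = pick(cands, dso_penalty=1)   # the archive definition wins
print(before_fix, after_fix)
```

With the penalty in place, the hidden `baz` is taken from `foo.a` regardless of where `bar.so` sits on the command line, which is the behaviour the commit message asks for.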

Something related might be happening here, with libbtrfsutil.a taking the place of foo.a, libc.so taking the place of bar.so, and reallocarray() taking the place of baz. My current best guess regarding control flow and in-memory state:

  1. libbtrfsutil/stubs.* defines reallocarray()
  2. reallocarray() ends up visible from libbtrfsutil.a
  3. the only caller of this reallocarray() variant is actually libbtrfsutil/subvolume.c
  4. subvolume.c gets compiled to subvolume.o and linked into the very same libbtrfsutil.a -- a seemingly prime candidate for LTO!
  5. get_symbols() in src/lto-unix.cc sets LDPR_PREVAILING_DEF_IRONLY and LDPR_RESOLVED_IR for the definition and the invocation of reallocarray() in stubs.o/subvolume.o within libbtrfsutil.a
  6. based on the output of src/lto-unix.cc, LTO optimizes a lot of this away when linking the btrfs utility (for example) with libbtrfsutil.a, and stubs.o in libbtrfsutil.a becomes unreachable/unused
  7. after run_lto_plugin() in do_lto(), the in-memory Symbol object corresponding to the reallocarray() symbol in libc.so still has visibility == STV_HIDDEN, like the variant of reallocarray() originally from libbtrfsutil.a -- maybe a kind of pollution, since reallocarray() from libc.so would not normally have this visibility value?
  8. do_lto() calls clear_symbols() and resolve_symbols()
  9. visibility == STV_HIDDEN causes resolve_symbols() in src/passes.cc to set skip_dso = true for this Symbol object: https://github.com/rui314/mold/blob/91439beea516f1ce51dc504bf475c4d596a84e9a/src/passes.cc#L314
  10. skip_dso = true causes SharedFile<E>::resolve_symbols() in src/input-files.cc to skip processing this symbol, leaving Symbol::file unset: https://github.com/rui314/mold/blob/91439beea516f1ce51dc504bf475c4d596a84e9a/src/input-files.cc#L1413-L1414
  11. check_shlib_undefined() in src/passes.cc errors out on the null value of Symbol::file: https://github.com/rui314/mold/blob/91439beea516f1ce51dc504bf475c4d596a84e9a/src/passes.cc#L1175-L1178
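
Steps 6 to 11 above can be mimicked with a small Python model. This is a sketch of the guessed control flow, not mold's implementation. The `Symbol` class, the dictionaries and the `link()` helper are hypothetical. Only the names `skip_dso`, `resolve_symbols`, `check_shlib_undefined` and `STV_HIDDEN` are borrowed from the steps above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

STV_DEFAULT = "default"
STV_HIDDEN = "hidden"


@dataclass
class Symbol:
    name: str
    visibility: str = STV_DEFAULT
    file: Optional[str] = None  # input file that defines the symbol, if any


def resolve(sym: Symbol, object_defs: Dict[str, str], dso_defs: Dict[str, str]) -> None:
    """Re-resolve one symbol after LTO (steps 8 to 10)."""
    sym.file = None                          # step 8: the old resolution is cleared
    skip_dso = sym.visibility == STV_HIDDEN  # step 9: hidden symbols may not bind to a DSO
    if sym.name in object_defs:
        sym.file = object_defs[sym.name]
    elif not skip_dso and sym.name in dso_defs:
        sym.file = dso_defs[sym.name]        # step 10: never reached when skip_dso is set


def check_shlib_undefined(sym: Symbol, dso_undefs: Dict[str, List[str]]) -> List[str]:
    """Step 11: report every DSO that needs a symbol no input file defines."""
    if sym.file is not None:
        return []
    return [f"{dso}: undefined symbol: {sym.name}" for dso in dso_undefs.get(sym.name, [])]


def link(visibility_after_lto: str) -> List[str]:
    # Step 6: after LTO the stub from libbtrfsutil.a is gone, so no object defines it.
    object_defs: Dict[str, str] = {}
    dso_defs = {"reallocarray": "libc.so"}
    dso_undefs = {"reallocarray": ["libblkid.so"]}
    sym = Symbol("reallocarray", visibility=visibility_after_lto)
    resolve(sym, object_defs, dso_defs)
    return check_shlib_undefined(sym, dso_undefs)


stale_hidden = link(STV_HIDDEN)    # step 7: visibility left over from the stub
reset_default = link(STV_DEFAULT)  # visibility one would expect for libc's symbol
print(stale_hidden)
print(reset_default)
```

In this model the only difference between the failing and the passing run is the stale visibility value, which matches the observation that workarounds (2) and (4) both make the error disappear.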

etwoo, Sep 21 '25 14:09