mold
spurious duplicate symbol error
Spurious linker error when linking with libboost_fiber.a. With plain gcc it doesn't happen.
g++ -Wall -Wextra -g -fPIC -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -fno-omit-frame-pointer -Wno-unused-parameter -march=haswell -std=c++20 -DHAS_RAWMEMCHR -fdiagnostics-color=always -B/usr/local/libexec/mold -O3 -DNDEBUG -flto bunch-of-libs....
mold: error: duplicate symbol: /usr/lib/x86_64-linux-gnu/libboost_fiber.a(condition_variable.o): /tmp/ccFj8wOo.ltrans1.ltrans.o: guard variable for boost::fibers::detail::spinlock_ttas::lock()::generator
mold: error: duplicate symbol: /usr/lib/x86_64-linux-gnu/libboost_fiber.a(condition_variable.o): /tmp/ccFj8wOo.ltrans1.ltrans.o: boost::fibers::detail::spinlock_ttas::lock()::generator
mold 1.2.1 (c8d8f86a52084c96e2663d9f692c51e98c04cc2f; compatible with GNU ld)
What program are you trying to build? And what distro? I want to reproduce the issue on my machine.
I run it on ubuntu 22.04 but I think it will have the same effect on 20.04.
The repo to reproduce: https://github.com/romange/helio
build prerequisites: apt install -y cmake libunwind-dev zip libfl-dev bison ninja-build autoconf-archive libtool libboost-fiber-dev libssl-dev
to configure:
./blaze.sh -release -DBoost_USE_STATIC_LIBS=ON -DUSE_MOLD=ON
to build: cd build-opt && ninja echo_server
Getting something similar when I enable LTO with 1.2.1 (no issues with 1.0.3, which falls back to ld.bfd for LTO):
mold: error: duplicate symbol: /nix/store/77gp5dn6n4vxaadnwmrysjaclxq70m5k-boost-1.78.0/lib/libboost_log-mt-x64.a(attribute_name.o): /run/user/1014/ccwEzPfE.ltrans0.ltrans.o: boost::system::detail::generic_cat_holder<void>::instance
I can reproduce this. Let me take a look.
What I know so far: this happens for TLS usage in statically linked libraries and probably has to do with some GNU-related behavior.
The symbols in the .a look like this:
29: 0000000000000000 8 TLS UNIQUE HIDDEN 33 guard variable for boost::fibers::detail::spinlock_ttas::lock()::generator
31: 0000000000000000 8 TLS UNIQUE HIDDEN 34 boost::fibers::detail::spinlock_ttas::lock()::generator
While the lto_trans symbols look like this:
438: 0000000000000020 8 TLS GLOBAL DEFAULT 21 guard variable for boost::fibers::detail::spinlock_ttas::lock()::generator
440: 0000000000000018 8 TLS GLOBAL DEFAULT 21 boost::fibers::detail::spinlock_ttas::lock()::generator
The problem should go away if we treat these UNIQUE symbols as WEAK, but there is varying advice on whether STB_GNU_UNIQUE should be treated as GLOBAL or WEAK, and looking at LLD there's no specific handling for UNIQUE there, so I'm not sure that would be the correct fix.
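For context, the two symbols above are a function-local static in an inline function plus its guard variable. A minimal sketch of that pattern follows; `spinlock_sketch` and `next_backoff` are hypothetical stand-ins, not Boost's actual code.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for boost::fibers::detail::spinlock_ttas.
struct spinlock_sketch {
  // Member functions defined in-class are implicitly inline, so every
  // translation unit that calls this one may emit its own copy of
  // `generator` plus a guard variable for its dynamic initialization.
  unsigned next_backoff(unsigned max_steps) {
    assert(max_steps > 0);
    static thread_local std::minstd_rand generator{1};
    return static_cast<unsigned>(generator() % max_steps);
  }
};
```

Compiling a file that calls `next_backoff` with `g++ -c` and inspecting the object with `readelf -sW` should show which binding the compiler chose for `generator`; I would expect UNIQUE with the default settings and WEAK with -fno-gnu-unique, but I have not checked this across compiler versions.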
GNU unique symbols have odd semantics, and they are no longer actively used. So it's odd that Ubuntu 22.04 distributes an object file containing GNU unique symbols. That object file must have been built with a misconfigured compiler or with the -fgnu-unique option for whatever reason. This has to be fixed on the distro's side. Could you investigate a bit more about how this package was built and report the problem to Ubuntu? I think lld also can't handle this library file.
> Could you investigate a bit more about how this package was built and report the problem to Ubuntu?
I looked at the same file on Arch and it also has UNIQUE. Maybe Boost is setting such a flag?
> I think lld also can't handle this library file.
It seems like it can link it without error.
> > Could you investigate a bit more about how this package was built and report the problem to Ubuntu?
> I looked at the same file on Arch and it also has UNIQUE. Maybe Boost is setting such a flag?
Possibly.
> > I think lld also can't handle this library file.
> It seems like it can link it without error.
That's odd too. Does lld handle GNU unique symbols as weak symbols?
> > I looked at the same file on Arch and it also has UNIQUE. Maybe Boost is setting such a flag?
> Possibly.
I'll see if it has the flag in the build configuration.
FWIW, the Boost I used was built directly from source (the repro required a certain version).
> That's odd too. Does lld handle GNU unique symbols as weak symbols?
It does not.
https://github.com/llvm/llvm-project/blob/c02abb68cd88a9edbc5d4dd83de6f50766fb9ae8/lld/ELF/Symbols.cpp#L601-L606
I do wonder if it eliminates such symbols before checking for duplicates, though.
Also, I'm not sure, but should HIDDEN visibility prevent a symbol from being flagged as a duplicate? I guess not, because lld doesn't have that in its condition.
The HIDDEN visibility doesn't (and IIUC shouldn't) affect how symbol conflicts are resolved in this case, so I don't think it prevented a symbol duplicate error for these symbols.
I'm very sorry that I made some mistakes during debugging, and ld.bfd was actually used when I said I used lld. I guess it's possibly some bfd specific behavior related to STB_GNU_UNIQUE handling, but a quick grep didn't show how bfd handles it differently yet.
As for -fgnu-unique, looks like it's still the default. Tested on both Arch and Ubuntu.
$ gcc --version
gcc (GCC) 12.1.0
...
$ gcc -Q --help=common | rg unique
-fgnu-unique [enabled]
If GNU unique is enabled by default and gcc emits GNU unique symbols aggressively, the symbols we are discussing should have consistently been GNU unique. It's odd that one object file contains the symbols as GNU unique while another has them as non-GNU-unique symbols. What caused that inconsistency?
Also, if GNU unique is disabled, I believe gcc emits symbols as weak defined symbols instead of regular strongly defined symbols. But here we get global symbols instead of weak ones. I don't know what that means, but it looks odd.
For reference, this is what the GCC docs say:
-fno-gnu-unique On systems with recent GNU assembler and C library, the C++ compiler uses the STB_GNU_UNIQUE binding to make sure that definitions of template static data members and static local variables in inline functions are unique even in the presence of RTLD_LOCAL; this is necessary to avoid problems with a library used by two different RTLD_LOCAL plugins depending on a definition in one of them and therefore disagreeing with the other one about the binding of the symbol. But this causes dlclose to be ignored for affected DSOs; if your program relies on reinitialization of a DSO via dlclose and dlopen, you can use -fno-gnu-unique.
I used it (-fno-gnu-unique) to rebuild Boost, and now I can link successfully with mold (1.2.1) with LTO enabled.
OK, I figured out whatever nefarious logic gold was using:
https://github.com/bminor/binutils-gdb/blob/c8eab1d7c92ad72089c98e5753ebc96419e3674a/gold/resolve.cc#L99-L101
Apparently the GCC linker plugin will emit most of the ltrans symbols as GLOBAL even if they were declared LOCAL in the original object. The fix would be to transfer the st_bind derived from PluginSymbol's def over to the LTO replacement input files.
@rui314 If I understand correctly, mold's approach to symbol resolution is a little different from gold's. Is it true that mold just throws the stub symbols away before resolution, so we don't have the same "replace LTO stub symbols with the ltrans file" process? If so, do you think a sensible way to solve this would be a separate pass that copies over the st_bind values?
It feels like a bug of the GCC linker plugin. Do you mind if I ask you to file a bug against GCC? Or you can fix it yourself and send a patch to them if you want to.
As to how to fix the issue, I think it's sensible to copy st_bind values from symbols in IR files to the resolved symbols as the last step of LTO.
> Do you mind if I ask you to file a bug against GCC?
Sounds good, will do.
> As to how to fix the issue, I think it's sensible to copy st_bind values from symbols in IR files to the resolved symbols as the last step of LTO.
Thanks for confirming. I'll work on a fix.
After looking into this again, it turned out that the cause is rather different: GCC LTO plugin does not emit the object file with correct st_bind and st_visibility attributes, and it relies on a gold behavior where 1. it reuses the symbol resolution information done on IR files, and 2. allows overriding IR symbols with LTO symbols unconditionally.
I filed this as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105933.
The strategy for us is largely unchanged: we just copy over the attributes we got from the plugin. I don't think it's worth constructing another symbol hash table though, so I'm just looping over the symbols to see if each one is currently resolved to an IR symbol. If it is, then we get the information from that; if it isn't, then we assume that the LTO symbol can't win the resolution and drop the symbol instead. WIP branch is here.
There is still another issue that seems to be hitting us though: the GCC LTO plugin doesn't emit COMDAT sections, and we're doing conflict resolution using the fallback path with WEAK etc. In this case, the WEAK symbol from the object file overrides the UNIQUE symbol from the archive, because object files not in an archive/so always win. But then the duplicate check complains, because while a GLOBAL symbol is allowed to override a WEAK symbol, the reverse is not.
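The copy-or-drop step described above could look roughly like the following. This is only a sketch under stated assumptions: `Bind`, `Vis`, `Resolved`, `LtoSym`, and `fix_lto_symbols` are hypothetical names for illustration and are not mold's data structures or the code in the WIP branch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// All types here are hypothetical; they are not mold's data structures.
enum class Bind : std::uint8_t { Local, Global, Weak, Unique };
enum class Vis : std::uint8_t { Default, Hidden };

struct Resolved {
  bool owner_is_ir;  // true if the winning definition came from an IR file
  Bind bind;         // binding reported by the plugin for the IR symbol
  Vis vis;           // visibility reported by the plugin for the IR symbol
};

struct LtoSym {
  std::string name;
  Bind bind;  // as found in the ltrans object, possibly wrong
  Vis vis;
};

// Keep an LTO symbol only if its name is currently resolved to an IR file,
// and overwrite its attributes with the ones recorded for that IR symbol.
std::vector<LtoSym> fix_lto_symbols(
    const std::unordered_map<std::string, Resolved> &table,
    const std::vector<LtoSym> &lto_syms) {
  std::vector<LtoSym> kept;
  for (const LtoSym &sym : lto_syms) {
    auto it = table.find(sym.name);
    if (it == table.end() || !it->second.owner_is_ir)
      continue;  // assumed unable to win resolution, so drop it
    kept.push_back({sym.name, it->second.bind, it->second.vis});
  }
  return kept;
}
```

The sketch reuses the existing resolution result as a lookup table, which matches the idea of not building a second symbol hash table.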
So the question here is:
- Which should win, WEAK from object files or UNIQUE from static library?
- Should we whitelist such cases in duplicate resolution?
I cannot answer the question regarding the WEAK symbol without experiments. The current symbol strength rules were chosen based on a large-scale experiment (compiling 10,000 Gentoo packages), and changing the rules may result in unexpected failures in other packages. It may still be worth it, but I cannot predict the outcome without actually trying it.
I think there's another way to fix the issue. Currently, we redo symbol resolution from scratch after LTO, but we don't need to do that. Instead of discarding all symbols and redo symbol resolution, we can override IR symbols with compiled LTO symbols. This is actually what lld does. Since LTO can introduce a new undefined symbol, we need to handle the case that a new object file is pulled out from an archive though. I don't know if it's a good idea, but it looks like it's one way to solve this issue.
Sorry for the delay. On second thought, I'm not sure if this actually solves the issue. In the branch I linked above, I'm already copying as much information from the IR symbols over to the LTO symbols which effectively ensures it results in the same resolution. However, even if we did not throw away the IR symbols, it will likely get the same duplicate symbol error later in the process because 1. GCC doesn't include COMDAT information in IR objects nor LTO objects and 2. our duplication resolution seems to not like cases with a WEAK from object file and UNIQUE from static library. So really here we need to teach GCC to emit COMDAT or get the duplicate checker to not complain about this case, I think.
Sorry, this causes so much trouble 👐🏼
> So really here we need to teach GCC to emit COMDAT or get the duplicate checker to not complain about this case, I think.
How difficult is it to make mold not complain about the duplicate symbol? We don't need to de-duplicate COMDAT groups (our UNIQUE symbol handling isn't perfect, and having redundant copies of function code in the output is acceptable). We would just suppress the duplicate symbol error for this case.
It's... tricky. The duplicate checker is written in a manner that assumes COMMON symbols always override WEAK symbols. Under that assumption, you only need to check the top 2 matches, and if both are COMMON, then there's a duplicate.
But here we're overriding a UNIQUE symbol with an object file's WEAK symbol. If we allow WEAK as the top match and UNIQUE (in staticlib) as the second match, then there might be a third UNIQUE match in staticlib that we didn't check that is an actual duplicate. That said, I need to do more research on whether this is considered an actual issue (I don't understand why staticlib symbols are demoted right now).
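To make the concern concrete, here is a toy model of a top-two check next to an exhaustive one. It is a sketch only: `Def`, `top_two_duplicate`, and `any_duplicate` are hypothetical names, and the ranking is an assumption rather than mold's real rule.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: a lower rank wins, and `strong` marks a definition
// that must not coexist with another strong one.
struct Def {
  int rank;
  bool strong;
};

// Top-two check: sound only if every strong definition outranks every
// weak one, so that two strong definitions always occupy the top slots.
bool top_two_duplicate(std::vector<Def> defs) {
  std::sort(defs.begin(), defs.end(),
            [](const Def &a, const Def &b) { return a.rank < b.rank; });
  return defs.size() >= 2 && defs[0].strong && defs[1].strong;
}

// Exhaustive check for comparison: count all strong definitions.
bool any_duplicate(const std::vector<Def> &defs) {
  return std::count_if(defs.begin(), defs.end(),
                       [](const Def &d) { return d.strong; }) >= 2;
}
```

With a weak definition ranked first and two strong ones behind it, `top_two_duplicate` returns false while `any_duplicate` returns true, which is the missed third match described above.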
I was able to reproduce this in another way:
$ cat test.cpp
#include <boost/system/error_code.hpp>
$ g++ -O2 -flto -c test.cpp
$ g++ -O2 -c -o test2.o test.cpp
$ g++ test.o test2.o
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: boost::system::detail::is_generic_value(int)::gen
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: boost::system::detail::cat_holder<void>::generic_category_instance
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: boost::system::detail::cat_holder<void>::system_category_instance
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: guard variable for boost::system::detail::to_std_category(boost::system::error_category const&)::map_
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: boost::system::detail::to_std_category(boost::system::error_category const&)::map_mx_
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: boost::system::detail::to_std_category(boost::system::error_category const&)::map_
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: guard variable for boost::system::detail::to_std_category(boost::system::error_category const&)::generic_instance
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: boost::system::detail::to_std_category(boost::system::error_category const&)::generic_instance
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: guard variable for boost::system::detail::to_std_category(boost::system::error_category const&)::system_instance
mold: error: duplicate symbol: test2.o: /tmp/ccrOQCsM.ltrans0.ltrans.o: boost::system::detail::to_std_category(boost::system::error_category const&)::system_instance
collect2: error: ld returned 1 exit status
This -flto asymmetry looks weird, but it can easily happen when CMake's pybind11_add_module is used.
Interestingly, g++ test2.o test.o does not raise the error.
pybind11_add_module : https://github.com/pybind/pybind11/blob/v2.9.2/tools/pybind11Tools.cmake#L195
Debian 11 bullseye, stock boost 1.74
An update.
The archive UNIQUE vs object WEAK problem I mentioned actually has a clear answer: if the archive file is reached from one of the objects (in old terms, extracted), then it's just treated as a normal object file. There's no ambiguity here; the UNIQUE one wins.
More problematic is the hack we're using when resolving UNIQUE symbols. We treat all UNIQUE symbols as WEAK in terms of resolution priority: https://github.com/rui314/mold/blob/a8559be1b194ed63114747e7648287e9ea741200/elf/input-files.cc#L828 But this breaks the soundness of the resolution system. As a result, a WEAK symbol (with higher file priority) could win the resolution in the presence of a non-eliminated UNIQUE symbol, and that's exactly where the false-positive duplicate is coming from, after my prototype patch to copy st_bind from the IR objects. This did not happen in the non-LTO case, because everything was guarded with a COMDAT, but with this edge case it becomes a problem.
I've been thinking about doing COMDAT elimination before symbol resolution. That way we can avoid this soundness-breaking hack altogether. It doesn't look like just reordering the passes works, though; in particular, mergeable sections get initialized as is_alive=false and that breaks _IO_stdin_used in Scrt1.o etc. If I initialize it as is_alive=true it breaks a bunch of other things, obviously.
(The following diff is what I have been doing as a part of the proposal above. It doesn't fix this issue (524). Also, it doesn't cover the LTO case yet.)
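The false positive can be reproduced in a toy model. This is a simplified sketch, not mold's code: the rank formula is invented, and the duplicate rule used here (any non-winning definition that is not WEAK is an error) is my assumption for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class Bind { Global, Weak, Unique };

struct Def {
  int file_priority;  // lower means higher priority
  Bind bind;
};

// Ranking with the hack: UNIQUE is ranked like WEAK. Lower value wins.
int rank_of(const Def &d) {
  bool is_weak = d.bind == Bind::Weak || d.bind == Bind::Unique;
  return (is_weak ? 1000 : 0) + d.file_priority;
}

std::size_t winner_index(const std::vector<Def> &defs) {
  assert(!defs.empty());
  auto it = std::min_element(
      defs.begin(), defs.end(),
      [](const Def &a, const Def &b) { return rank_of(a) < rank_of(b); });
  return static_cast<std::size_t>(it - defs.begin());
}

// Duplicate check that, unlike the ranking, still counts UNIQUE as strong.
bool reports_duplicate(const std::vector<Def> &defs) {
  std::size_t w = winner_index(defs);
  for (std::size_t i = 0; i < defs.size(); i++)
    if (i != w && defs[i].bind != Bind::Weak)
      return true;
  return false;
}
```

Feeding it a WEAK definition with higher file priority and a UNIQUE one with lower priority makes the WEAK one win and the UNIQUE one get reported, because the two passes disagree on how strong UNIQUE is.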
diff --git a/elf/input-files.cc b/elf/input-files.cc
index 63cfd844..5c9c08ea 100644
--- a/elf/input-files.cc
+++ b/elf/input-files.cc
@@ -652,7 +652,7 @@ void ObjectFile<E>::initialize_mergeable_sections(Context<E> &ctx) {
         isec->sh_size && isec->shdr().sh_entsize &&
         isec->relsec_idx == -1) {
       mergeable_sections[i] = split_section(ctx, *isec);
-      isec->is_alive = false;
+      // isec->is_alive = false;
     }
   }
 }
@@ -825,7 +825,7 @@ static u64 get_rank(InputFile<E> *file, const ElfSym<E> &esym, bool is_lazy) {
   //
   // It looks like STB_GNU_UNIQUE is not a popular option anymore and
   // often disabled by default though.
-  bool is_weak = (esym.st_bind == STB_WEAK || esym.st_bind == STB_GNU_UNIQUE);
+  bool is_weak = esym.st_bind == STB_WEAK;

   if (file->is_dso || is_lazy) {
     if (is_weak)
@@ -896,7 +896,7 @@ void ObjectFile<E>::resolve_symbols(Context<E> &ctx) {
     InputSection<E> *isec = nullptr;
     if (!esym.is_abs() && !esym.is_common()) {
       isec = get_section(esym);
-      if (!isec)
+      if (!isec || !isec->is_alive)
         continue;
     }

diff --git a/elf/main.cc b/elf/main.cc
index f604c94a..798d4196 100644
--- a/elf/main.cc
+++ b/elf/main.cc
@@ -459,6 +459,9 @@ static int elf_main(int argc, char **argv) {
   // Apply -exclude-libs
   apply_exclude_libs(ctx);

+  // Remove redundant comdat sections (e.g. duplicate inline functions).
+  eliminate_comdats(ctx);
+
   // Resolve symbols and fix the set of object files that are
   // included to the final output.
   resolve_symbols(ctx);
@@ -466,9 +469,6 @@ static int elf_main(int argc, char **argv) {

   // Resolve mergeable section pieces to merge them.
   register_section_pieces(ctx);

-  // Remove redundant comdat sections (e.g. duplicate inline functions).
-  eliminate_comdats(ctx);
-
   // Create .bss sections for common symbols.
   convert_common_symbols(ctx);
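The pass ordering in the diff above can be modeled in a few lines. This is a sketch under assumptions: `Section`, `eliminate_comdats_sketch`, and `resolvable_definitions` are hypothetical, and "first section of a group wins" is a simplification that may not match how mold picks the surviving group.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model of an input section that may belong to a COMDAT group.
struct Section {
  std::string comdat_signature;  // empty if the section is not in a group
  std::string defined_symbol;
  bool is_alive = true;
};

// Pass 1: keep the first section of each COMDAT group and kill the rest.
void eliminate_comdats_sketch(std::vector<Section> &sections) {
  std::unordered_set<std::string> seen;
  for (Section &sec : sections) {
    if (sec.comdat_signature.empty())
      continue;
    if (!seen.insert(sec.comdat_signature).second)
      sec.is_alive = false;  // a section with this signature was already kept
  }
}

// Pass 2: only definitions in live sections take part in resolution,
// similar in spirit to the added `!isec->is_alive` check.
std::vector<std::string> resolvable_definitions(
    const std::vector<Section> &sections) {
  std::vector<std::string> names;
  for (const Section &sec : sections)
    if (sec.is_alive)
      names.push_back(sec.defined_symbol);
  return names;
}
```

Because the redundant definitions are gone before resolution starts, the resolver never has to rank two copies of the same COMDAT-guarded symbol against each other.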
> I've been thinking about doing COMDAT elimination before symbol resolution.
That idea hadn't occurred to me, but it sounds very interesting. The more I think about it, the more it feels like the right way to handle comdats and symbol resolution.
Cool, I'll experiment more with this and try to get the interaction with mergeable sections resolved.
I'm also seeing this with a (sadly closed-source) project linking against libstdc++fs (perhaps we pull it in twice transitively). ld and gold work, but with mold we get:
mold: error: duplicate symbol: /envy/wave/a5/cb288f5c6ce485/lib/gcc/x86_64-conda-linux-gnu/12.1.0/libstdc++fs.a(dir.o): /tmp/cc006KP8.ltrans3.ltrans.o: std::_Sp_make_shared_tag::_S_ti()::__tag
If I manually drop the -lstdc++fs from our cmd line (incorrectly, as we do actually use it directly), the link completes. This only happens with LTO builds.
Is there some debugging information we can supply to tell whether this error is the same as this bug (i.e. I don't know if this is a separate issue or not)?
Edit: adding versions:
x86_64-conda_cos6-linux-gnu-g++ (conda-forge gcc 12.1.0-16) 12.1.0
Copyright (C) 2022 Free Software Foundation, Inc.
mold 1.6.0 (compatible with GNU ld)
Can you try with the above commit?