mold icon indicating copy to clipboard operation
mold copied to clipboard

Support DT_RELR using standard -z pack-relative-relocs cmdline flag

Open rawoul opened this issue 3 years ago • 3 comments

In bfd and gold, packed dynamic relocations are toggled with the -z pack-relative-relocs and -z nopack-relative-relocs cmd line flags, while mold uses -pack-dyn-relocs=none|relr. For compatibility it would be simpler if mold aligned to the other linkers, can the standard flag also be supported, or used instead of -pack-dyn-relocs ?

rawoul avatar Aug 21 '22 12:08 rawoul

If someone really has a trouble with these flags, I'll consider, but I don't think we should add a new flag for a hypothetical use. It's somewhat an evolutionary process; if a flag doesn't become popular enough, it won't be supported by many implementations, and its use will diminish. So I don't think we should try to support all flags beforehand.

rui314 avatar Aug 21 '22 13:08 rui314

When adding -z pack-relative-relocs bfd also adds a version dependency on GLIBC_ABI_DT_RELR if libc.so is in DT_NEEDED and there's a dependency on version GLIBC_2.* (meaning the object depends on glibc). Without this the glibc dynamic loader will refuse to load a shared object using DT_RELR with the following error:

DT_RELR without GLIBC_ABI_DT_RELR dependency

That seems quite complicated, so I agree with you that waiting is probably better... For context, the same issue has been raised before in the lld project: https://github.com/llvm/llvm-project/issues/53775.

rawoul avatar Aug 22 '22 00:08 rawoul

Thank you for sharing the link! I agree with you that what GNU ld does seems quite complicated.

rui314 avatar Aug 22 '22 02:08 rui314

b365d6ba1aade2e99d636892e4a21b8d7988a01c added -z pack-relative-relocs without GLIBC_ABI_DT_RELR, which leads to the error mentioned in https://github.com/rui314/mold/issues/653#issuecomment-1221664798 ...

glandium avatar Aug 08 '23 20:08 glandium

I was experiencing the error mentioned in https://github.com/rui314/mold/issues/653#issuecomment-1221664798 in the 2.0.0 release build when building firefox 116. This issue appears to have been fixed by https://github.com/rui314/mold/commit/f467ad1add2ab6e381e0e458f026df197e63d487, but a new error pops up:

20:10.56 mold: fatal: /tmp/lto-llvm-5abe96.o: REL-type relocation table is not supported for this target
20:10.58 clang-15: error: linker command failed with exit code 1 (use -v to see invocation)
20:10.59 make[4]: *** [/compile/makepkg/librewolf/src/librewolf-116.0-1/config/rules.mk:532: libmozsqlite3.so] Error 1
20:10.59 make[3]: *** [/compile/makepkg/librewolf/src/librewolf-116.0-1/config/recurse.mk:72: config/external/sqlite/target] Error 2

Could this possibly be related?

PS: here is a link to a report on the same bug on firefox's issue tracker: https://bugzilla.mozilla.org/show_bug.cgi?id=1847697.

maxyvisser avatar Aug 10 '23 08:08 maxyvisser

No, that's different. What target did you get this error for? If it's for x86-64 or ARM64, it is actually odd if an object file contains a REL-type relocation section instead of RELA-type one.

rui314 avatar Aug 10 '23 09:08 rui314

This is on x86-64, a ryzen 5 5600G to be exact. Do you want me to create a new issue, since it is unrelated?

maxyvisser avatar Aug 10 '23 10:08 maxyvisser

Can reproduce this issue on FF-117 as well

171:11.18 mold: fatal: /var/tmp/portage/www-client/firefox-117.0/temp/lto-llvm-715b02.o: REL-type relocation table is not supported for this target

ambasta avatar Aug 30 '23 09:08 ambasta

It looks like LLVM generates REL-type relocations instead of RELA-type ones for LLVM. It's very likely that it's a bug in LLVM. Can you report it to LLVM?

rui314 avatar Aug 30 '23 10:08 rui314

Sure, in the process of rebuilding manually (w/o ebuild) to isolate and confirm the issue, Will file one once I have confirmation

ambasta avatar Aug 30 '23 10:08 ambasta

Is this really a bug w/ LLVM?

Compiling w/ LLVM succeeds

clang version 17.0.2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/17/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang.cfg
System configuration file directory: /etc/clang
Selected GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/13
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/usr/bin/x86_64-pc-linux-gnu-ld.bfd" --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -shared -o libmozsqlite3.so /usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/13/crtbeginS.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/13 -L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../x86_64-pc-linux-gnu/lib -L/lib -L/usr/lib -plugin /usr/lib/llvm/17/bin/../lib64/LLVMgold.so -plugin-opt=mcpu=znver1 -plugin-opt=O3 -plugin-opt=thinlto -plugin-opt=-function-sections=1 -plugin-opt=-data-sections=1 -z relro -z defs --gc-sections -h libmozsqlite3.so /var/tmp/portage/www-client/firefox-118.0.2/work/firefox_build/config/external/sqlite/libmozsqlite3_so.list -plugin-opt=-import-instr-limit=10 -plugin-opt=-import-hot-multiplier=30 -lpthread -O1 --as-needed --compress-debug-sections=zlib -rpath=/usr/lib64/firefox --enable-new-dtags -z pack-relative-relocs -z noexecstack -z text -z relro -z nocopyreloc -Bsymbolic-functions -rpath-link /var/tmp/portage/www-client/firefox-118.0.2/work/firefox_build/dist/bin -rpath-link /usr/lib --version-script libmozsqlite3.so.symbols -lm -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-pc-linux-gnu/13/crtendS.o /usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crtn.o

But on adding -fuse-ld=mold

clang version 17.0.2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/17/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang.cfg
System configuration file directory: /etc/clang
Selected GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/13
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/usr/bin/ld.mold" --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -shared -o libmozsqlite3.so /usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/13/crtbeginS.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/13 -L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../x86_64-pc-linux-gnu/lib -L/lib -L/usr/lib -plugin /usr/lib/llvm/17/bin/../lib64/LLVMgold.so -plugin-opt=mcpu=znver1 -plugin-opt=O3 -plugin-opt=thinlto -plugin-opt=-function-sections=1 -plugin-opt=-data-sections=1 -z relro -z defs --gc-sections -h libmozsqlite3.so /var/tmp/portage/www-client/firefox-118.0.2/work/firefox_build/config/external/sqlite/libmozsqlite3_so.list -plugin-opt=-import-instr-limit=10 -plugin-opt=-import-hot-multiplier=30 -lpthread -O1 --as-needed --compress-debug-sections=zlib -rpath=/usr/lib64/firefox --enable-new-dtags -z pack-relative-relocs -z noexecstack -z text -z relro -z nocopyreloc -Bsymbolic-functions -rpath-link /var/tmp/portage/www-client/firefox-118.0.2/work/firefox_build/dist/bin -rpath-link /usr/lib --version-script libmozsqlite3.so.symbols -lm -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/x86_64-pc-linux-gnu/13/crtendS.o /usr/lib/gcc/x86_64-pc-linux-gnu/13/../../../../lib64/crtn.o
mold: fatal: /tmp/lto-llvm-be7713.o: REL-type relocation table is not supported for this target

ambasta avatar Oct 13 '23 09:10 ambasta

Yes, it's a bug in LLVM, please report it to LLVM. x86-64 is defined to use RELA-type relocations by its psABI, so creating a REL-type relocation type is illegal even if other linkers happen to not complain.

rui314 avatar Oct 13 '23 10:10 rui314

@rui314 Some additional guidance please, looking at https://github.com/llvm/llvm-project/blob/631e2911ea298bc12564df8acd16bba89790426a/lld/test/ELF/relocation-none.test#L38

## Both REL and RELA are supported. .rel.llvm.call-graph-profile uses REL even
## if the prevailing format is RELA.

This is exactly where mold is failing, and seems to be documented behavior

ambasta avatar Oct 28 '23 07:10 ambasta

Even if it's documented, it still violates the spec. As quoted from x86-64 psABI p.72, "The AMD64 LP64 ABI architecture uses only Elf64_Rela relocation entries with explicit addends." While they have the option not to conform to the standard, doing so would lead to compatibility issues for obvious reasons.

rui314 avatar Oct 28 '23 09:10 rui314

Yes, but that would be unlikely. Even the kernel has patches to ignore modpost checks on call-graph-profile

https://lore.kernel.org/lkml/CAK7LNARVi1sfBjv5a5OoQWPEeM-6bFuwPJE+i32NC=wdum-AKw@mail.gmail.com/t/#mcd82d7d511dcb4fe8939e21127fafda08f4f732e

And here's the rationale behind it

https://reviews.llvm.org/D104080

ambasta avatar Oct 28 '23 09:10 ambasta

Alright, how about we simply ignore the REL relocation tables in psABIs where RELA is mandated? It seems that by doing so, we'll resolve the issue.

rui314 avatar Oct 28 '23 09:10 rui314

I genuinely do not understand any of this enough to comment. This issue was my first attempt at even trying to go through a compiler/linker source, so I'm not sure what the correct approach is here.

ambasta avatar Oct 28 '23 09:10 ambasta

I believe ignoring a relocation table with an incompatible type should be fine because in this case all relocations in the incompatible table is R_*_NONE type, which is a no-op relocation. mold used to silently ignore incompatible relocation tables, so it's not new.

rui314 avatar Oct 28 '23 09:10 rui314