mold
testsuite: armv7l test failure
I noticed the following test failure on the target:
[ 181s] + /usr/bin/make -O -j4 V=1 VERBOSE=1 test -e PREFIX=/usr BINDIR=/usr/bin MANDIR=/usr/share/man LIBDIR=/usr/lib LIBEXECDIR=/usr/libexec STRIP=true SYSTEM_TBB=1 SYSTEM_XXHASH=1 SYSTEM_MIMALLOC=1
[ 182s] /usr/bin/make -C test -f Makefile.linux --no-print-directory --output-sync
[ 182s] Testing absolute-symbols ... skipped
[ 182s] Testing ar-alignment ... collect2: fatal error: ld terminated with signal 7 [Bus error]
I don't know the cause of the failure, but armv7l is a 32-bit ARM processor, and mold isn't tested well on 32-bit hosts. I'll take a look when I have time.
I think I might be experiencing the same issue: it always happens on armv7 and sometimes on armv6. gdb suggests an unaligned address access here:
https://github.com/rui314/mold/blob/b26e1a3c328b27cd1f573f6804d3281d40cb11e5/filetype.h#L43-L44
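For context, a SIGBUS (signal 7) on 32-bit ARM is the typical symptom of a misaligned memory access: instructions such as LDRD and LDM require word alignment even on ARMv7. The code at filetype.h#L43-L44 is not reproduced here. Assuming it reads a multi-byte field through a cast pointer, the sketch below shows the usual portable alternative. `load_u32` and `has_elf_magic` are hypothetical helpers for illustration, not mold's API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Load a 32-bit value in host byte order from an address that may not be
// 4-byte aligned. memcpy makes no alignment assumption, so the compiler
// must emit loads that are valid for any address.
inline uint32_t load_u32(const void *p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// By contrast, `*reinterpret_cast<const uint32_t *>(p)` promises the
// compiler that p is 4-byte aligned. That is undefined behaviour when it is
// not, and on 32-bit ARM the compiler is free to use instructions such as
// LDRD or LDM, which fault with SIGBUS on a misaligned address.

// Check whether the bytes at p start with the ELF magic "\x7fELF".
inline bool has_elf_magic(const void *p) {
  static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
  return load_u32(p) == load_u32(magic);
}
```

With the magic placed at an odd offset, e.g. `buf + 1` of an aligned buffer, `has_elf_magic` still works, whereas the cast-and-dereference version is not guaranteed to.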
@rui314 with mold 1.3.0 and the fix for #545, the test suite passes for me on both armv6 (ARM) and armv7 (Thumb). I've also built plenty of open-source projects on those platforms using mold, and everything works fine, so I think this ticket can be closed.
I've just tried the current master (5688115b) and still see one failing test case:
[ 719s] Testing mold-wrapper ... ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x401B1F4: _dl_start (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== Address 0xfec36574 is on thread 1's stack
[ 719s] ==6431== 120 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x401255C: _dl_setup_hash (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0x401B883: _dl_start (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== Address 0xfec36588 is on thread 1's stack
[ 719s] ==6431== 8 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x40197C0: _dl_sysdep_start (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0xFFFFFFFF: ???
[ 719s] ==6431== Address 0xfec36564 is on thread 1's stack
[ 719s] ==6431== 32 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x4015A70: __GI___tunables_init (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0xFEC365C7: ???
[ 719s] ==6431== Address 0xfec364ec is on thread 1's stack
[ 719s] ==6431== 104 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x400C268: _dl_strtoul (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0xFFFFFFFF: ???
[ 719s] ==6431== Address 0xfec364b4 is on thread 1's stack
[ 719s] ==6431== 56 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x4012734: _dl_sort_maps_init (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0xFFFFFFFF: ???
[ 719s] ==6431== Address 0xfec36564 is on thread 1's stack
[ 719s] ==6431== 16 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x401FD44: sbrk (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0x4019993: _dl_sysdep_start (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0x401B8C7: _dl_start (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== Address 0xfec36558 is on thread 1's stack
[ 719s] ==6431== 16 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x401BD00: dl_main (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== Address 0xfec3634c is on thread 1's stack
[ 719s] ==6431== 520 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x400C4FC: _dl_new_object (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0xFFFFFFFF: ???
[ 719s] ==6431== Address 0xfec3631c is on thread 1's stack
[ 719s] ==6431== 48 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x400BFC8: __minimal_calloc (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0x400C55F: _dl_new_object (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0x401C6A3: dl_main (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== Address 0xfec36320 is on thread 1's stack
[ 719s] ==6431== 16 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x400BE6C: __minimal_malloc (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0xFFFFFFFF: ???
[ 719s] ==6431== Address 0xfec3631c is on thread 1's stack
[ 719s] ==6431== 24 bytes below stack pointer
[ 719s] ==6431==
[ 719s] ==6431== Invalid write of size 4
[ 719s] ==6431== at 0x400C420: _dl_add_to_namespace_list (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== by 0x401C6CF: dl_main (in /usr/lib/ld-linux-armhf.so.3)
[ 719s] ==6431== Address 0xfec36350 is on thread 1's stack
[ 719s] ==6431== 16 bytes below stack pointer
[ 719s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x401255C: _dl_setup_hash (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x401CBF3: dl_main (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec36360 is on thread 1's stack
[ 720s] ==6431== 8 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4019E60: _dl_discover_osversion (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0xFEC3681D: ???
[ 720s] ==6431== Address 0xfec3615c is on thread 1's stack
[ 720s] ==6431== 496 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x40070D0: _dl_init_paths (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec36314 is on thread 1's stack
[ 720s] ==6431== 56 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4018448: _dl_important_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec362ac is on thread 1's stack
[ 720s] ==6431== 104 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4018F24: _dl_hwcaps_split_masked (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x40184FB: _dl_important_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec362c0 is on thread 1's stack
[ 720s] ==6431== 8 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4018F24: _dl_hwcaps_split_masked (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018543: _dl_important_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec362c0 is on thread 1's stack
[ 720s] ==6431== 8 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4018E04: _dl_hwcaps_split (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018F33: _dl_hwcaps_split_masked (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018543: _dl_important_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec362a8 is on thread 1's stack
[ 720s] ==6431== 16 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x400BE6C: __minimal_malloc (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x403B0CF: ??? (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec362ac is on thread 1's stack
[ 720s] ==6431== 24 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4018F24: _dl_hwcaps_split_masked (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018643: _dl_important_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec362c0 is on thread 1's stack
[ 720s] ==6431== 8 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4018E04: _dl_hwcaps_split (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018F33: _dl_hwcaps_split_masked (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018643: _dl_important_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec362a8 is on thread 1's stack
[ 720s] ==6431== 16 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x400BE6C: __minimal_malloc (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x403BDCB: ??? (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec3628c is on thread 1's stack
[ 720s] ==6431== 24 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x401838C: copy_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0xFEC36FF9: ???
[ 720s] ==6431== Address 0xfec36274 is on thread 1's stack
[ 720s] ==6431== 40 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4018F24: _dl_hwcaps_split_masked (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018423: copy_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x40189CF: _dl_important_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec36278 is on thread 1's stack
[ 720s] ==6431== 8 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4018F24: _dl_hwcaps_split_masked (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018423: copy_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x40189E7: _dl_important_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec36278 is on thread 1's stack
[ 720s] ==6431== 8 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x4018E04: _dl_hwcaps_split (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018F33: _dl_hwcaps_split_masked (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x4018423: copy_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x40189E7: _dl_important_hwcaps (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec36260 is on thread 1's stack
[ 720s] ==6431== 16 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x40166BC: _dl_audit_activity_map (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== by 0x401D55F: dl_main (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec36320 is on thread 1's stack
[ 720s] ==6431== 40 bytes below stack pointer
[ 720s] ==6431==
[ 720s] ==6431== Invalid write of size 4
[ 720s] ==6431== at 0x401BBE0: handle_preload_list (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== Address 0xfec3534c is not stack'd, malloc'd or (recently) free'd
[ 720s] ==6431==
[ 720s] ==6431==
[ 720s] ==6431== Process terminating with default action of signal 11 (SIGSEGV)
[ 720s] ==6431== Access not within mapped region at address 0xFEC3534C
[ 720s] ==6431== at 0x401BBE0: handle_preload_list (in /usr/lib/ld-linux-armhf.so.3)
[ 720s] ==6431== If you believe this happened as a result of a stack
[ 720s] ==6431== overflow in your program's main thread (unlikely but
[ 720s] ==6431== possible), you can try to increase the size of the
[ 720s] ==6431== main thread stack using the --main-stacksize= flag.
[ 720s] ==6431== The main thread stack size used in this run was 8388608.
[ 720s] make[1]: *** [Makefile.linux:6: elf/mold-wrapper.sh] Error 1
I cannot reproduce it even with -fstack-protector. How did you build mold?
I'm attaching the build log from openSUSE OBS: armv7_log.txt
Thanks. It looks like the following test is also failing:
[ 930s] Testing exception ... ./elf/exception.sh: line 28: 4638 Illegal instruction $QEMU $t/exe
I still cannot reproduce it on my emulated ARM32 machine running Ubuntu 22.04. Can you share your build directory with me?
@marxin what does uname -m evaluate to during the build? Isn't this a mismatch between uname -m reporting aarch64 and the toolchain targeting armv7?
Well, it's using KVM inside an aarch64 machine, but it should be fine; the whole openSUSE distribution is built with this setup.
Anyway, here are the out/test/* files:
https://splichal.eu/tmp/armv7l.tar.zst
My point is that the test framework seems to assume that MACHINE (which defaults to uname -m) matches the toolchain, and some tests are conditional on MACHINE:
https://github.com/rui314/mold/blob/7f7d10cc896e0c7513d09723d5ca3d8346bdea49/test/elf/exception.sh#L39-L42
That's why I pass MACHINE explicitly.
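To illustrate the mismatch: the architecture the compiler targets is fixed at build time, while uname -m reports the running kernel, so the two can disagree. For example, a 32-bit ARM userland under an aarch64 kernel typically sees `aarch64` unless it is started via `linux32`/`setarch`. The test scripts handle this in shell via MACHINE. The C++ names below (`toolchain_arch`, `kernel_machine`, `machine_matches`) are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <string>
#include <sys/utsname.h>

// Architecture family the compiler is targeting (fixed at build time).
constexpr const char *toolchain_arch() {
#if defined(__aarch64__)
  return "aarch64";
#elif defined(__arm__)
  return "arm";
#elif defined(__x86_64__)
  return "x86_64";
#else
  return "other";
#endif
}

// What `uname -m` would print for the running kernel ("" on failure).
inline std::string kernel_machine() {
  struct utsname u;
  if (uname(&u) != 0)
    return "";
  return u.machine;
}

// Does a `uname -m` string belong to the toolchain's architecture family?
// 32-bit ARM kernels report names such as "armv6l" or "armv7l".
inline bool machine_matches(const std::string &toolchain,
                            const std::string &machine) {
  if (toolchain == "arm")
    return machine.rfind("arm", 0) == 0;  // prefix check
  return machine == toolchain;
}
```

A test harness could compare `machine_matches(toolchain_arch(), kernel_machine())` and fall back to an explicit MACHINE when it is false.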
Ah, I see. Well, it's fine in my case:
[ 146s] + uname -m
[ 146s] armv7l
Does this patch make any difference?
diff --git a/elf/input-files.cc b/elf/input-files.cc
index 668161011..ae002ea9c 100644
--- a/elf/input-files.cc
+++ b/elf/input-files.cc
@@ -165,22 +165,22 @@ void ObjectFile<E>::initialize_sections(Context<E> &ctx) {
break;
}
case SHT_SYMTAB_SHNDX:
symtab_shndx_sec = this->template get_data<u32>(ctx, shdr);
break;
case SHT_SYMTAB:
case SHT_STRTAB:
case SHT_REL:
case SHT_RELA:
case SHT_NULL:
- case SHT_ARM_ATTRIBUTES:
break;
+ case SHT_ARM_ATTRIBUTES:
default: {
std::string_view name = this->shstrtab.data() + shdr.sh_name;
// .note.GNU-stack section controls executable-ness of the stack
// area in GNU linkers. We ignore that section because silently
// making the stack area executable is too dangerous. Tell our
// users about the difference if that matters.
if (name == ".note.GNU-stack") {
if (shdr.sh_flags & SHF_EXECINSTR) {
if (!ctx.arg.z_execstack && !ctx.arg.z_execstack_if_needed)
It's the same (attaching the build log to be sure): arm.txt
I don't know how to debug this further without setting up the same environment as yours, so let me do that. That's openSUSE/ARMv7, right?
Yes, it's openSUSE:Factory and the target is armv7l.
Is there any progress on this, please?
No progress, sorry. I'm not familiar with ARM64, let alone ARM32 on ARM64, so I haven't figured out how to set up a test environment.
I can reproduce this crash on a Raspberry Pi 2 running (or rather crawling 🐌) Fedora 36. It's sufficient to take a trivial test program and link it with mold:
$ echo 'int main(void) { return 0; }' | cc -xc - -o exe -static -fuse-ld=mold
$ ./exe
Illegal instruction (core dumped)
The stack trace reveals that the crash happens within the libc startup code:
(gdb) bt
#0 0x0023f940 in strlen ()
#1 0x00217de4 in getenv ()
#2 0x0025bbc4 in _dl_non_dynamic_init ()
#3 0x00203310 in __libc_init_first ()
#4 0x0020361c in __libc_start_main_impl ()
#5 0x00201078 in _start ()
Analysis
Note that I am by no means an expert on the ARM processor architecture, so my findings are somewhat based on conjecture.
I believe the core issue is that strlen() is implemented with the Thumb instruction set, but the CPU is not in Thumb state (the T bit, bit 5 of the CPSR, is clear):
(gdb) printf "0x%x\n", $cpsr
0x20000010
^
this should be 3, not 1
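The CPSR value can be decoded mechanically. In the ARMv7 CPSR, bit 5 is T (Thumb state), bits 4..0 are the processor mode (0b10000 is User mode), and bit 29 is the C (carry) flag. A minimal sketch; the helper names are hypothetical:

```cpp
#include <cstdint>

// A few fields of the 32-bit ARM CPSR.
constexpr uint32_t kThumbBit = 1u << 5;  // T: 1 = executing Thumb code
constexpr uint32_t kModeMask = 0x1f;     // M[4:0]: processor mode
constexpr uint32_t kModeUser = 0x10;     // 0b10000 = User mode

constexpr bool in_thumb_state(uint32_t cpsr) { return (cpsr & kThumbBit) != 0; }
constexpr uint32_t cpsr_mode(uint32_t cpsr) { return cpsr & kModeMask; }
```

For 0x20000010, `in_thumb_state` is false and the mode is User; the value expected while executing Thumb code in User mode would be 0x20000030.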
strlen() is called by the following instruction:
00217d9c <getenv>:
...
217de0: eb009ed6 bl 23f940 <strlen>
...
Compare this to the equivalent binary produced by the GNU BFD linker:
00016634 <getenv>:
...
16678: fa004400 blx 27680 <strlen>
...
The blx instruction stands for "Branch with Link and Exchange" (i.e., exchange instruction set), whereas bl stays in the current instruction set.
So it seems very likely that mold does not apply the relocation for this call correctly.
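Both words can be checked by hand against the A32 encodings: BL is `cond 1011 imm24` and BLX (immediate) is `1111 101H imm24`, with target = instruction address + 8 + sign-extended imm24 * 4 (+ H * 2 for BLX). The hypothetical `decode_bl_blx` below is a sketch of that decoding and assumes `insn` is already known to be one of these two encodings.

```cpp
#include <cstdint>

struct Branch {
  bool is_blx;      // true: BLX imm (switches to Thumb); false: BL (stays ARM)
  uint32_t target;  // destination address
};

constexpr Branch decode_bl_blx(uint32_t insn, uint32_t insn_addr) {
  uint32_t imm24 = insn & 0x00ffffff;
  // Sign-extend the 24-bit field using well-defined unsigned arithmetic,
  // then scale from words to bytes.
  uint32_t offset = ((imm24 ^ 0x800000u) - 0x800000u) << 2;
  bool is_blx = (insn >> 28) == 0xf;  // cond field 0b1111 selects BLX (imm)
  // In ARM state the PC reads as the instruction address + 8.
  uint32_t target = insn_addr + 8 + offset;
  if (is_blx)
    target += ((insn >> 24) & 1) << 1;  // H bit adds a halfword
  return {is_blx, target};
}
```

Decoding the two words above shows that both land exactly on their respective strlen (0x23f940 and 0x27680), so the branch offset mold computed is right; only the choice of BL, which does not switch to Thumb, is wrong. The unrelocated `ebfffffe` in libc.a decodes to a branch to itself, i.e. an implicit addend of -8.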
In /usr/lib/libc.a, I see a bl instruction:
00000000 <getenv>:
...
44: ebfffffe bl 0 <strlen>
...
... and an R_ARM_CALL relocation:
File: /usr/lib/libc.a(getenv.o)
Relocation section '.rel.text' at offset 0x24c contains 4 entries:
Offset Info Type Sym.Value Sym. Name
00000044 0000041c R_ARM_CALL 00000000 strlen
00000088 0000051c R_ARM_CALL 00000000 strncmp
000000ec 00000619 R_ARM_BASE_PREL 00000000 _GLOBAL_OFFSET_TABLE_
000000f0 0000071a R_ARM_GOT_BREL 00000000 __environ
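The Info column packs the symbol index in the upper 24 bits and the relocation type in the low 8 bits (what glibc's ELF32_R_SYM/ELF32_R_TYPE extract). Note that R_ARM_CALL itself does not say whether the callee is ARM or Thumb: the linker has to look at the resolved symbol, where for STT_FUNC symbols bit 0 of the value marks a Thumb function. A small sketch; the constant values come from the ELF for the Arm Architecture specification and match the readelf output above:

```cpp
#include <cstdint>

constexpr uint32_t r_sym(uint32_t info) { return info >> 8; }
constexpr uint32_t r_type(uint32_t info) { return info & 0xff; }

constexpr uint32_t R_ARM_BASE_PREL = 25;  // 0x19
constexpr uint32_t R_ARM_GOT_BREL = 26;   // 0x1a
constexpr uint32_t R_ARM_CALL = 28;       // 0x1c
```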
When dealing with R_ARM_CALL relocations, mold does this:
https://github.com/rui314/mold/blob/a8d1e293fe9d04efc77f0905d6029d707b349993/elf/arch-arm32.cc#L201
... while lld does this:
https://github.com/llvm/llvm-project/blob/164266739298b39d3eac8d79ad12d1d654e2825e/lld/ELF/Arch/ARM.cpp#L527
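The lld logic linked above boils down to: compute S + A - P; if bit 0 is set the callee is Thumb, so write an unconditional BLX (immediate) with H taken from bit 1, otherwise write an unconditional BL. A minimal sketch of that rule, assuming an STT_FUNC target and omitting the ±32 MiB range check, PLT and veneer handling. `apply_r_arm_call` is a hypothetical name, not mold's or lld's API. For REL relocations such as those in `.rel.text`, A is the implicit addend, -8 for the `ebfffffe` placeholder.

```cpp
#include <cstdint>

// S: symbol value (bit 0 set for Thumb functions), A: addend,
// P: address of the instruction being relocated.
constexpr uint32_t apply_r_arm_call(uint32_t S, int32_t A, uint32_t P) {
  uint32_t val = S + static_cast<uint32_t>(A) - P;  // S + A - P, modulo 2^32
  if (val & 1) {
    // Thumb callee: BLX (immediate) = 0xfa | H | imm24, offset = imm24:H:'0'.
    return 0xfa000000 | ((val & 2) << 23) | ((val >> 2) & 0x00ffffff);
  }
  // ARM callee: unconditional BL.
  return 0xeb000000 | ((val >> 2) & 0x00ffffff);
}
```

With the Thumb strlen at 0x27680 and the call at 0x16678, this yields `fa004400`, the word GNU ld produced. Only if strlen were an ARM-state function at 0x23f940 would `eb009ed6` (mold's output) be correct.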
Possibly related: #468
@sicherha Thank you for your investigation! I think I can fix this issue now with it.
Great, I can confirm that all tests except one pass on the armv7l openSUSE builder.
The last remaining one is the Valgrind test I already mentioned here. Maybe it should be skipped on ARM32?
[ 647s] Testing mold-wrapper ... ==6223== Invalid write of size 4
[ 647s] ==6223== at 0x401B1F4: _dl_start (in /usr/lib/ld-linux-armhf.so.3)
[ 647s] ==6223== Address 0xfee12574 is on thread 1's stack
[ 647s] ==6223== 120 bytes below stack pointer
[ 647s] ==6223==
[ 647s] ==6223== Invalid write of size 4
[ 647s] ==6223== at 0x401255C: _dl_setup_hash (in /usr/lib/ld-linux-armhf.so.3)
[ 647s] ==6223== by 0x401B883: _dl_start (in /usr/lib/ld-linux-armhf.so.3)
[ 647s] ==6223== Address 0xfee12588 is on thread 1's stack
[ 647s] ==6223== 8 bytes below stack pointer
[ 647s] ==6223==
...
It is odd: the test should have already been skipped if ASAN is enabled. We check whether __asan_init or __tsan_init is defined to determine if ASAN/TSAN is in use. Could you run nm mold to see if such a symbol is defined? ASAN on ARM32 may be using slightly different symbol names.
The ASAN check doesn't apply here because the messages come from Valgrind, which is called in line 90 of the script. I'll see if I can find out where this stack-pointer confusion comes from.
Yeah, but we have this at line 19, so the entire script should have been skipped:
nm mold | grep -Eq '[at]san_init' && { echo skipped; exit; }
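Restated as a predicate over nm output lines, the check only looks for sanitizer runtime symbols; it says nothing about Valgrind, so a plain build without those symbols proceeds to the Valgrind step. The test does this in shell; the C++ helper below is a hypothetical restatement for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Mirrors `nm mold | grep -Eq '[at]san_init'`: true if any line contains
// "asan_init" or "tsan_init" (e.g. "__asan_init").
inline bool has_sanitizer_symbols(const std::vector<std::string> &nm_lines) {
  static const std::regex pattern("[at]san_init");
  for (const auto &line : nm_lines)
    if (std::regex_search(line, pattern))
      return true;
  return false;
}
```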
I believe in this instance mold was built without ASAN enabled, so the precondition checks do not cause the remainder of the script to be skipped. So we end up in line 90, where Valgrind is executed and finds invalid writes beyond the stack.
However, I cannot reproduce @marxin's finding with mold 1.4.0 and Fedora 36 on a Raspberry Pi 2: in my case, Valgrind does not report any errors. This is with Valgrind 3.19.0 and glibc 2.35.
@sicherha Ah, you are right. I missed that point.
It doesn't seem that the additional Valgrind test adds much value, and this is the only place we use Valgrind. I think we can just remove these lines altogether.
The reason Valgrind was introduced in the first place, with commit 3694ffe61d1d33236b2b7f0be6f5ac7fc2f7a4f1, was #495. Since that has long been fixed, I believe the mold-wrapper code is simple enough that we no longer need Valgrind as a safety net.
Agreed.