mrustc
Threading primitives are not aligned properly
> /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.54.0/build_libc-0_2_95_H19_run
Process was terminated with signal 7
FAILING COMMAND: /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.54.0/build_libc-0_2_95_H19_run
make: *** [minicargo.mk:106: output-1.54.0/libtest.rlib] Error 1
Program received signal SIGBUS, Bus error.
0x00007ffff7eaea9c in __pthread_rwlock_rdlock_full64 (abstime=0x0, clockid=0,
rwlock=0x1000ef929 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g+1>) at pthread_rwlock_common.c:353
353 pthread_rwlock_common.c: No such file or directory.
(gdb) bt
#0 0x00007ffff7eaea9c in __pthread_rwlock_rdlock_full64 (abstime=0x0, clockid=0,
rwlock=0x1000ef929 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g+1>) at pthread_rwlock_common.c:353
#1 __GI___pthread_rwlock_rdlock (rwlock=0x1000ef929 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g+1>)
at pthread_rwlock_rdlock.c:27
#2 0x000000010006a140 in ZRIG4ch3std50_0_03sys4unix6rwlock6RWLock0g4read0g (
arg0=0x1000ef928 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g>) at output-1.54.0/libstd.rlib.c:199887
#3 0x0000000100086b1c in ZRG4ch3std50_0_03sys4unix2os6getenv0g (arg0=...) at output-1.54.0/libstd.rlib.c:104244
#4 0x0000000100086cc0 in ZRG2ch3std50_0_03env7_var_os0g (arg0=...) at output-1.54.0/libstd.rlib.c:88484
#5 0x000000010000e4ac in ZRG2ch3std50_0_03env6var_os1gBsCy (arg0=...)
at output-1.54.0/build_libc-0_2_95_H19_run.c:4149
#6 0x000000010000bf20 in ZRG1c019rustc_minor_nightly0g () at output-1.54.0/build_libc-0_2_95_H19_run.c:3029
#7 0x000000010000a7fc in ZRG1c04main0g () at output-1.54.0/build_libc-0_2_95_H19_run.c:2624
#8 0x00000001000157ac in ZRQG2ch3std50_0_02rth7closure12lang_start_01gT03ch4core50_0_03ops8function2Fn1gT04call0g (arg0=0x7fffffffe690, arg1=...) at output-1.54.0/build_libc-0_2_95_H19_run.c:8031
#9 0x0000000100062ee0 in ZRG3ch3std50_0_09panickingh0117do_call2gBsD3ch4core50_0_03ops8function2Fn1gT01Cf22cb46marker4Sync0g2cb05panic13RefUnwindSafe0gCf (arg0=0x7fffffffe320 "\220\346\377\377\377\177")
at output-1.54.0/libstd.rlib.c:97274
#10 0x000000010006d09c in ZRG2ch3std50_0_09panicking3try2gCfBsD3ch4core50_0_03ops8function2Fn1gT01Cf22cb36marker4Sync0g2cb05panic13RefUnwindSafe0g (arg0=...) at output-1.54.0/libstd.rlib.c:98407
#11 0x0000000100080a10 in ZRG2ch3std50_0_02rt19lang_start_internal0g (arg0=..., arg1=<optimized out>,
arg2=<optimized out>) at output-1.54.0/libstd.rlib.c:98907
#12 0x000000010000e5a8 in ZRG2ch3std50_0_02rt10lang_start0g (arg0=0x10000a5d4 <ZRG1c04main0g>, arg1=1,
arg2=0x7fffffffeb18) at output-1.54.0/build_libc-0_2_95_H19_run.c:4169
#13 0x0000000100015d10 in main (argc=1, argv=0x7fffffffeb18) at output-1.54.0/build_libc-0_2_95_H19_run.c:8239
This is probably PPC specific (as x86-64 works).
I have no idea what could trigger a bus error, the argument appears correct.
A bit of googling says that it is probably bad alignment on the lock (SIGBUS can come from misaligned accesses). Likely the alignment config is wrong, OR something is doing an atomic operation with bad alignment (not captured in the normal type alignment logic).
Forcing -O0 in codegen_c.cpp "fixes" this issue. Re-applying -O0 -fsection-anchors reproduces it. Simply appending -fno-section-anchors by itself reproduces the bug as well (EDIT: this is a typo - I forgot to run this test before posting - will edit again with correct result).
Since this is part of -O1 (and even -Og!) it is expected to never break any well-defined code, which suggests either mrustc is depending on undefined behaviour somewhere, or GCC [11.2.0 in this case] has a bug (IMO much less likely).
Considering that this is inside ZRG2ch3std50_0_02rt19lang_start_internal0g, is it possible the rwlock hasn't been properly initialised yet?
(Alignments appear correct, and I would expect the problem to exist with -O0 if it were an alignment issue)
libstd uses PTHREAD_RWLOCK_INITIALIZER: https://github.com/rust-lang/rust/blob/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/sys/unix/rwlock.rs#L16-L22
But when is that assignment made?
ENV_LOCK is a static, so the result of RwLock::new() ends up in the .data section: https://github.com/rust-lang/rust/blob/1.54.0/library/std/src/sys/unix/os.rs#L490 Note that StaticRwLock is a wrapper around the RwLock I pointed to above: https://github.com/rust-lang/rust/blob/1.54.0/library/std/src/sys_common/rwlock.rs#L9
Note that pthread_rwlock_t should be validly aligned for both unsigned int and long int, yet __GI___pthread_rwlock_rdlock is being called with rwlock=0x1000ef929, which is aligned for neither. The alignments defined in mrustc are:
const TargetArch ARCH_POWERPC64LE = {
"powerpc64",
64, false,
{ /*atomic(u8)=*/true, true, true, true, true },
TargetArch::Alignments(2, 4, 8, 16, 4, 8, 8)
};
So why isn't it being aligned?
FWIW, the actual results of merely adding -fno-section-anchors to the compile options was that it's now failing on __GI___pthread_mutex_lock instead, suggesting that mutexes are similarly mis-aligned... :/
X86 doesn't care about alignment as much as other architectures, so it wouldn't be too surprising to me if this alignment issue exists on all platforms.
Seems like -O0 working was just a coincidence :(
Check the libstd.rlib.c file for usages of ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g
Looking closer at the above backtrace, it's passed correctly to ZRIG4ch3std50_0_03sys4unix6rwlock6RWLock0g4read0g but not to __GI___pthread_rwlock_rdlock - so look at the generated source for ZRIG4ch3std50_0_03sys4unix6rwlock6RWLock0g4read0g
Not sure how I got the original backtrace - having trouble getting one with the arguments not optimised out at this point (since -O0 makes reproducing it hard). :/
Currently looking at
#0 0x00007ffff7b8ae68 in __GI___pthread_mutex_lock (
mutex=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at ../nptl/pthread_mutex_lock.c:80
#1 0x00007ffff7ddcc18 in pthread_mutex_lock (mutex=<optimized out>) at forward.c:117
#2 0x0000000100144f2c in ZRIG4ch3std100_0_0_H3003sys4unix5mutex5Mutex0g4lock0g (
arg0=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at output-1.39.0/libstd.rlib.c:174145
#3 0x0000000100154594 in ZRG4ch3std100_0_0_H3003sys4unix2os6getenv0g (arg0=...)
at output-1.39.0/libstd.rlib.c:93748
#4 0x0000000100154a38 in ZRG2ch3std100_0_0_H3003env7_var_os0g (arg0=...) at output-1.39.0/libstd.rlib.c:81068
#5 0x0000000100010ff8 in ZRG2ch3std100_0_0_H3003env6var_os1gBsCy (arg0=...)
at output-1.39.0/build_libc-0_2_62_Hd_run.c:5212
#6 0x000000010001d4e4 in ZRG1c019rustc_minor_version0g () at output-1.39.0/build_libc-0_2_62_Hd_run.c:3257
#7 0x000000010001e2c4 in ZRG1c04main0g () at output-1.39.0/build_libc-0_2_62_Hd_run.c:2951
#8 0x0000000100010288 in ZRQG2ch3std100_0_0_H3002rth7closure12lang_start_01gT03ch4core50_0_03ops8function2Fn1gT04call0g (arg0=0x7fffffffe6b0, arg1=...) at output-1.39.0/build_libc-0_2_62_Hd_run.c:10139
#9 0x00000001000dbe04 in ZRG3ch3std100_0_0_H3009panickingh0127do_call2gG2cb02rth7closure21lang_start_internal_10gCf (arg0=0x7fffffffe450 "\260\346\377\377\377\177") at output-1.39.0/libstd.rlib.c:87492
#10 0x00000001000aea20 in __rust_maybe_catch_panic (
arg0=0x1000dbd34 <ZRG3ch3std100_0_0_H3009panickingh0127do_call2gG2cb02rth7closure21lang_start_internal_10gCf>, arg1=0x7fffffffe450 "\260\346\377\377\377\177", arg2=0x7fffffffe348, arg3=0x7fffffffe350)
at output-1.39.0/libpanic_abort.rlib.c:138
#11 0x0000000100148cb0 in ZRG2ch3std100_0_0_H3009panicking3try2gCfG2cb02rth7closure21lang_start_internal_10g (
arg0=...) at output-1.39.0/libstd.rlib.c:88298
#12 0x000000010015623c in ZRG2ch3std100_0_0_H3002rt19lang_start_internal0g (arg0=..., arg1=1,
arg2=0x7fffffffeb38) at output-1.39.0/libstd.rlib.c:88945
#13 0x00000001000110f4 in ZRG2ch3std100_0_0_H3002rt10lang_start0g (arg0=0x10001e1d8 <ZRG1c04main0g>, arg1=1,
arg2=0x7fffffffeb38) at output-1.39.0/build_libc-0_2_62_Hd_run.c:5232
#14 0x000000010001f438 in main (argc=1, argv=0x7fffffffeb38) at output-1.39.0/build_libc-0_2_62_Hd_run.c:10463
The very first mutex-related call has the misaligned address:
Breakpoint 1, ZRIG4ch3std100_0_0_H3003sys4unix5mutex5Mutex0g4lock0g (
arg0=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at output-1.39.0/libstd.rlib.c:174116
174116 {
(gdb)
Continuing.
Breakpoint 2, 0x00007ffff7ddcbd8 in pthread_mutex_lock (
mutex=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at forward.c:117
117 forward.c: No such file or directory.
(gdb)
Continuing.
Breakpoint 2, 0x00007ffff7b8ae00 in __GI___pthread_mutex_lock (
mutex=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at ../nptl/pthread_mutex_lock.c:64
64 ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb)
Continuing.
Program received signal SIGBUS, Bus error.
Testing locally shows that they are aligned properly (well, an assert shows that they're correct). Might this be a quirk of the PPC target, or just the compiler version? Can you confirm that the lock type is annotated with the align attribute?
printf("%d\n", (int)alignof(pthread_mutex_t));
tells me 8
I would expect if this were a system problem, the issue would affect a lot more than just mrustc...?
I meant the mrustc-emitted type (within libstd.rlib.c). From my local checks, it's correctly aligned to 8 bytes (EDIT: Was checking 1.29, not 1.39 - 1.39 is wrong)
I don't know what I am looking for then
The above commit included an assertion that the alignment is correct (at least, the C matches the value expected by mrustc).
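For anyone unfamiliar with this kind of check, the following is a rough sketch of a compile-time layout assertion in C11. It is not the code from the commit: the CHECK_LAYOUT macro and the emitted_pair struct are hypothetical stand-ins, and the expected size and alignment are just the values for this toy type.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-in for an emitted type. */
struct emitted_pair {
    uint32_t a;
    uint16_t b;
};

/* Hypothetical macro: the C compile fails if the C compiler's layout
 * disagrees with the size and alignment the front end expects. */
#define CHECK_LAYOUT(type, size, align) \
    static_assert(sizeof(type) == (size) && alignof(type) == (align), \
                  "layout mismatch for " #type)

CHECK_LAYOUT(struct emitted_pair, 8, 4);
```

A check like this only proves that the C output agrees with what the front end believes. If the front end itself has the wrong alignment for a type, the assertion still passes.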
Look for the definition of s_ZRG4ch4libc90_2_62_Hd4unix10linux_like5linux15pthread_mutex_t0g in libstd.rlib.c
Locally (x86-64 linux), this has an alignment of 1, ... which seems wrong - might be a failure in handling repr(align(N))
// struct ::"libc-0_2_62_Hd"::unix::linux_like::linux::pthread_mutex_t
struct s_ZRG4ch4libc90_2_62_Hd4unix10linux_like5linux15pthread_mutex_t0g {
t_ZRTA40Ca _0; // [u8; 40]
} ;
Confirmed - repr(align(N)) isn't handled properly (it's just being ignored in HIR lowering).
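The effect of dropping the attribute can be sketched in plain C. The struct names below are hypothetical; the first mirrors the shape of the emitted struct quoted above (a bare 40-byte array), and the second assumes the 8-byte alignment that alignof(pthread_mutex_t) reported earlier in this thread.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Same shape as the emitted struct: a bare byte array, so alignment 1. */
struct mutex_unaligned {
    unsigned char _0[40];
};

/* Same payload, but carrying an explicit 8-byte alignment request. */
struct mutex_aligned {
    alignas(8) unsigned char _0[40];
};

/* Hypothetical containers: one leading byte forces the compiler to pick
 * an offset for the lock that satisfies its alignment. */
struct holder_unaligned {
    char tag;
    struct mutex_unaligned lock;
};

struct holder_aligned {
    char tag;
    struct mutex_aligned lock;
};
```

Here offsetof(struct holder_unaligned, lock) is 1 while offsetof(struct holder_aligned, lock) is 8. The same freedom applies to statics: with alignment 1 the compiler and linker may place the lock at any address, which would be consistent with the odd ENV_LOCK addresses seen in the backtraces.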
I'm working on a fix (slowly), but it's breaking parts of the MSVC build.
Potential fix in the above commit, waiting for it to regression test on linux
Seems good @luke-jr Mind checking with your PPC64 branch?
@luke-jr Reminder: Can you confirm that the above fix works for you?
@luke-jr Still looking for confirmation. Alignment should be properly supported now, but I'd like to confirm before closing the issue.
Neither ffb0961ad3a1125ca7db3b6a48991023bcb4d0d7 (w/ patches) nor current master works for me, apparently due to issues unrelated to alignment (but it has been far too long for me to confirm whether the alignment-related issue is fixed).
16d1d29 was the commit that originally addressed this issue. However, it is surprising that current master fails.
Yes, I was including patching 16d1d29 into ffb0961 of course. :)
master is generating code that looks like (IIRC) (int128_t)-ll, with the numeric constant somehow missing
specifically in output-1.39.0/libcore.rlib.c
aed9b3665c39f594f4cf8c46afe5048655a72c77 is the first bad commit (for the -ll thing)
Trying 6f42b7415dc6c91ba41f8afb7d9e807e7c139c38, output/cargo fails with
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `1`,
right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
Process was terminated with signal 6
FAILING COMMAND: /var/tmp/portage/dev-lang/rust-1.40.0_p20220113/work/rustc-1.40.0-src/mrustc/output/cargo-build/build_miniz-sys-0_1_11_run
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `1`,
right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `1`,
right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
Looking at the mentioned assert line, it sounds like it may still be an alignment-related issue :/
Can you identify which structure is badly aligned?
The panic happens at https://github.com/rust-lang/hashbrown/blob/e7cd4a57a2690f199a527434f635206363ad661f/src/raw/mod.rs#L1086 which asserts that a "control bytes" pointer is aligned to the "group width". If on x86 this may be __m128i not being aligned correctly: https://github.com/rust-lang/hashbrown/blob/e7cd4a57a2690f199a527434f635206363ad661f/src/raw/sse2.rs#L34 It needs to be aligned to at least 128 bits. If on another target it may be u32 not being aligned to 4 bytes or u64 to 8 bytes.
The sse2 file shouldn't be used (target_feature = "sse2" shouldn't pass), so that can't be it.
@luke-jr Can you confirm that the issue still exists? If it does, are you able to identify the cause (i.e. what structure is unaligned, or other error).
I have a very fuzzy memory of seeing a similar error when a constant expression wasn't properly converted to a static.
As of f08a7cb0641fed5410b4c92f3e4446413c897d2f trying to build 1.39.0:
...
(17/112) BUILDING cc v1.0.35
> /mnt/hd2019c-nobackup/dev/rust/mrustc/bin/mrustc rustc-1.39.0-src/vendor/cc/src/lib.rs -o output-1.39.0/rustc-build/libcc-1_0_35.rlib --crate-name cc --crate-type rlib -C emit-depfile=output-1.39.0/rustc-build/libcc-1_0_35.rlib.d --crate-tag 1_0_35 -g --cfg debug_assertions -O -L output-1.39.0 -L output-1.39.0/rustc-build
> /mnt/hd2019c-nobackup/dev/rust/mrustc/bin/mrustc rustc-1.39.0-src/src/librustc_llvm/build.rs --crate-name build --crate-type bin -o output-1.39.0/rustc-build/build_rustc_llvm_run -L output-1.39.0/rustc-build -g -L output-1.39.0 --extern build_helper=output-1.39.0/rustc-build/libbuild_helper-0_1_0.rlib --extern cc=output-1.39.0/rustc-build/libcc-1_0_35.rlib --edition 2018
> /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm_run
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `1`,
right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
Process was terminated with signal 6
FAILING COMMAND: /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm_run
Calling /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm_run failed (see /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm.txt_failed.txt for stdout)
BUILD FAILED
make: *** [minicargo.mk:223: output-1.39.0/rustc] Error 1
I don't know how I would diagnose it further?
A backtrace from that panic would help, and maybe some reading of the generated code to see if you can find where that ctrl value is coming from (and thus why it's not properly aligned).
Also: what platform are you on? I can't seem to reproduce this failure on mint 20.3 (gcc 9.3.0-17ubuntu1~20.04)
I'm running into this issue as well (powerpc64le using guix to build rust). Is anyone still working on this or is a workaround known? How can I help?
@mrvdb It should have been fixed with the improvements to alignment handling... but if it's still crashing, you can help by identifying the misaligned type - and the correct alignment for it.
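One way to gather the "correct alignment" side of that comparison is to ask the target's own C compiler. The sketch below is only a starting point: the host_alignments struct and query_host_alignments function are hypothetical names, and 128-bit integers are left out because not every compiler provides them.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical record of the alignments the C compiler uses on this host. */
struct host_alignments {
    unsigned u16, u32, u64, f32, f64, ptr;
};

/* Collects the alignment of each basic type as seen by the C compiler. */
static struct host_alignments query_host_alignments(void)
{
    struct host_alignments a = {
        .u16 = (unsigned)alignof(uint16_t),
        .u32 = (unsigned)alignof(uint32_t),
        .u64 = (unsigned)alignof(uint64_t),
        .f32 = (unsigned)alignof(float),
        .f64 = (unsigned)alignof(double),
        .ptr = (unsigned)alignof(void *),
    };
    return a;
}
```

Printing these fields on the failing machine and comparing them with the corresponding entries in the target's TargetArch::Alignments line should show whether any basic type disagrees. For struct types such as the pthread ones, the same alignof query can be compared against the definition emitted in libstd.rlib.c.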