
Threading primitives are not aligned properly

Open luke-jr opened this issue 4 years ago • 37 comments

> /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.54.0/build_libc-0_2_95_H19_run
Process was terminated with signal 7
FAILING COMMAND:  /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.54.0/build_libc-0_2_95_H19_run
make: *** [minicargo.mk:106: output-1.54.0/libtest.rlib] Error 1
Program received signal SIGBUS, Bus error.
0x00007ffff7eaea9c in __pthread_rwlock_rdlock_full64 (abstime=0x0, clockid=0, 
    rwlock=0x1000ef929 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g+1>) at pthread_rwlock_common.c:353
353     pthread_rwlock_common.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7eaea9c in __pthread_rwlock_rdlock_full64 (abstime=0x0, clockid=0, 
    rwlock=0x1000ef929 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g+1>) at pthread_rwlock_common.c:353
#1  __GI___pthread_rwlock_rdlock (rwlock=0x1000ef929 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g+1>)
    at pthread_rwlock_rdlock.c:27
#2  0x000000010006a140 in ZRIG4ch3std50_0_03sys4unix6rwlock6RWLock0g4read0g (
    arg0=0x1000ef928 <ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g>) at output-1.54.0/libstd.rlib.c:199887
#3  0x0000000100086b1c in ZRG4ch3std50_0_03sys4unix2os6getenv0g (arg0=...) at output-1.54.0/libstd.rlib.c:104244
#4  0x0000000100086cc0 in ZRG2ch3std50_0_03env7_var_os0g (arg0=...) at output-1.54.0/libstd.rlib.c:88484
#5  0x000000010000e4ac in ZRG2ch3std50_0_03env6var_os1gBsCy (arg0=...)
    at output-1.54.0/build_libc-0_2_95_H19_run.c:4149
#6  0x000000010000bf20 in ZRG1c019rustc_minor_nightly0g () at output-1.54.0/build_libc-0_2_95_H19_run.c:3029
#7  0x000000010000a7fc in ZRG1c04main0g () at output-1.54.0/build_libc-0_2_95_H19_run.c:2624
#8  0x00000001000157ac in ZRQG2ch3std50_0_02rth7closure12lang_start_01gT03ch4core50_0_03ops8function2Fn1gT04call0g (arg0=0x7fffffffe690, arg1=...) at output-1.54.0/build_libc-0_2_95_H19_run.c:8031
#9  0x0000000100062ee0 in ZRG3ch3std50_0_09panickingh0117do_call2gBsD3ch4core50_0_03ops8function2Fn1gT01Cf22cb46marker4Sync0g2cb05panic13RefUnwindSafe0gCf (arg0=0x7fffffffe320 "\220\346\377\377\377\177")
    at output-1.54.0/libstd.rlib.c:97274
#10 0x000000010006d09c in ZRG2ch3std50_0_09panicking3try2gCfBsD3ch4core50_0_03ops8function2Fn1gT01Cf22cb36marker4Sync0g2cb05panic13RefUnwindSafe0g (arg0=...) at output-1.54.0/libstd.rlib.c:98407
#11 0x0000000100080a10 in ZRG2ch3std50_0_02rt19lang_start_internal0g (arg0=..., arg1=<optimized out>, 
    arg2=<optimized out>) at output-1.54.0/libstd.rlib.c:98907
#12 0x000000010000e5a8 in ZRG2ch3std50_0_02rt10lang_start0g (arg0=0x10000a5d4 <ZRG1c04main0g>, arg1=1, 
    arg2=0x7fffffffeb18) at output-1.54.0/build_libc-0_2_95_H19_run.c:4169
#13 0x0000000100015d10 in main (argc=1, argv=0x7fffffffeb18) at output-1.54.0/build_libc-0_2_95_H19_run.c:8239

luke-jr, Oct 30 '21 16:10

This is probably PPC-specific (x86-64 works).

I have no idea what could trigger a bus error; the argument appears correct.

thepowersgang, Oct 31 '21 02:10

A bit of googling says that it is probably bad alignment of the lock (SIGBUS can come from misaligned accesses). Most likely either the alignment config is wrong, OR something is doing an atomic operation on a misaligned address (which the normal type-alignment logic would not catch).
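To make the alignment argument concrete: a pointer is suitably aligned for a type when its address is a multiple of that type's alignment. The following is a minimal C sketch (the helper is_aligned and the constant names are hypothetical) applied to the two addresses in the first backtrace, the one RWLock::read receives and the one pthread_rwlock_rdlock receives.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when addr is a multiple of align.
   align must be a power of two, which every C alignment is. */
static bool is_aligned(uint64_t addr, size_t align)
{
    return (addr & (uint64_t)(align - 1)) == 0;
}

/* Addresses copied from the gdb backtrace in this issue. */
static const uint64_t ENV_LOCK_ADDR = 0x1000ef928; /* arg0 of RWLock::read */
static const uint64_t RDLOCK_ARG    = 0x1000ef929; /* rwlock arg of pthread_rwlock_rdlock */
```

The first address is 8-byte aligned; the second is one byte past an 8-byte boundary, so it is aligned for neither a 4-byte nor an 8-byte field.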

thepowersgang, Oct 31 '21 13:10

Forcing -O0 in codegen_c.cpp "fixes" this issue. Re-applying -O0 -fsection-anchors reproduces it. Simply appending -fno-section-anchors by itself reproduces the bug as well (EDIT: this is a typo - I forgot to run this test before posting - will edit again with correct result).

Since -fsection-anchors is enabled at -O1 (and even -Og!), it should never break well-defined code. That suggests either that mrustc depends on undefined behaviour somewhere, or that GCC [11.2.0 in this case] has a bug (IMO much less likely).

Considering that this is inside ZRG2ch3std50_0_02rt19lang_start_internal0g, is it possible the rwlock hasn't been properly initialised yet?

(Alignments appear correct, and I would expect the problem to exist with -O0 if it were an alignment issue)

luke-jr, Oct 31 '21 18:10

libstd uses PTHREAD_RWLOCK_INITIALIZER: https://github.com/rust-lang/rust/blob/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/sys/unix/rwlock.rs#L16-L22

bjorn3, Oct 31 '21 18:10

But when is that assignment made?

luke-jr, Oct 31 '21 18:10

ENV_LOCK is a static, so the result of RwLock::new() ends up in the .data section: https://github.com/rust-lang/rust/blob/1.54.0/library/std/src/sys/unix/os.rs#L490. Note that StaticRwLock is a wrapper around the RwLock I pointed to above: https://github.com/rust-lang/rust/blob/1.54.0/library/std/src/sys_common/rwlock.rs#L9

bjorn3, Oct 31 '21 18:10

Note that pthread_rwlock_t should be aligned suitably for both unsigned int and long int, yet __GI___pthread_rwlock_rdlock is being called with rwlock=0x1000ef929, which is aligned for neither. The alignments defined in mrustc are:

const TargetArch ARCH_POWERPC64LE = {
    "powerpc64",
    64, false,
    { /*atomic(u8)=*/true, true, true, true,  true },
    TargetArch::Alignments(2, 4, 8, 16, 4, 8, 8)
};

So why isn't it being aligned?

FWIW, the actual result of merely adding -fno-section-anchors to the compile options is that it now fails in __GI___pthread_mutex_lock instead, suggesting that mutexes are similarly misaligned... :/

luke-jr, Oct 31 '21 18:10

x86 tolerates misaligned accesses better than most other architectures, so it wouldn't surprise me if this alignment issue exists on all platforms.

bjorn3, Oct 31 '21 18:10

Seems like -O0 working was just a coincidence :(

luke-jr, Oct 31 '21 19:10

Check the libstd.rlib.c file for usages of ZRG4ch3std50_0_03sys4unix2os8ENV_LOCK0g. Looking closer at the above backtrace, the lock is passed correctly to ZRIG4ch3std50_0_03sys4unix6rwlock6RWLock0g4read0g (0x1000ef928) but not to __GI___pthread_rwlock_rdlock (which receives ENV_LOCK+1), so look at the generated source for ZRIG4ch3std50_0_03sys4unix6rwlock6RWLock0g4read0g.

thepowersgang, Oct 31 '21 23:10

Not sure how I got the original backtrace - having trouble getting one with the arguments not optimised out at this point (since -O0 makes reproducing it hard). :/

Currently looking at

#0  0x00007ffff7b8ae68 in __GI___pthread_mutex_lock (
    mutex=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at ../nptl/pthread_mutex_lock.c:80
#1  0x00007ffff7ddcc18 in pthread_mutex_lock (mutex=<optimized out>) at forward.c:117
#2  0x0000000100144f2c in ZRIG4ch3std100_0_0_H3003sys4unix5mutex5Mutex0g4lock0g (
    arg0=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at output-1.39.0/libstd.rlib.c:174145
#3  0x0000000100154594 in ZRG4ch3std100_0_0_H3003sys4unix2os6getenv0g (arg0=...)
    at output-1.39.0/libstd.rlib.c:93748
#4  0x0000000100154a38 in ZRG2ch3std100_0_0_H3003env7_var_os0g (arg0=...) at output-1.39.0/libstd.rlib.c:81068
#5  0x0000000100010ff8 in ZRG2ch3std100_0_0_H3003env6var_os1gBsCy (arg0=...)
    at output-1.39.0/build_libc-0_2_62_Hd_run.c:5212
#6  0x000000010001d4e4 in ZRG1c019rustc_minor_version0g () at output-1.39.0/build_libc-0_2_62_Hd_run.c:3257
#7  0x000000010001e2c4 in ZRG1c04main0g () at output-1.39.0/build_libc-0_2_62_Hd_run.c:2951
#8  0x0000000100010288 in ZRQG2ch3std100_0_0_H3002rth7closure12lang_start_01gT03ch4core50_0_03ops8function2Fn1gT04call0g (arg0=0x7fffffffe6b0, arg1=...) at output-1.39.0/build_libc-0_2_62_Hd_run.c:10139
#9  0x00000001000dbe04 in ZRG3ch3std100_0_0_H3009panickingh0127do_call2gG2cb02rth7closure21lang_start_internal_10gCf (arg0=0x7fffffffe450 "\260\346\377\377\377\177") at output-1.39.0/libstd.rlib.c:87492
#10 0x00000001000aea20 in __rust_maybe_catch_panic (
    arg0=0x1000dbd34 <ZRG3ch3std100_0_0_H3009panickingh0127do_call2gG2cb02rth7closure21lang_start_internal_10gCf>, arg1=0x7fffffffe450 "\260\346\377\377\377\177", arg2=0x7fffffffe348, arg3=0x7fffffffe350)
    at output-1.39.0/libpanic_abort.rlib.c:138
#11 0x0000000100148cb0 in ZRG2ch3std100_0_0_H3009panicking3try2gCfG2cb02rth7closure21lang_start_internal_10g (
    arg0=...) at output-1.39.0/libstd.rlib.c:88298
#12 0x000000010015623c in ZRG2ch3std100_0_0_H3002rt19lang_start_internal0g (arg0=..., arg1=1, 
    arg2=0x7fffffffeb38) at output-1.39.0/libstd.rlib.c:88945
#13 0x00000001000110f4 in ZRG2ch3std100_0_0_H3002rt10lang_start0g (arg0=0x10001e1d8 <ZRG1c04main0g>, arg1=1, 
    arg2=0x7fffffffeb38) at output-1.39.0/build_libc-0_2_62_Hd_run.c:5232
#14 0x000000010001f438 in main (argc=1, argv=0x7fffffffeb38) at output-1.39.0/build_libc-0_2_62_Hd_run.c:10463

The very first mutex-related call has the misaligned address:

Breakpoint 1, ZRIG4ch3std100_0_0_H3003sys4unix5mutex5Mutex0g4lock0g (
    arg0=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at output-1.39.0/libstd.rlib.c:174116
174116  {
(gdb) 
Continuing.

Breakpoint 2, 0x00007ffff7ddcbd8 in pthread_mutex_lock (
    mutex=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at forward.c:117
117     forward.c: No such file or directory.
(gdb) 
Continuing.

Breakpoint 2, 0x00007ffff7b8ae00 in __GI___pthread_mutex_lock (
    mutex=0x1001d9562 <ZRG5ch3std100_0_0_H3003sys4unix2osh02138ENV_LOCK0g>) at ../nptl/pthread_mutex_lock.c:64
64      ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) 
Continuing.

Program received signal SIGBUS, Bus error.

luke-jr, Nov 01 '21 17:11

Testing locally shows that they are aligned properly (well, an assert shows that they're correct). This might be a quirk of the PPC target, or just of the compiler version? Can you confirm that the lock type is annotated with the align attribute?

thepowersgang, Nov 06 '21 08:11

printf("%d\n", (int)alignof(pthread_mutex_t));

tells me 8

If this were a system problem, I would expect it to affect a lot more than just mrustc...?

luke-jr, Nov 06 '21 16:11

I meant the mrustc-emitted type (within libstd.rlib.c). From my local checks, it's correctly aligned to 8 bytes (EDIT: Was checking 1.29, not 1.39 - 1.39 is wrong)

thepowersgang, Nov 06 '21 17:11

I don't know what I am looking for then

luke-jr, Nov 06 '21 17:11

The above commit included an assertion that the alignment is correct (at least, that the alignment the C compiler computes matches the value mrustc expects). Look for the definition of s_ZRG4ch4libc90_2_62_Hd4unix10linux_like5linux15pthread_mutex_t0g in libstd.rlib.c.

Locally (x86-64 Linux), this has an alignment of 1, which seems wrong; it might be a failure in handling repr(align(N)).
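As a sketch of why an alignment of 1 is suspicious: a C struct whose only member is a byte array is only guaranteed 1-byte alignment, whereas glibc's pthread_mutex_t is a union that includes a long int member and is therefore more strictly aligned. A compile-time check in the spirit of the assertion mentioned above (not necessarily its actual form) can be written with C11 _Static_assert. The struct names are hypothetical stand-ins, and the 8 assumes that libc requests 8-byte alignment for this type on the target.

```c
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the struct mrustc emitted: a bare byte array,
   so the compiler only guarantees 1-byte alignment. */
struct emitted_mutex { uint8_t _0[40]; };

/* Same storage with an explicit alignment, roughly what a correct
   lowering of repr(align(8)) would need to produce. */
struct aligned_mutex { alignas(8) uint8_t _0[40]; };

/* Passes; the same check against struct emitted_mutex would
   fail to compile. */
_Static_assert(alignof(struct aligned_mutex) == 8,
               "aligned_mutex must be 8-byte aligned");

static size_t emitted_align(void) { return alignof(struct emitted_mutex); }
static size_t system_align(void)  { return alignof(pthread_mutex_t); }
```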

thepowersgang, Nov 07 '21 09:11

// struct ::"libc-0_2_62_Hd"::unix::linux_like::linux::pthread_mutex_t
struct s_ZRG4ch4libc90_2_62_Hd4unix10linux_like5linux15pthread_mutex_t0g  {
        t_ZRTA40Ca _0; // [u8; 40]
} ;

luke-jr, Nov 07 '21 15:11

Confirmed - repr(align(N)) isn't handled properly (it's just being ignored in HIR lowering). I'm working on a fix (slowly), but it's breaking parts of the MSVC build.
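The following sketch shows how an ignored repr(align(N)) can turn into an "ENV_LOCK+1" style address. If a wrapper struct places a one-byte field before a lock whose type has alignment 1, the lock lands at offset 1. The wrapper layout is a hypothetical illustration, not the actual field order mrustc chose for RWLock, and the 56-byte size is arbitrary.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Lock storage as an opaque byte array, without and with an
   explicit 8-byte alignment. */
struct lock_unaligned { uint8_t bytes[56]; };
struct lock_aligned   { alignas(8) uint8_t bytes[56]; };

/* Hypothetical wrappers: a one-byte flag placed before the lock. */
struct wrapper_unaligned { uint8_t flag; struct lock_unaligned inner; };
struct wrapper_aligned   { uint8_t flag; struct lock_aligned inner; };

/* Offset of the lock inside each wrapper. */
static size_t inner_offset_unaligned(void)
{
    return offsetof(struct wrapper_unaligned, inner); /* 1: no padding needed */
}

static size_t inner_offset_aligned(void)
{
    return offsetof(struct wrapper_aligned, inner);   /* 8: 7 bytes of padding */
}
```

Once the alignment is honoured, the compiler inserts seven bytes of padding after the flag, and the lock starts on an 8-byte boundary whenever the wrapper does.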

thepowersgang, Nov 08 '21 11:11

Potential fix in the above commit; waiting for it to pass regression tests on Linux.

thepowersgang, Nov 10 '21 13:11

Seems good. @luke-jr, mind checking with your PPC64 branch?

thepowersgang, Nov 13 '21 04:11

@luke-jr Reminder: Can you confirm that the above fix works for you?

thepowersgang, Dec 06 '21 12:12

@luke-jr Still looking for confirmation. Alignment should be properly supported now, but I'd like to confirm before closing the issue.

thepowersgang, Jan 01 '22 05:01

Neither ffb0961ad3a1125ca7db3b6a48991023bcb4d0d7 (w/ patches) nor current master work for me, apparently due to issues unrelated to alignment (but it has been far too long to confirm the alignment-related issue is fixed or not).

luke-jr, Jan 23 '22 07:01

16d1d29 was the commit that originally addressed this issue. However, it's surprising that current master fails.

thepowersgang, Jan 23 '22 07:01

Yes, I was including patching 16d1d29 into ffb0961 of course. :)

master is generating code that looks like (IIRC) (int128_t)-ll, with the numeric constant somehow missing

luke-jr, Jan 23 '22 07:01

specifically in output-1.39.0/libcore.rlib.c

luke-jr, Jan 23 '22 07:01

aed9b3665c39f594f4cf8c46afe5048655a72c77 is the first bad commit (for the -ll thing)

luke-jr, Jan 23 '22 07:01

Trying 6f42b7415dc6c91ba41f8afb7d9e807e7c139c38, output/cargo fails with

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
Process was terminated with signal 6
FAILING COMMAND:  /var/tmp/portage/dev-lang/rust-1.40.0_p20220113/work/rustc-1.40.0-src/mrustc/output/cargo-build/build_miniz-sys-0_1_11_run
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59

Looking at the mentioned assert line, it sounds like it may still be an alignment-related issue :/

luke-jr, Jan 23 '22 20:01

Can you identify which structure is badly aligned?

thepowersgang, Jan 27 '22 15:01

The panic happens at https://github.com/rust-lang/hashbrown/blob/e7cd4a57a2690f199a527434f635206363ad661f/src/raw/mod.rs#L1086, which asserts that a "control bytes" pointer is aligned to the "group width". On x86, this may be __m128i not being aligned correctly: https://github.com/rust-lang/hashbrown/blob/e7cd4a57a2690f199a527434f635206363ad661f/src/raw/sse2.rs#L34. It needs to be aligned to at least 128 bits. On other targets, it may be u32 not being aligned to 4 bytes or u64 not being aligned to 8 bytes.
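A C sketch of the property that assertion checks, under two assumptions that are not confirmed above: that it compares the control pointer's remainder modulo the group width with 0, and that the generic (non-SSE2) group is a single u64, i.e. 8 bytes wide. GROUP_WIDTH, ctrl_misalignment and demo_ctrl are hypothetical names.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the generic (non-SSE2) group is one u64, so 8 bytes wide. */
#define GROUP_WIDTH 8

/* Hypothetical helper: the remainder the assertion would compare with 0. */
static size_t ctrl_misalignment(const void *ctrl)
{
    return (size_t)((uintptr_t)ctrl % GROUP_WIDTH);
}

/* Hypothetical control-byte buffer, correctly aligned to the group width. */
static alignas(GROUP_WIDTH) unsigned char demo_ctrl[64];
```

Under those assumptions, a control pointer one byte past a group boundary yields a remainder of 1, which would match the "left: 1, right: 0" in the panic.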

bjorn3, Jan 27 '22 16:01

The sse2 file shouldn't be used (target_feature = "sse2" shouldn't pass), so that can't be it.

thepowersgang, Jan 28 '22 10:01

@luke-jr Can you confirm that the issue still exists? If it does, are you able to identify the cause (i.e. which structure is unaligned, or what other error occurs)?

I have a very fuzzy memory of seeing a similar error when a constant expression wasn't properly converted to a static.

thepowersgang, Feb 21 '22 12:02

As of f08a7cb0641fed5410b4c92f3e4446413c897d2f trying to build 1.39.0:

...
(17/112) BUILDING cc v1.0.35
> /mnt/hd2019c-nobackup/dev/rust/mrustc/bin/mrustc rustc-1.39.0-src/vendor/cc/src/lib.rs -o output-1.39.0/rustc-build/libcc-1_0_35.rlib --crate-name cc --crate-type rlib -C emit-depfile=output-1.39.0/rustc-build/libcc-1_0_35.rlib.d --crate-tag 1_0_35 -g --cfg debug_assertions -O -L output-1.39.0 -L output-1.39.0/rustc-build
> /mnt/hd2019c-nobackup/dev/rust/mrustc/bin/mrustc rustc-1.39.0-src/src/librustc_llvm/build.rs --crate-name build --crate-type bin -o output-1.39.0/rustc-build/build_rustc_llvm_run -L output-1.39.0/rustc-build -g -L output-1.39.0 --extern build_helper=output-1.39.0/rustc-build/libbuild_helper-0_1_0.rlib --extern cc=output-1.39.0/rustc-build/libcc-1_0_35.rlib --edition 2018
> /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm_run
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', rustc-1.39.0-src/vendor/hashbrown/src/raw/mod.rs:1086:59
Process was terminated with signal 6
FAILING COMMAND:  /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm_run
Calling /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm_run failed (see /mnt/hd2019c-nobackup/dev/rust/mrustc/output-1.39.0/rustc-build/build_rustc_llvm.txt_failed.txt for stdout)
BUILD FAILED
make: *** [minicargo.mk:223: output-1.39.0/rustc] Error 1

I don't know how I would diagnose it further?

luke-jr, Feb 22 '22 00:02

A backtrace from that panic would help, and maybe some reading of the generated code to find where that ctrl value comes from (and thus why it isn't properly aligned).

thepowersgang, Feb 26 '22 08:02

Also: what platform are you on? I can't seem to reproduce this failure on Linux Mint 20.3 (gcc 9.3.0-17ubuntu1~20.04).

thepowersgang, Feb 27 '22 02:02

I'm running into this issue as well (powerpc64le, using Guix to build Rust). Is anyone still working on this, or is a workaround known? How can I help?

mrvdb, Aug 20 '22 08:08

@mrvdb It should have been fixed with the improvements to alignment handling... but if it's still crashing, you can help by identifying the misaligned type - and the correct alignment for it.

thepowersgang, Aug 30 '22 11:08