ndk
ndk copied to clipboard
[Bug] NDK's openmp test crashing on x86 with latest LLVM update
Description
https://android.googlesource.com/platform/ndk/+/refs/heads/main/tests/device/openmp/ and https://android.googlesource.com/platform/ndk/+/refs/heads/main/tests/device/openmp_shared/jni are both failing when I update to clang-r530567. There are four sub tests (two of which are identical aside from whether they use the static or shared openmp, the other two are both static openmp) that do pretty trivial things, but all are crashing on 32-bit x86.
Repro steps (you'll need to download the NDK from the artifacts on https://android-review.googlesource.com/c/platform/ndk/+/3168418 unless you want to build it yourself, prebuilts/clang doesn't have a sysroot so can't build it without the rest of the NDK):
$ ndk/toolchains/llvm/prebuilt/$HOST/bin/clang -target i686-linux-android21 ndk/tests/device/openmp/jni/openmp.c -fopenmp -static-openmp -o omp_test
$ adb push omp_test /data/local/tmp/
$ adb shell "/data/local/tmp/omp_test; echo $?"
I think you should also be able to download the ndk-tests.tar.bz2 artifact to get the test executables if you don't need to rebuild them.
Logcat says:
--------- beginning of crash
07-11 13:52:20.771 13475 13475 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x48 in tid 13475 (fib), pid 13475 (fib)
--------- beginning of main
07-11 13:52:20.778 13478 13478 I crash_dump32: obtaining output fd from tombstoned, type: kDebuggerdTombstone
07-11 13:52:20.779 1851 1851 I /system/bin/tombstoned: received crash request for pid 13475
07-11 13:52:20.779 13478 13478 I crash_dump32: performing dump of process 13475 (target tid = 13475)
07-11 13:52:20.780 13478 13478 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
07-11 13:52:20.780 13478 13478 F DEBUG : Build fingerprint: 'google/sdk_goog3_x86/generic_x86:10/QSR1.211112.003/8112140:userdebug/test-keys'
07-11 13:52:20.780 13478 13478 F DEBUG : Revision: '0'
07-11 13:52:20.780 13478 13478 F DEBUG : ABI: 'x86'
07-11 13:52:20.780 13478 13478 F DEBUG : Timestamp: 2024-07-11 13:52:20-0700
07-11 13:52:20.781 13478 13478 F DEBUG : pid: 13475, tid: 13475, name: fib >>> ./fib <<<
07-11 13:52:20.781 13478 13478 F DEBUG : uid: 0
07-11 13:52:20.781 13478 13478 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x48
07-11 13:52:20.781 13478 13478 F DEBUG : Cause: null pointer dereference
07-11 13:52:20.781 13478 13478 F DEBUG : eax 5ba343f0 ebx 00000000 ecx 00000000 edx 00000000
07-11 13:52:20.781 13478 13478 F DEBUG : edi 00000000 esi 5ba343cc
07-11 13:52:20.781 13478 13478 F DEBUG : ebp ffad14a8 esp ffad145c eip 5ba2fac0
07-11 13:52:20.781 13478 13478 F DEBUG :
07-11 13:52:20.781 13478 13478 F DEBUG : backtrace:
07-11 13:52:20.781 13478 13478 F DEBUG : #00 pc 000acac0 /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG : #01 pc 00071281 /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_query_cpuid+305) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG : #02 pc 000a32e6 /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_runtime_initialize+54) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG : #03 pc 0002b427 /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_do_serial_initialize()+359) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG : #04 pc 00034fa4 /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_do_middle_initialize()+36) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.781 13478 13478 F DEBUG : #05 pc 00034f62 /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (__kmp_middle_initialize+66) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.782 13478 13478 F DEBUG : #06 pc 000a6152 /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (omp_get_num_procs+50) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.782 13478 13478 F DEBUG : #07 pc 0001f66c /data/local/tmp/tests/x86-21-legacy-weakapi/cmake/openmp/x86/fib (main+140) (BuildId: 8b41cd5b37adb209bc0f739b1bbb54243ca655b1)
07-11 13:52:20.782 13478 13478 F DEBUG : #08 pc 000898e8 /apex/com.android.runtime/lib/bionic/libc.so (__libc_init+120) (BuildId: 471745f0fbbcedb3db1553d5bd6fcd8b)
That's from an API 28 x86 emulator I got from go/xcid.
I've only tested on Darwin, but I expect this isn't host-specific.
Upstream bug
No response
Commit to cherry-pick
No response
Affected versions
Canary
Canary version
not actually submitted yet, clang-r530567
Host OS
Mac
Host OS version
all, probably
Affected ABIs
x86
In libomp.a, __kmp_query_cpuid+305 is the call to strrchr:
12d: e8 fc ff ff ff calll 0x12e <__kmp_query_cpuid+0x12e>
0000012e: R_386_PLT32 strrchr
132: 83 c4 10 addl $0x10, %esp
Is this an issue with constructors being called before the dynamic linker has initialized the PLT? (tagging @rprichard ).
(I've updated the CL to xfail the test for now, so if you try running run_tests.py now it won't yell loudly. If you want to see the failure, delete the test_config.py in those directories)
It looks like this PR was the cause, https://github.com/llvm/llvm-project/pull/84626. It has since been reverted, and my impression is that the unmodified original code is/was correct for x86 PIC (but I haven't confirmed that).
Debugging notes:
Is this an issue with constructors being called before the dynamic linker has initialized the PLT?
The dynamic linker calls constructors in a separate pass after applying relocations for all shared objects.
Here's the segfault I see:
08-06 20:07:27.439 1797 1797 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
08-06 20:07:27.439 1797 1797 F DEBUG : Build fingerprint: 'Android/sdk_slim_x86/generic_x86:11/RSR1.210722.009/7752727:userdebug/test-keys'
08-06 20:07:27.439 1797 1797 F DEBUG : Revision: '0'
08-06 20:07:27.439 1797 1797 F DEBUG : ABI: 'x86'
08-06 20:07:27.440 1797 1797 F DEBUG : Timestamp: 2024-08-06 20:07:27-0700
08-06 20:07:27.440 1797 1797 F DEBUG : pid: 1794, tid: 1794, name: omp_test >>> ./omp_test <<<
08-06 20:07:27.440 1797 1797 F DEBUG : uid: 2000
08-06 20:07:27.440 1797 1797 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x3c
08-06 20:07:27.440 1797 1797 F DEBUG : Cause: null pointer dereference
08-06 20:07:27.440 1797 1797 F DEBUG : eax 64b758f0 ebx 00000000 ecx 00000000 edx 00000000
08-06 20:07:27.440 1797 1797 F DEBUG : edi 00000000 esi 64b758cc
08-06 20:07:27.440 1797 1797 F DEBUG : ebp fff63678 esp fff6362c eip 64b71030
08-06 20:07:27.441 1797 1797 F DEBUG : backtrace:
08-06 20:07:27.441 1797 1797 F DEBUG : #00 pc 000ad030 /data/local/tmp/omp_test
08-06 20:07:27.441 1797 1797 F DEBUG : #01 pc 00071821 /data/local/tmp/omp_test (__kmp_query_cpuid+305)
08-06 20:07:27.441 1797 1797 F DEBUG : #02 pc 000a3886 /data/local/tmp/omp_test (__kmp_runtime_initialize+54)
08-06 20:07:27.441 1797 1797 F DEBUG : #03 pc 0002b9c7 /data/local/tmp/omp_test (__kmp_do_serial_initialize()+359)
08-06 20:07:27.441 1797 1797 F DEBUG : #04 pc 0002b83c /data/local/tmp/omp_test (__kmp_get_global_thread_id_reg+140)
08-06 20:07:27.441 1797 1797 F DEBUG : #05 pc 000a5e88 /data/local/tmp/omp_test (omp_set_num_threads+24)
08-06 20:07:27.436 1797 1797 I crash_dump32: type=1400 audit(0.0:99): avc: denied { read } for name="omp_test" dev="dm-5" ino=65540 scontext=u:r:crash_dump:s0 tcontext=u:object_r:shell_data_file:s0 tclass=file permissive=1
08-06 20:07:27.441 1797 1797 F DEBUG : #06 pc 0001fe46 /data/local/tmp/omp_test (main+86)
08-06 20:07:27.441 1797 1797 F DEBUG : #07 pc 000522e3 /apex/com.android.runtime/lib/bionic/libc.so (__libc_init+115) (BuildId: 6e3a0180fa6637b68c0d181c343e6806)
ad030 is in the .plt. It's read-only code that reads the relocated address from the .got:
000ad030 <strrchr@plt>:
ad030: ff a3 3c 00 00 00 jmp *0x3c(%ebx)
ad036: 68 60 00 00 00 push $0x60
ad03b: e9 20 ff ff ff jmp acf60 <__umoddi3+0xb8>
Apparently %ebx is zero, but it should be pointing to the .got. https://ewontfix.com/18/:
Since 32-bit x86 provides no direct way to do memory loads/stores relative to the current instruction pointer, the SysV ABI provides that, when code generated as PIC calls into a PLT thunk, it must pass a pointer to the GOT as a hidden argument in %ebx.
The %ebx register is supposed to point to the GOT, but it seems to be zero instead on entry to the .plt.
__kmp_query_cpuid initializes %edi to the GOT instead of %ebx:
716f9: e8 00 00 00 00 call 716fe <__kmp_query_cpuid+0xe>
716fe: 5f pop %edi
...
7170c: 81 c7 2e d7 03 00 add $0x3d72e,%edi
This pattern keeps showing up in __kmp_query_cpuid:
71716: 89 df mov %ebx,%edi
71718: 0f a2 cpuid
7171a: 87 df xchg %ebx,%edi
The first instance of this pattern overwrites the GOT saved in %edi.
It appears that cpuid accepts an input in %eax, and then writes its output to %eax, %ebx, %edx, %ecx. (https://datasheets.chipdb.org/Intel/x86/CPUID/24161821.pdf) Before invoking cpuid, the function saves off the value of %ebx (into %edi), then restores it afterwards.
@rprichard Thanks for identifying the fix! http://r.android.com/3217551 and http://r.android.com/3217550 pick these changes to AOSP toolchain.