[BUG] ndk-stack does not accommodate for the difference in relative PC computation between Android versions

Open mraleph opened this issue 4 years ago • 6 comments

Description

Different Android versions compute the relative PC printed in backtraces in different ways. Here are examples of the very same crash (same APK) running on three different devices with three different Android versions:

Android 8.1 on Pixel 2

   *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'google/walleye/walleye:8.1.0/OPM2.171026.006.G3/5513837:user/release-keys'
Revision: 'MP1'
ABI: 'arm'
pid: 10616, tid: 10669, name: 1.ui  >>> io.flutter.examples.hello_world <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
    r0 00000000  r1 000029ad  r2 00000006  r3 00000008
    r4 00002978  r5 000029ad  r6 c89ff08c  r7 0000010c
    r8 00000000  r9 c9cc0ac1  sl dc026400  fp c8a0011c
    ip c8ad651b  sp c89ff078  lr e5a33c31  pc e5a2d782  cpsr 200e0030
backtrace:
    #00 pc 0001a782  /system/lib/libc.so (abort+63)
    #01 pc 002c9b91  /data/app/io.flutter.examples.hello_world-tiqKsqQ08yBXU2hWODwfTA==/lib/arm/libflutter.so (offset 0xff5000)

Android 10 on Pixel 4a

  *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'google/flame/flame:10/QQ3A.200805.001/6578210:user/release-keys'
Revision: 'MP1.0'
ABI: 'arm'
Timestamp: 2020-10-20 13:47:08-0700
pid: 20536, tid: 20586, name: 1.ui  >>> io.flutter.examples.hello_world <<<
uid: 10222
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
    r0  00000000  r1  0000506a  r2  00000006  r3  c2132878
    r4  c213288c  r5  c2132870  r6  00005038  r7  0000016b
    r8  c2132888  r9  c2132878  r10 c21328a8  r11 c2132898
    ip  0000506a  sp  c2132848  lr  ebe2c6e3  pc  ebe2c6f6
backtrace:
      #00 pc 0005f6f6  /apex/com.android.runtime/lib/bionic/libc.so (abort+166) (BuildId: 8c3173001a99af3ab544de85a610e066)
      #01 pc 012beb91  /data/app/io.flutter.examples.hello_world-8nGxY8_VmIDo8hf0WEUzUQ==/lib/arm/libflutter.so (BuildId: f3226de58c8d62b2de4d5f7b4066c4a9c0f07b4e)
      #02 pc 65000000  <unknown>

Android 11 on Pixel 3a

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'google/sargo/sargo:11/RP1A.201005.004/6782484:userdebug/dev-keys'
Revision: 'MP1.0'
ABI: 'arm'
Timestamp: 2020-10-22 09:28:26+0200
pid: 13676, tid: 13705, name: 1.ui  >>> io.flutter.examples.hello_world <<<
uid: 10274
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
    r0  00000000  r1  00003589  r2  00000006  r3  c7a87808
    r4  c7a8781c  r5  c7a87800  r6  0000356c  r7  0000016b
    r8  c7a87808  r9  c7a87818  r10 c7a87838  r11 c7a87828
    ip  00003589  sp  c7a877d8  lr  f30433e1  pc  f30433f4
backtrace:
      #00 pc 000383f4  /apex/com.android.runtime/lib/bionic/libc.so (abort+172) (BuildId: 09f5dc86ced902a66ebda24ea42c217d)
      #01 pc 012bfb91  /data/app/~~sGzta02j0vlFNEgy7PjzQA==/io.flutter.examples.hello_world-HrS9T-azBIoKi_uwl9sUkQ==/lib/arm/libflutter.so (BuildId: f3226de58c8d62b2de4d5f7b4066c4a9c0f07b4e)
      #02 pc 66000000  <unknown>

Out of all three reports only the last one from Android 11 would symbolise correctly using ndk-stack:

$ ~/android-ndk-r21d/ndk-stack -sym . < crashes.txt
********** Crash dump: **********
Build fingerprint: 'google/walleye/walleye:8.1.0/OPM2.171026.006.G3/5513837:user/release-keys'
#00 0x0001a782 /system/lib/libc.so (abort+63)
#01 0x002c9b91 /data/app/io.flutter.examples.hello_world-tiqKsqQ08yBXU2hWODwfTA==/lib/arm/libflutter.so (offset 0xff5000)
                                                                                                         ??
                                                                                                         ??:0:0
Crash dump is completed

********** Crash dump: **********
Build fingerprint: 'google/flame/flame:10/QQ3A.200805.001/6578210:user/release-keys'
#00 0x0005f6f6 /apex/com.android.runtime/lib/bionic/libc.so (abort+166) (BuildId: 8c3173001a99af3ab544de85a610e066)
#01 0x012beb91 /data/app/io.flutter.examples.hello_world-8nGxY8_VmIDo8hf0WEUzUQ==/lib/arm/libflutter.so (BuildId: f3226de58c8d62b2de4d5f7b4066c4a9c0f07b4e)
                                                                                                         dart::DN_HelperInternal_makeListFixedLength(dart::Isolate*, dart::Thread*, dart::Zone*, dart::NativeArguments*)
                                                                                                         /usr/local/google/home/vegorov/src/flutter/engine/src/out/android_debug/../../third_party/dart/runtime/lib/growable_array.cc:84:3
                                                                                                         dart::BootstrapNatives::DN_Internal_makeListFixedLength(dart::Thread*, dart::Zone*, dart::NativeArguments*)
                                                                                                         /usr/local/google/home/vegorov/src/flutter/engine/src/out/android_debug/../../third_party/dart/runtime/lib/growable_array.cc:83:1
#02 0x65000000 <unknown>
Crash dump is completed

********** Crash dump: **********
Build fingerprint: 'google/sargo/sargo:11/RP1A.201005.004/6782484:userdebug/dev-keys'
#00 0x000383f4 /apex/com.android.runtime/lib/bionic/libc.so (abort+172) (BuildId: 09f5dc86ced902a66ebda24ea42c217d)
#01 0x012bfb91 /data/app/~~sGzta02j0vlFNEgy7PjzQA==/io.flutter.examples.hello_world-HrS9T-azBIoKi_uwl9sUkQ==/lib/arm/libflutter.so (BuildId: f3226de58c8d62b2de4d5f7b4066c4a9c0f07b4e)
                                                                                                                                    dart::DN_HelperObject_dumpStack(dart::Isolate*, dart::Thread*, dart::Zone*, dart::NativeArguments*)
                                                                                                                                    /usr/local/google/home/vegorov/src/flutter/engine/src/out/android_debug/../../third_party/dart/runtime/lib/object.cc:130:5
                                                                                                                                    dart::BootstrapNatives::DN_Object_dumpStack(dart::Thread*, dart::Zone*, dart::NativeArguments*)
                                                                                                                                    /usr/local/google/home/vegorov/src/flutter/engine/src/out/android_debug/../../third_party/dart/runtime/lib/object.cc:100:1
#02 0x66000000 <unknown>

This is not surprising: ndk-stack simply passes the PCs it extracts from crash dumps as is into llvm-symbolizer (or addr2line). From what I can see this has always been the behaviour (even when it was implemented as a C program). Both of these tools expect VMAs; however, only Android 11 prints the correct VMA.
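To make the "passed as is" point concrete, here is a minimal sketch of pulling frames out of a backtrace. It is not ndk-stack's actual code; the regular expression and the `parse_frames` name are assumptions made for illustration, and the sample input is the Android 10 backtrace quoted earlier.

```python
import re

# Matches debuggerd backtrace frames such as
#   "#01 pc 012beb91  /data/app/.../libflutter.so (BuildId: ...)"
# (hypothetical pattern, written for this illustration only)
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+pc\s+(?P<pc>[0-9a-fA-F]+)\s+(?P<path>\S+)"
)


def parse_frames(dump: str):
    """Return (frame index, relative PC, mapped path) for each backtrace frame."""
    frames = []
    for line in dump.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m["index"]), int(m["pc"], 16), m["path"]))
    return frames


ANDROID_10_BACKTRACE = """\
backtrace:
      #00 pc 0005f6f6  /apex/com.android.runtime/lib/bionic/libc.so (abort+166) (BuildId: 8c3173001a99af3ab544de85a610e066)
      #01 pc 012beb91  /data/app/io.flutter.examples.hello_world-8nGxY8_VmIDo8hf0WEUzUQ==/lib/arm/libflutter.so (BuildId: f3226de58c8d62b2de4d5f7b4066c4a9c0f07b4e)
      #02 pc 65000000  <unknown>
"""

for index, pc, path in parse_frames(ANDROID_10_BACKTRACE):
    # Without a correction step, this PC is what the symbolizer receives,
    # and the symbolizer interprets it as a VMA.
    print(f"#{index:02d} {path.rsplit('/', 1)[-1]} 0x{pc:08x}")
```

Any per-version correction would have to be applied to the parsed PC between these two steps, before the address reaches the symbolizer.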

Android 10 is off by 0x1000 (seems to be the load bias, i.e. the difference between the .text section VMA and its file offset): 0x012bfb91 - 0x012beb91 = 0x1000.

Android 8.1 seems to print an offset into the RX mapping, which is off from the PC VMA by the .text section VMA aligned down to the page size:

$ ~/android-ndk-r21d/toolchains/llvm/prebuilt/darwin-x86_64/bin/x86_64-linux-android-readelf -l libflutter.so | grep 'R E'
  LOAD           0xff57c0 0x00ff67c0 0x00ff67c0 0x53f870 0x53f870 R E 0x1000

Observe that 0x00ff67c0 & ~0xFFF = 0xff6000 and 0xff6000 + 0x002c9b91 = 0x12bfb91.
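The two corrections can be checked with a few lines of arithmetic. This is only a worked example using the numbers quoted in this report (the LOAD segment from readelf and the frame #01 PCs); the constant names are made up for illustration, and a 4 KiB page size is assumed.

```python
PAGE_MASK = 0xFFF  # assumes 4 KiB pages, matching the 0x1000 alignment above

# Executable (R E) LOAD segment of libflutter.so, from the readelf output.
SEGMENT_FILE_OFFSET = 0xFF57C0
SEGMENT_VMA = 0x00FF67C0

# PC printed for frame #01 by each Android version.
PC_ANDROID_8_1 = 0x002C9B91
PC_ANDROID_10 = 0x012BEB91
PC_ANDROID_11 = 0x012BFB91  # already a VMA, needs no correction

# Android 10: the printed PC is short by the load bias (VMA - file offset).
load_bias = SEGMENT_VMA - SEGMENT_FILE_OFFSET

# Android 8.1: the printed PC is relative to the page-aligned start of the
# executable mapping, so add the VMA rounded down to a page boundary.
page_aligned_vma = SEGMENT_VMA & ~PAGE_MASK

corrected_8_1 = PC_ANDROID_8_1 + page_aligned_vma
corrected_10 = PC_ANDROID_10 + load_bias

print(hex(load_bias), hex(page_aligned_vma))
print(hex(corrected_8_1), hex(corrected_10), hex(PC_ANDROID_11))
```

Both corrected values land on the same address that Android 11 prints directly.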

For your convenience this archive (shared with Google only) contains both the crashing APK and a library with debugging information.

I suspect this might have gone unnoticed over the years because GCC and LLVM lay out binaries in slightly different ways, so things might have worked okay with GCC binaries and broken with LLVM binaries.

Environment Details

Not all of these will be relevant to every bug, but please provide as much information as you can.

  • NDK Version: 21.3.6528147
  • Build system:
  • Host OS: Mac
  • ABI:
  • NDK API level:
  • Device API level:

mraleph • Oct 28 '20 22:10

For the sake of completeness, here is the correction logic I ended up implementing myself to handle the difference (because I discovered that I can't rely on ndk-stack): https://dart-review.googlesource.com/c/dart_ci/+/168960/4/github-label-notifier/symbolizer/lib/symbolizer.dart#292

      computePCBias: (frames) async {
        if ((androidMajorVersion ?? 0) >= 11) {
          return 0;
        }
        // Prior to Android 11 backtraces printed by debuggerd contained PCs
        // which can't be directly used for symbolization. Very old versions
        // of Android printed offsets into RX mapping, while newer versions
        // printed ELF file offsets. We try to differentiate between these two
        // situations by checking if any PCs are outside of .text section range.
        // In both cases we can't directly use this PC for symbolization because
        // it does not necessarily match VMAs used in the ELF file (which is
        // what llvm-symbolizer would need for symbolization).
        final textSection = await ndk.getTextSectionInfo(flutterSo);
        final textStart = textSection.fileOffset;
        final textEnd = textSection.fileOffset + textSection.fileSize;
        final likelySectionOffset =
            !frames.every((f) => textStart <= f.pc && f.pc < textEnd);
        return likelySectionOffset
            ? (textSection.virtualAddress & ~0xfff)
            : (textSection.virtualAddress - textSection.fileOffset);
      }
  1. If the Android major version (when present in the Build fingerprint) is 11 or above, no correction is necessary.
  2. Otherwise, check whether all PCs can be interpreted as file offsets into the .text section. If so, apply the load bias (VMA minus file offset); if not, treat them as offsets relative to the page-aligned start of .text and add that page-aligned VMA instead.
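For readers who do not use Dart, the same heuristic can be sketched in Python. This is a rough port, not the code from the linked change: `TextSection` and `compute_pc_bias` are hypothetical names, and because the thread only quotes the executable LOAD segment, its offset, size and VMA are used here as stand-ins for the real .text section values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TextSection:
    """Hypothetical container mirroring what the Dart code reads from the ELF."""
    file_offset: int
    file_size: int
    virtual_address: int


def compute_pc_bias(android_major: Optional[int],
                    pcs: Iterable[int],
                    text: TextSection) -> int:
    """Return the value to add to each printed PC to obtain a VMA."""
    if (android_major or 0) >= 11:
        return 0  # Android 11+ already prints VMAs.
    start = text.file_offset
    end = text.file_offset + text.file_size
    if all(start <= pc < end for pc in pcs):
        # Every PC looks like an ELF file offset: apply the load bias.
        return text.virtual_address - text.file_offset
    # Otherwise assume offsets into the RX mapping: add the page-aligned VMA.
    return text.virtual_address & ~0xFFF


# Stand-in values taken from the LOAD segment quoted in the report (assumption).
text = TextSection(file_offset=0xFF57C0, file_size=0x53F870,
                   virtual_address=0x00FF67C0)

for major, pc in [(8, 0x002C9B91), (10, 0x012BEB91), (11, 0x012BFB91)]:
    vma = pc + compute_pc_bias(major, [pc], text)
    print(f"Android {major}: 0x{pc:08x} -> 0x{vma:08x}")
```

As in the original, only frames belonging to the library being symbolized should be passed in; a frame from libc.so would fail the range check and push the heuristic to the wrong branch.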

mraleph • Oct 28 '20 23:10

The unwinder in Android 8 was simply broken so it doesn't unwind correctly when using the latest linker.

The unwinder in Android 10 has a bug that could cause it to have the wrong relative pc, which is probably what you are seeing.

https://android.googlesource.com/platform/system/unwinding/+/master/libunwindstack/AndroidVersions.md

This document describes known bugs and the versions in which they are present. It also includes how to avoid these bugs on older versions of Android if you so choose.

There is the possibility we could modify ndk-stack to recognize this is an older version and try and modify the relative pc based on known issues. I'm not sure if that would work in all cases, but it's probably worth trying.

cferris1000 • Oct 28 '20 23:10

> android.googlesource.com/platform/system/unwinding/+/master/libunwindstack/AndroidVersions.md

Oh, this is the sort of document I was missing. It would have saved me a lot of time figuring these out myself by observation.

> There is the possibility we could modify ndk-stack to recognize this is an older version and try and modify the relative pc based on known issues. I'm not sure if that would work in all cases, but it's probably worth trying.

FWIW, even if ndk-stack is not changed to correct for the bugs in older versions, you could consider at least adding a warning to its output when it detects an old Android version that might be affected. I used to be very puzzled by ndk-stack output and only recently found enough time to figure this out.
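Such a warning could key off the Build fingerprint line that ndk-stack already echoes. The sketch below is only an illustration of the idea: the regular expression, the function names and the warning wording are all assumptions, not existing ndk-stack behaviour.

```python
import re
from typing import Optional

# Fingerprint layout seen in the dumps above:
#   'brand/product/device:release/build-id/incremental:type/tags'
# (hypothetical pattern, written for this illustration only)
FINGERPRINT_RE = re.compile(
    r"Build fingerprint: '[^/']+/[^/']+/[^:']+:(?P<release>[^/']+)/"
)


def android_major_version(line: str) -> Optional[int]:
    """Extract the Android major version from a fingerprint line, if present."""
    m = FINGERPRINT_RE.search(line)
    if not m:
        return None
    major = re.match(r"\d+", m["release"])
    return int(major.group()) if major else None


def pc_warning(line: str) -> Optional[str]:
    """Return a warning for versions whose printed PCs may not be VMAs."""
    major = android_major_version(line)
    if major is not None and major < 11:
        return (f"WARNING: Android {major} backtrace; relative PCs may need "
                "correction before symbolization.")
    return None


for fp in [
    "Build fingerprint: 'google/walleye/walleye:8.1.0/OPM2.171026.006.G3/5513837:user/release-keys'",
    "Build fingerprint: 'google/flame/flame:10/QQ3A.200805.001/6578210:user/release-keys'",
    "Build fingerprint: 'google/sargo/sargo:11/RP1A.201005.004/6782484:userdebug/dev-keys'",
]:
    print(android_major_version(fp), pc_warning(fp))
```

A fingerprint whose release field is not numeric yields no version and therefore no warning, which errs on the side of staying silent.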

This is no longer a high-priority issue for me (as I have built my own tooling to work around the difference), but it might still be confusing to other people.

mraleph • Oct 28 '20 23:10

(i've added a link from the bionic docs to the unwinder docs: https://android-review.googlesource.com/c/platform/bionic/+/1482722)

enh-google • Oct 30 '20 23:10

Can I ask if this is the problem I'm seeing? On my samsung gs5 I can debug fine using 'adb logcat | ndk-stack -sym' but on a motorola g g5 plus, it tells me:

WARNING: Mismatched build id for obj\symbols\arm64-v8a\libAnthracite.so
WARNING: Expected 7dec706e65f69a669adaa7dd61f98879a519eacc
WARNING: Found 87c3a641e9308a0e8df2af86f1b199692fa9baf2

Is there any known workaround I can use?

timluther • May 16 '21 18:05

To answer the previous comment: the warning you are seeing reports a build ID mismatch for the shared library libAnthracite.so. You are trying to get symbol information from a different version of that library than the one that crashed. The version that is on the motorola g g5 plus does not match the version that you are telling ndk-stack to use.

What you are describing is not relevant to this particular bug.

cferris1000 • May 17 '21 19:05