App runtime native crash due to redex passes

Open gydeveloper opened this issue 6 years ago • 12 comments

We noticed a huge spike in crash rate for our first launch with the Redex-optimized APK. We are still investigating the root cause, but I wanted to know whether you have seen this before or are aware of any issues that can cause it.

We are seeing two crashes. I'm pasting the crash snippets and the Redex config below:

  • SIGSEGV
%YAML 1.2
---
# Generated by Ferrite ver. 0.0.1-98ac0be.204
common :
    process : 28358 >>> .android.master <<<
    crash_timestamp : 2019-08-15T18:51:48Z
    machine : aarch64
    word_size : 64
    process_uptime : PT47M55S
    virtual_memory_usage : 5054808kB
fault :
    thread : 7887 >>> SecureChatSessi <<<
    signal : SIGSEGV
    code : SEGV_MAPERR
    fault address : 0000000000000000
threads :
    7887 :
        name : SecureChatSessi
        crashed : true
        regs0 : |
                x0  0000000012ec01e0  x1  0000000000000000  x2  0000000012ec01e0  x3  0000007ac96b8800
                x4  0000000012ec0148  x5  000000000000001a  x6  0000000000000000  x7  000000006fd75208
                x8  8236ad88b596c87c  x9  8236ad88b596c87c  x10 0000000000000064  x11 0000000000000000
                x12 0000000000000020  x13 0000000012ec01a4  x14 0000000012ec0218  x15 0000000000000000
                x16 0000007abff66a10  x17 242dc8b500000020  x18 00000000ebad6082  x19 0000007ac96b8800
                x20 0000000000000000  x21 0000000012ec0000  x22 0000000014599ea8  x23 0000000012c0a2c8
                x24 0000000016059038  x25 000000001687f638  x26 0000000016d4cee0  x27 0000000012ec00a8
                x28 00000000190c1d90  x29 0000000012ec01e0
                sp  0000007abff66ae0  lr  000000009bca5384  pc  000000009bca5390
# E: tracer-elf      : inaccessible path: /dev/ashmem/dalvik-jit-code-cache_28358_28358 (deleted): Not a directory
        0 : 000000009bca5390 /dev/ashmem/dalvik-jit-code-cache_28358_28358 (deleted)
    1403 :
        name : pool-35-thread-
        0 : 000000000001f4ec /system/lib64/libc.so (syscall+28)
        1 : 00000000000d7850 /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+148)
        2 : "00000000003bde78 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, long, int, bool, art::ThreadState)+636)"
        3 : "00000000003bf934 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, art::mirror::Object*, long, int, bool, art::ThreadState)+424)"
        4 : 000000000011541c /system/framework/arm64/boot.oat
    1409 :
        name : pool-36-thread-
        0 : 000000000001f4ec /system/lib64/libc.so (syscall+28)
        1 : 00000000000d7850 /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+148)
        2 : "00000000003bde78 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, long, int, bool, art::ThreadState)+636)"
        3 : "00000000003bf934 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, art::mirror::Object*, long, int, bool, art::ThreadState)+424)"
        4 : 000000000011541c /system/framework/arm64/boot.oat
  • SIGABRT
%YAML 1.2
---
# Generated by Ferrite ver. 0.0.1-98ac0be.204
common :
    process : 16522 >>> .android.master <<<
    crash_timestamp : 2019-08-07T22:21:15Z
    machine : aarch64
    word_size : 64
    process_uptime : PT22H06M04S
    virtual_memory_usage : 3338432kB
fault :
    thread : 4287 >>> SecureChatSessi <<<
    signal : SIGABRT
    code : SI_TKILL
    abort message : "primitive.h:134] Primitive char conversion on invalid type 78"
threads :
    4287 :
        name : SecureChatSessi
        crashed : true
        regs0 : |
                x0  0000000000000000  x1  00000000000010bf  x2  0000000000000006  x3  0000000000000008
                x4  00000070ed028000  x5  00000070ed028000  x6  00000070ed028000  x7  0000000000389fb6
                x8  0000000000000083  x9  69c19b6a698ca48e  x10 0000000000000000  x11 0000000000000001
                x12 ffffffffffffffff  x13 ffffffffa2b52abf  x14 000000002ee198f0  x15 00000070ea4e5000
                x16 00000070ea4da2e8  x17 00000070ea47bbdc  x18 0000000000000020  x19 000000000000408a
                x20 00000000000010bf  x21 0000000000000067  x22 00000070af9fecd9  x23 000000001509a928
                x24 0000000012f60df0  x25 0000000012f75998  x26 0000000012e09418  x27 000000001340ce58
                x28 000000001340cf40  x29 00000070af9fec40
                sp  00000070af9fec00  lr  00000070ea42e2a4  pc  00000070ea47bbe4
# E: tracer-elf      : inaccessible path: /dev/ashmem/dalvik-jit-code-cache_16522_16522 (deleted): Not a directory
        0 : 000000000006bbe4 /system/lib64/libc.so (tgkill+8)
        1 : 000000000001e2a0 /system/lib64/libc.so (abort+88)
        2 : 0000000000437854 /system/lib64/libart.so (art::Runtime::Abort(char const*)+528)
        3 : 0000000000437f64 /system/lib64/libart.so (art::Runtime::Aborter(char const*)+24)
        4 : 000000000052263c /system/lib64/libart.so (android::base::LogMessage::~LogMessage()+900)
        5 : 000000000010fe14 /system/lib64/libart.so (art::Primitive::Descriptor(art::Primitive::Type)+220)
        6 : "000000000037a8e4 /system/lib64/libart.so (art::mirror::Class::GetDescriptor(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*)+56)"
        7 : 000000000037f948 /system/lib64/libart.so (art::mirror::Class::PrettyDescriptor(art::ObjPtr<art::mirror::Class>)+48)
        8 : "0000000000158c50 /system/lib64/libart.so (art::ThrowClassCastException(art::ObjPtr<art::mirror::Class>, art::ObjPtr<art::mirror::Class>)+84)"
        9 : 00000000004e29b8 /system/lib64/libart.so (artThrowClassCastException+16)
        10 : 00000000004e2a04 /system/lib64/libart.so (artThrowClassCastExceptionForObject+64)
        11 : 0000000000508420 /system/lib64/libart.so (art_quick_check_instance_of+112)
        12 : 000000009aa08d44 /dev/ashmem/dalvik-jit-code-cache_16522_16522 (deleted)
    825 :
        name : net-int-35
        0 : 000000000001ddec /system/lib64/libc.so (syscall+28)
        1 : 00000000000e1f30 /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+152)
        2 : "0000000000392af4 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, long, int, bool, art::ThreadState)+628)"
        3 : "00000000003945d4 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, art::mirror::Object*, long, int, bool, art::ThreadState)+252)"
        4 : 00000000001dcadc /system/framework/arm64/boot-core-oj.oat (oatexec+2780)
    1238 :
        name : net-int-36
        0 : 000000000001ddec /system/lib64/libc.so (syscall+28)
        1 : 00000000000e1f30 /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+152)
        2 : "0000000000392af4 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, long, int, bool, art::ThreadState)+628)"
        3 : "00000000003945d4 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, art::mirror::Object*, long, int, bool, art::ThreadState)+252)"
        4 : 00000000001dcadc /system/framework/arm64/boot-core-oj.oat (oatexec+2780)
    2188 :
  • Redex config -
{
"redex" : {
  "passes" : [
    "SimplifyCFGPass",
    "ReBindRefsPass",
    "BridgePass",
    "ResultPropagationPass",
    "SynthPass",
    "FinalInlinePass",
    "DelSuperPass",
    "MethodInlinePass",
    "PeepholePass",
    "BranchPrefixHoistingPass",
    "ConstantPropagationPass",
    "LocalDcePass",
    "RemoveUnreachablePass",
    "DedupBlocksPass",
    "RegAllocPass",
    "ResultPropagationPass",
    "RemoveRedundantCheckCastsPass",
    "CopyPropagationPass",
    "StaticReloPass",
    "ObjectSensitiveDcePass",
    "RemoveEmptyClassesPass",
    "LocalDcePass",
    "ReduceGotosPass"
  ]
},
  "FinalInlinePass" : {
    "propagate_static_finals": true,
    "replace_encodable_clinits": true
  },
  "MethodInlinePass": {
    "throws": true,
    "multiple_callers": true,
    "black_list": []
  },
  "ConstantPropagationPassV3" : {
    "blacklist": [],
    "replace_moves_with_consts": true,
    "fold_arithmetic": true
  },
  "RegAllocPass" : {
    "live_range_splitting": false
  },
  "CopyPropagationPass" : {
    "eliminate_const_literals": false,
    "full_method_analysis": true
  },
  "PeepholePass" : {
    "disabled_peepholes": [
      "Replace_PutGet",
      "Replace_PutGetWide",
      "Replace_PutGetObject",
      "Replace_PutGetShort",
      "Replace_PutGetChar",
      "Replace_PutGetByte",
      "Replace_PutGetBoolean"
    ]
  },
  "stats_output": "redex-stats.txt"
}

gydeveloper avatar Aug 29 '19 23:08 gydeveloper

One question I have is whether I can exclude a class (or a bunch of them) from all passes. If so, how can I do that? We are seeing this crash happen in a specific class, and I'd like to filter it out so I can see how the rest of the APK performs with Redex enabled.

I'm aware that I can blacklist classes in individual passes such as SingleImplPass (see below), but is it possible to exclude a class from being optimized completely?

"SingleImplPass": {
  "black_list" : [
    "Lcom/foo/SomeInterface;",
    "Lcom/foo/SomeOtherInterface;"
  ],
  "package_black_list" : [
    "Lcom/some/package/bar"
  ]
}

gydeveloper avatar Aug 29 '19 23:08 gydeveloper

Do you have a repro for this issue? If you do, could you try removing some passes to find out which passes cause the problem?

Also, one thing I noticed is "Primitive char conversion on invalid type 78". Could you try adding the IR type checker to check whether the IR is valid after each pass?

  "ir_type_checker": {
    "run_after_each_pass" : true,
    "verify_moves" : true
  }

is it possible to exclude a class from being optimized completely?

You can use a ProGuard keep rule to keep that class and its members. There is also a "no_optimizations_annotations" option: you can add an annotation to the classes and members you don't want optimized, and then list that annotation in the config like this:

"no_optimizations_annotations" : [
  "Lcom/your_donot_optimize_annotation;"
]

Most passes should obey these two rules (though some passes, like RegAllocPass, don't).
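For illustration, the two config fragments above can be merged into an existing Redex config with a small script. This is only a sketch: it assumes both keys sit at the top level of the config JSON, as the snippets in this thread suggest, and the function name `add_debug_settings` and the annotation descriptor `Lcom/example/DoNotOptimize;` are hypothetical placeholders, not names from Redex.

```python
import copy
import json


def add_debug_settings(config, annotation_descriptor):
    """Return a copy of `config` with the IR type checker enabled and
    `annotation_descriptor` listed under no_optimizations_annotations.

    Assumption: both keys are top-level entries in the config, as in the
    fragments quoted in this thread.
    """
    cfg = copy.deepcopy(config)  # leave the caller's config untouched
    cfg["ir_type_checker"] = {
        "run_after_each_pass": True,
        "verify_moves": True,
    }
    annotations = cfg.setdefault("no_optimizations_annotations", [])
    if annotation_descriptor not in annotations:
        annotations.append(annotation_descriptor)
    return cfg


if __name__ == "__main__":
    base = {"redex": {"passes": ["SimplifyCFGPass", "RegAllocPass"]}}
    # Hypothetical annotation; replace with the descriptor of your own one.
    updated = add_debug_settings(base, "Lcom/example/DoNotOptimize;")
    print(json.dumps(updated, indent=2))
```

The helper copies the config rather than editing it in place, so the original can still be used for a comparison build.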

ssj933 avatar Aug 30 '19 17:08 ssj933

Unfortunately, this is not reproducible. We are seeing it only in production crash reports. Our goal is to identify which pass(es) could cause this, but right now it's very challenging without a repro, which is why we were exploring excluding these classes from the Redex passes completely.

to check if IR is valid after each pass?

Do I need to enable logging or something? When I ran redex on my apk (with the ir_type_checker enabled) - it just successfully built with the above config.

You can use proguard keep rule to keep that class and its members

We don't want to disable proguard shrinking / obfuscation though as that has not caused any issues. I was hoping to run proguard on those classes but not the redex optimizations. I understand that by keeping the class, redex won't run optimizations on it.

We will give it a shot with the two suggestions you mentioned above

Thanks a lot for the quick reply!

gydeveloper avatar Aug 30 '19 18:08 gydeveloper

Another thing I suggest trying is putting ResultPropagationPass, RemoveRedundantCheckCastsPass, and ObjectSensitiveDcePass before RegAllocPass. We usually put RegAllocPass near the end, so that the register allocation it performs won't be altered again by other passes.
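The reordering described above can be sketched as a small helper that moves RegAllocPass so that it runs after the last occurrence of the named passes. This is a hedged illustration, not part of Redex: the function name `move_regalloc_later` is hypothetical, and it only rearranges a list of pass names without checking whether the resulting pipeline is otherwise sensible.

```python
def move_regalloc_later(passes, must_precede, regalloc="RegAllocPass"):
    """Return a new pass list in which `regalloc` comes after every pass
    named in `must_precede`, keeping all other passes in their order.

    If `regalloc` does not appear exactly once, the list is returned
    unchanged (as a copy), since the intended position would be ambiguous.
    """
    if passes.count(regalloc) != 1:
        return list(passes)
    original = passes.index(regalloc)
    rest = [p for p in passes if p != regalloc]
    # Index in `rest` of the last pass that has to run before regalloc.
    last = max((i for i, p in enumerate(rest) if p in must_precede), default=-1)
    # Never move regalloc earlier than where it already was.
    insert_at = max(original, last + 1)
    return rest[:insert_at] + [regalloc] + rest[insert_at:]


if __name__ == "__main__":
    tail = [
        "DedupBlocksPass", "RegAllocPass", "ResultPropagationPass",
        "RemoveRedundantCheckCastsPass", "CopyPropagationPass",
        "StaticReloPass", "ObjectSensitiveDcePass", "RemoveEmptyClassesPass",
    ]
    must_precede = {
        "ResultPropagationPass",
        "RemoveRedundantCheckCastsPass",
        "ObjectSensitiveDcePass",
    }
    for name in move_regalloc_later(tail, must_precede):
        print(name)
```

With the pass list from this issue, RegAllocPass ends up directly after ObjectSensitiveDcePass. Moving it further toward the end of the list, as the comment recommends, is a separate manual decision.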

Do I need to enable logging or something? When I ran redex on my apk (with the ir_type_checker enabled) - it just successfully built with the above config.

No logging is needed. If there is invalid IR code, it will crash the build. Since it built successfully, the IR type checker probably didn't find any problems.

ssj933 avatar Aug 30 '19 20:08 ssj933

Another thing I suggest trying is putting ResultPropagationPass, RemoveRedundantCheckCastsPass, and ObjectSensitiveDcePass before RegAllocPass. We usually put RegAllocPass near the end, so that the register allocation it performs won't be altered again by other passes.

Sounds good! I will also try that

gydeveloper avatar Aug 30 '19 21:08 gydeveloper

Which android OS version(s) does this occur on?

justinjhendrick avatar Sep 03 '19 19:09 justinjhendrick

We have mostly seen this on Android 8 (O) and Android 9 (P). I have not encountered it on Android 7 or below.

gydeveloper avatar Sep 03 '19 20:09 gydeveloper

I'm curious, have you guys run into any production crashes from redex? How do you attribute a crash that may have resulted from a bug in one of the optimization passes?

gydeveloper avatar Sep 05 '19 02:09 gydeveloper

For the SIGABRT, you can't be seeing it exactly like that on Android 7.0 because art_quick_check_instance_of didn't exist then. However, there might still be the same error that manifests differently. Do you see any type of ClassCastException in <7.0 with similar frequency?

It seems that the VM is trying to execute a bogus check-cast instruction into a type that can't exist, probably because the type index in the instruction is wrong.

andreasnomikos avatar Sep 05 '19 07:09 andreasnomikos

I'm curious, have you guys run into any production crashes from redex?

Yes, but I probably shouldn't say too much more :P

How do you attribute a crash that may have resulted from a bug in one of the optimization passes?

Without a local repro, it's quite hard to do. Sometimes we manually inspect the bytecode and see if it is a faithful representation of the input code or not.

It's kinda rare for Redex to cause a SIGABRT/SIGSEGV. Because redex only optimizes the Java code, Redex mistakes usually manifest as thrown Exceptions. However, it is possible for Redex to cause problems like this.

We've seen cases where there are bugs in the VM (usually on older OSes though) that Redex triggers by emitting a sequence of bytecodes that trip up the VM, even if the bytecode should be valid according to the spec.

For example, Google's Dexer, D8, has documentation of VM bugs they have found at the bottom of this file: https://r8.googlesource.com/r8/+/master/src/main/java/com/android/tools/r8/utils/InternalOptions.java. I'm not saying this bug is in this list, but it could be something similar... not sure.

You can also use http://androidxref.com/ to search the android open source repository to look at the implementation and see if it sheds light on your crash.

justinjhendrick avatar Sep 05 '19 22:09 justinjhendrick

For the SIGABRT, you can't be seeing it exactly like that on Android 7.0 because art_quick_check_instance_of didn't exist then. However, there might still be the same error that manifests differently. Do you see any type of ClassCastException in <7.0 with similar frequency?

It seems that the VM is trying to execute a bogus check-cast instruction into a type that can't exist, probably because the type index in the instruction is wrong.

I checked our crash reports, and we haven't seen any ClassCastException crashes on <7.0 when Redex was enabled.

gydeveloper avatar Sep 06 '19 19:09 gydeveloper

What we are trying to do next:

  • See if we can get an internal repro of this crash (we haven't so far).
  • Roll out only a small subset of the optimization passes and see if we can identify which pass(es) are causing the crash.
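A minimal sketch of how such pass subsets could be generated from the config posted above, by splitting the pass list into two halves for a bisection-style rollout. The function name `bisect_variants` and the `keep` parameter are hypothetical, and the sketch assumes that each half still forms a usable pipeline, which it does not verify.

```python
import copy


def bisect_variants(base_config, keep=()):
    """Return two copies of `base_config`, each enabling one half of the
    passes listed under config["redex"]["passes"].

    Passes named in `keep` stay enabled in both variants. Pass order is
    preserved. Whether each half is a valid pipeline is NOT checked here.
    """
    passes = base_config["redex"]["passes"]
    # Indices (not names) are split, so repeated passes are handled too.
    candidates = [i for i, p in enumerate(passes) if p not in keep]
    mid = len(candidates) // 2
    halves = (set(candidates[:mid]), set(candidates[mid:]))
    variants = []
    for half in halves:
        cfg = copy.deepcopy(base_config)
        cfg["redex"]["passes"] = [
            p for i, p in enumerate(passes) if i in half or p in keep
        ]
        variants.append(cfg)
    return variants


if __name__ == "__main__":
    base = {
        "redex": {
            "passes": [
                "SimplifyCFGPass", "MethodInlinePass", "PeepholePass",
                "RegAllocPass", "CopyPropagationPass", "ReduceGotosPass",
            ]
        }
    }
    for variant in bisect_variants(base, keep=("RegAllocPass",)):
        print(variant["redex"]["passes"])
```

If the crash follows one variant in production, the same function can be applied to that variant's config to narrow the suspect list further.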

gydeveloper avatar Sep 06 '19 19:09 gydeveloper