App runtime native crash due to redex passes
We noticed a huge spike in crash rate on our first launch with the Redex-optimized APK. We are still investigating the root cause, but I wanted to know whether you guys have seen this before or are aware of any issues that could cause it.
We are seeing two crashes. I'm pasting a snippet of each below, followed by our Redex config.
- SIGSEGV
%YAML 1.2
---
# Generated by Ferrite ver. 0.0.1-98ac0be.204
common :
process : 28358 >>> .android.master <<<
crash_timestamp : 2019-08-15T18:51:48Z
machine : aarch64
word_size : 64
process_uptime : PT47M55S
virtual_memory_usage : 5054808kB
fault :
thread : 7887 >>> SecureChatSessi <<<
signal : SIGSEGV
code : SEGV_MAPERR
fault address : 0000000000000000
threads :
7887 :
name : SecureChatSessi
crashed : true
regs0 : |
x0 0000000012ec01e0 x1 0000000000000000 x2 0000000012ec01e0 x3 0000007ac96b8800
x4 0000000012ec0148 x5 000000000000001a x6 0000000000000000 x7 000000006fd75208
x8 8236ad88b596c87c x9 8236ad88b596c87c x10 0000000000000064 x11 0000000000000000
x12 0000000000000020 x13 0000000012ec01a4 x14 0000000012ec0218 x15 0000000000000000
x16 0000007abff66a10 x17 242dc8b500000020 x18 00000000ebad6082 x19 0000007ac96b8800
x20 0000000000000000 x21 0000000012ec0000 x22 0000000014599ea8 x23 0000000012c0a2c8
x24 0000000016059038 x25 000000001687f638 x26 0000000016d4cee0 x27 0000000012ec00a8
x28 00000000190c1d90 x29 0000000012ec01e0
sp 0000007abff66ae0 lr 000000009bca5384 pc 000000009bca5390
# E: tracer-elf : inaccessible path: /dev/ashmem/dalvik-jit-code-cache_28358_28358 (deleted): Not a directory
0 : 000000009bca5390 /dev/ashmem/dalvik-jit-code-cache_28358_28358 (deleted)
1403 :
name : pool-35-thread-
0 : 000000000001f4ec /system/lib64/libc.so (syscall+28)
1 : 00000000000d7850 /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+148)
2 : "00000000003bde78 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, long, int, bool, art::ThreadState)+636)"
3 : "00000000003bf934 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, art::mirror::Object*, long, int, bool, art::ThreadState)+424)"
4 : 000000000011541c /system/framework/arm64/boot.oat
1409 :
name : pool-36-thread-
0 : 000000000001f4ec /system/lib64/libc.so (syscall+28)
1 : 00000000000d7850 /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+148)
2 : "00000000003bde78 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, long, int, bool, art::ThreadState)+636)"
3 : "00000000003bf934 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, art::mirror::Object*, long, int, bool, art::ThreadState)+424)"
4 : 000000000011541c /system/framework/arm64/boot.oat
- SIGABRT
%YAML 1.2
---
# Generated by Ferrite ver. 0.0.1-98ac0be.204
common :
process : 16522 >>> .android.master <<<
crash_timestamp : 2019-08-07T22:21:15Z
machine : aarch64
word_size : 64
process_uptime : PT22H06M04S
virtual_memory_usage : 3338432kB
fault :
thread : 4287 >>> SecureChatSessi <<<
signal : SIGABRT
code : SI_TKILL
abort message : "primitive.h:134] Primitive char conversion on invalid type 78"
threads :
4287 :
name : SecureChatSessi
crashed : true
regs0 : |
x0 0000000000000000 x1 00000000000010bf x2 0000000000000006 x3 0000000000000008
x4 00000070ed028000 x5 00000070ed028000 x6 00000070ed028000 x7 0000000000389fb6
x8 0000000000000083 x9 69c19b6a698ca48e x10 0000000000000000 x11 0000000000000001
x12 ffffffffffffffff x13 ffffffffa2b52abf x14 000000002ee198f0 x15 00000070ea4e5000
x16 00000070ea4da2e8 x17 00000070ea47bbdc x18 0000000000000020 x19 000000000000408a
x20 00000000000010bf x21 0000000000000067 x22 00000070af9fecd9 x23 000000001509a928
x24 0000000012f60df0 x25 0000000012f75998 x26 0000000012e09418 x27 000000001340ce58
x28 000000001340cf40 x29 00000070af9fec40
sp 00000070af9fec00 lr 00000070ea42e2a4 pc 00000070ea47bbe4
# E: tracer-elf : inaccessible path: /dev/ashmem/dalvik-jit-code-cache_16522_16522 (deleted): Not a directory
0 : 000000000006bbe4 /system/lib64/libc.so (tgkill+8)
1 : 000000000001e2a0 /system/lib64/libc.so (abort+88)
2 : 0000000000437854 /system/lib64/libart.so (art::Runtime::Abort(char const*)+528)
3 : 0000000000437f64 /system/lib64/libart.so (art::Runtime::Aborter(char const*)+24)
4 : 000000000052263c /system/lib64/libart.so (android::base::LogMessage::~LogMessage()+900)
5 : 000000000010fe14 /system/lib64/libart.so (art::Primitive::Descriptor(art::Primitive::Type)+220)
6 : "000000000037a8e4 /system/lib64/libart.so (art::mirror::Class::GetDescriptor(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*)+56)"
7 : 000000000037f948 /system/lib64/libart.so (art::mirror::Class::PrettyDescriptor(art::ObjPtr<art::mirror::Class>)+48)
8 : "0000000000158c50 /system/lib64/libart.so (art::ThrowClassCastException(art::ObjPtr<art::mirror::Class>, art::ObjPtr<art::mirror::Class>)+84)"
9 : 00000000004e29b8 /system/lib64/libart.so (artThrowClassCastException+16)
10 : 00000000004e2a04 /system/lib64/libart.so (artThrowClassCastExceptionForObject+64)
11 : 0000000000508420 /system/lib64/libart.so (art_quick_check_instance_of+112)
12 : 000000009aa08d44 /dev/ashmem/dalvik-jit-code-cache_16522_16522 (deleted)
825 :
name : net-int-35
0 : 000000000001ddec /system/lib64/libc.so (syscall+28)
1 : 00000000000e1f30 /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+152)
2 : "0000000000392af4 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, long, int, bool, art::ThreadState)+628)"
3 : "00000000003945d4 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, art::mirror::Object*, long, int, bool, art::ThreadState)+252)"
4 : 00000000001dcadc /system/framework/arm64/boot-core-oj.oat (oatexec+2780)
1238 :
name : net-int-36
0 : 000000000001ddec /system/lib64/libc.so (syscall+28)
1 : 00000000000e1f30 /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+152)
2 : "0000000000392af4 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, long, int, bool, art::ThreadState)+628)"
3 : "00000000003945d4 /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, art::mirror::Object*, long, int, bool, art::ThreadState)+252)"
4 : 00000000001dcadc /system/framework/arm64/boot-core-oj.oat (oatexec+2780)
2188 :
- Redex config -
{
"redex" : {
"passes" : [
"SimplifyCFGPass",
"ReBindRefsPass",
"BridgePass",
"ResultPropagationPass",
"SynthPass",
"FinalInlinePass",
"DelSuperPass",
"MethodInlinePass",
"PeepholePass",
"BranchPrefixHoistingPass",
"ConstantPropagationPass",
"LocalDcePass",
"RemoveUnreachablePass",
"DedupBlocksPass",
"RegAllocPass",
"ResultPropagationPass",
"RemoveRedundantCheckCastsPass",
"CopyPropagationPass",
"StaticReloPass",
"ObjectSensitiveDcePass",
"RemoveEmptyClassesPass",
"LocalDcePass",
"ReduceGotosPass"
]
},
"FinalInlinePass" : {
"propagate_static_finals": true,
"replace_encodable_clinits": true
},
"MethodInlinePass": {
"throws": true,
"multiple_callers": true,
"black_list": []
},
"ConstantPropagationPassV3" : {
"blacklist": [],
"replace_moves_with_consts": true,
"fold_arithmetic": true
},
"RegAllocPass" : {
"live_range_splitting": false
},
"CopyPropagationPass" : {
"eliminate_const_literals": false,
"full_method_analysis": true
},
"PeepholePass" : {
"disabled_peepholes": [
"Replace_PutGet",
"Replace_PutGetWide",
"Replace_PutGetObject",
"Replace_PutGetShort",
"Replace_PutGetChar",
"Replace_PutGetByte",
"Replace_PutGetBoolean"
]
},
"stats_output": "redex-stats.txt"
}
One question I have is whether I can exclude a class (or a bunch of them) from all passes. If so, how can I do that? We are seeing this crash on a specific class, and I'd like to filter it out so I can see how the rest of the APK performs with Redex enabled.
I'm aware that I can blacklist classes in individual passes such as SingleImplPass, as in the snippet below, but is it possible to exclude a class from being optimized completely?
"SingleImplPass": {
"black_list" : [
"Lcom/foo/SomeInterface;",
"Lcom/foo/SomeOtherInterface;"
],
"package_black_list" : [
"Lcom/some/package/bar"
]
}
Do you have a repro for this issue? If you do, could you try removing some passes to find out which ones cause the problem?
Also, one thing I noticed is the message "Primitive char conversion on invalid type 78". Could you try adding the IR type checker to check if IR is valid after each pass?
"ir_type_checker": {
"run_after_each_pass" : true,
"verify_moves" : true
}
is it possible to exclude a class from being optimized completely?
You can use a ProGuard keep rule to keep that class and its members. There is also a "no_optimizations_annotations" option: add an annotation to the classes and members you don't want optimized, then list that annotation in the config like this:
"no_optimizations_annotations" : [
"Lcom/your_donot_optimize_annotation;"
]
Most passes should obey these two rules, though a few, such as RegAllocPass, do not.
Unfortunately this is not reproducible. We are seeing this only in production crash reports. Our goal is to identify which pass(es) could cause this, but right now it's very challenging without a repro, which is why we were exploring excluding these classes from the Redex passes completely.
to check if IR is valid after each pass?
Do I need to enable logging or something? When I ran redex on my apk (with the ir_type_checker enabled) - it just successfully built with the above config.
You can use a ProGuard keep rule to keep that class and its members
We don't want to disable ProGuard shrinking / obfuscation though, as that has not caused any issues. I was hoping to run ProGuard on those classes but not the Redex optimizations. I understand that by keeping the class, Redex won't run optimizations on it.
We will give it a shot with the two suggestions you mentioned above
Thanks a lot for the quick reply!
Another thing I suggest trying is putting ResultPropagationPass, RemoveRedundantCheckCastsPass, and ObjectSensitiveDcePass before RegAllocPass. We usually put RegAllocPass near the end, so that its register allocation doesn't get altered again by other passes.
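As a rough sketch, this is what that reordering could look like when applied to the pass list posted above. Only the three passes named here are moved. It assumes the second ResultPropagationPass entry (the one that ran after RegAllocPass) is the one meant. The passes that still follow RegAllocPass are left where they were, since nothing here says whether they should move too.

```json
{
  "redex" : {
    "passes" : [
      "SimplifyCFGPass",
      "ReBindRefsPass",
      "BridgePass",
      "ResultPropagationPass",
      "SynthPass",
      "FinalInlinePass",
      "DelSuperPass",
      "MethodInlinePass",
      "PeepholePass",
      "BranchPrefixHoistingPass",
      "ConstantPropagationPass",
      "LocalDcePass",
      "RemoveUnreachablePass",
      "DedupBlocksPass",
      "ResultPropagationPass",
      "RemoveRedundantCheckCastsPass",
      "ObjectSensitiveDcePass",
      "RegAllocPass",
      "CopyPropagationPass",
      "StaticReloPass",
      "RemoveEmptyClassesPass",
      "LocalDcePass",
      "ReduceGotosPass"
    ]
  }
}
```

The per-pass option blocks (FinalInlinePass, MethodInlinePass, and so on) would stay unchanged.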
Do I need to enable logging or something? When I ran redex on my apk (with the ir_type_checker enabled) - it just successfully built with the above config.
No logging is needed. If any IR is invalid, the type checker will crash the build. Since your build succeeded, the IR type checker probably didn't find any problems.
Another thing I suggest trying is putting ResultPropagationPass, RemoveRedundantCheckCastsPass, and ObjectSensitiveDcePass before RegAllocPass. We usually put RegAllocPass near the end, so that its register allocation doesn't get altered again by other passes.
Sounds good! I will also try that
Which android OS version(s) does this occur on?
We have mostly seen this on Android 8 (O) and Android 9 (P). I have not encountered it on 7 or below.
I'm curious, have you guys run into any production crashes from redex? How do you attribute a crash that may have resulted from a bug in one of the optimization passes?
For the SIGABRT, you can't be seeing it exactly like that on Android 7.0 because art_quick_check_instance_of didn't exist then. However, the same error might still be there and manifest differently. Do you see any type of ClassCastException in <7.0 with similar frequency?
It seems that the VM is trying to execute a bogus check-cast instruction into a type that can't exist, probably because the type index in the instruction is wrong.
I'm curious, have you guys run into any production crashes from redex?
Yes, but I probably shouldn't say too much more :P
How do you attribute a crash that may have resulted from a bug in one of the optimization passes?
Without a local repro, it's quite hard to do. Sometimes we manually inspect the bytecode and see whether it is a faithful representation of the input code.
It's fairly rare for Redex to cause a SIGABRT/SIGSEGV. Because Redex only optimizes the Java code, Redex mistakes usually manifest as thrown exceptions. However, it is possible for Redex to cause problems like this.
We've seen cases where there are bugs in the VM (usually on older OSes though) that Redex triggers by emitting a sequence of bytecodes that trip up the VM, even if the bytecode should be valid according to the spec.
For example, Google's dexer, D8, documents VM bugs they have found at the bottom of this file: https://r8.googlesource.com/r8/+/master/src/main/java/com/android/tools/r8/utils/InternalOptions.java. I'm not saying this bug is in that list, but it could be something similar... not sure.
You can also use http://androidxref.com/ to search the Android open source repository, look at the implementation, and see if it sheds light on your crash.
For the SIGABRT, you can't be seeing it exactly like that on Android 7.0 because art_quick_check_instance_of didn't exist then. However, the same error might still be there and manifest differently. Do you see any type of ClassCastException in <7.0 with similar frequency?
It seems that the VM is trying to execute a bogus check-cast instruction into a type that can't exist, probably because the type index in the instruction is wrong.
I checked our crash reports, and we haven't seen any ClassCastException crashes on <7.0 while Redex was enabled.
What we are trying to do next:
- See if we can get an internal repro of this crash (we haven't so far).
- Roll out only a small subset of the optimization passes and see if we can identify which pass(es) are causing the crash.
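One possible first cut for that staged rollout, sketched against the pass list posted earlier: keep everything up to and including RegAllocPass and drop the passes that ran after it. Which subset to start with is a judgment call. This particular split is an assumption for illustration, not something anyone in the thread recommended.

```json
{
  "redex" : {
    "passes" : [
      "SimplifyCFGPass",
      "ReBindRefsPass",
      "BridgePass",
      "ResultPropagationPass",
      "SynthPass",
      "FinalInlinePass",
      "DelSuperPass",
      "MethodInlinePass",
      "PeepholePass",
      "BranchPrefixHoistingPass",
      "ConstantPropagationPass",
      "LocalDcePass",
      "RemoveUnreachablePass",
      "DedupBlocksPass",
      "RegAllocPass"
    ]
  }
}
```

If the crash rate drops with this subset, the dropped passes can be added back a few at a time. If it does not, the same halving can be applied to the passes that remain.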