redex
DebugInfo substantially larger after Redex
Background
DebugInfo is substantially larger after running Redex. For context, our proguard configuration file does not explicitly keep sourceFile or lineNumbers. The only line numbers R8 preserves are those related to method inlining. Let me know if there is any other info I can provide.
Config
"passes": [
"AccessMarkingPass",
"MethodDevirtualizationPass",
"ReBindRefsPass",
"BridgePass",
"ResultPropagationPass",
"SynthPass",
"FinalInlinePassV2",
"DelSuperPass",
"UnreferencedInterfacesPass",
"SingleImplPass",
"CommonSubexpressionEliminationPass",
"MethodInlinePass",
"PeepholePass",
"ConstantPropagationPass",
"LocalDcePass",
"AnnoKillPass",
"ReorderInterfacesPass",
"RemoveEmptyClassesPass",
"SingleImplPass",
"InterDexPass",
"CommonSubexpressionEliminationPass",
"RemoveGotosPass",
"DedupBlocksPass",
"RemoveRedundantCheckCastsPass",
"UpCodeMotionPass",
"RegAllocPass",
"MakePublicPass",
"CopyPropagationPass",
"LocalDcePass",
"DedupBlocksPass",
"ReduceGotosPass"
]
Stats
Baseline:
Segments in dex application (name: size / items):
- EncodedArrays: 5728 / 441
- Header: 672 / 6
- DebugInfo: 1746499 / 74076
- Fields: 2525928 / 315741
- AnnotationSetRefs: 1768 / 109
- Strings: 1896120 / 474030
- Maps: 1296 / 6
- Protos: 1099932 / 91661
- Methods: 2808992 / 351124
- Types: 541136 / 135284
- StringData: 16857063 / 474030
- ClassData: 3058295 / 74116
- TypeLists: 755694 / 64783
- AnnotationsDirectory: 1287424 / 39606
- Annotation: 1578028 / 66547
- AnnotationSets: 547820 / 64919
- ClassDefs: 2446080 / 76440
- Code: 27480789 / 265780
Redex:
Segments in dex application (name: size / items):
- EncodedArrays: 6064 / 465
- Header: 672 / 6
- DebugInfo: 4801880 / 193772
- Fields: 2300912 / 287614
- AnnotationSetRefs: 1596 / 96
- Strings: 1804380 / 451095
- Maps: 1296 / 6
- Protos: 982704 / 81892
- Methods: 2612296 / 326537
- Types: 473236 / 118309
- StringData: 16085190 / 451095
- ClassData: 3044484 / 75119
- TypeLists: 686458 / 58662
- AnnotationsDirectory: 1187784 / 36088
- Annotation: 1376034 / 58818
- AnnotationSets: 476420 / 58343
- ClassDefs: 2440640 / 76270
- Code: 27208538 / 260361
diff with Diffuse:
          │ compressed                      │ uncompressed
          ├──────────┬──────────┬───────────┼──────────┬───────────┬──────────
 APK      │ old      │ new      │ diff      │ old      │ new       │ diff
──────────┼──────────┼──────────┼───────────┼──────────┼───────────┼──────────
 dex      │ 24 MiB   │ 25.6 MiB │ +1.7 MiB  │ 64.1 MiB │ 62.5 MiB  │ -1.6 MiB
 arsc     │ 17 MiB   │ 17 MiB   │ -10 B     │ 17 MiB   │ 17 MiB    │ 0 B
 manifest │ 10.6 KiB │ 10.8 KiB │ +145 B    │ 53.3 KiB │ 53.3 KiB  │ 0 B
 res      │ 9.6 MiB  │ 9.5 MiB  │ -33.6 KiB │ 11.2 MiB │ 11.2 MiB  │ 0 B
 native   │ 18.6 MiB │ 18.6 MiB │ +20.6 KiB │ 34.7 MiB │ 34.7 MiB  │ 0 B
 asset    │ 2.2 MiB  │ 2.2 MiB  │ +3.6 KiB  │ 3.3 MiB  │ 3.3 MiB   │ 0 B
 other    │ 1.2 MiB  │ 1.2 MiB  │ -32.5 KiB │ 2.7 MiB  │ 2.7 MiB   │ -14 KiB
──────────┼──────────┼──────────┼───────────┼──────────┼───────────┼──────────
 total    │ 72.5 MiB │ 74.2 MiB │ +1.6 MiB  │ 133 MiB  │ 131.4 MiB │ -1.7 MiB

         │ raw                      │ unique
         ├────────┬────────┬────────┼────────┬────────┬───────────────────────
 DEX     │ old    │ new    │ diff   │ old    │ new    │ diff
─────────┼────────┼────────┼────────┼────────┼────────┼───────────────────────
 count   │ 6      │ 6      │ 0      │        │        │
 strings │ 502931 │ 451095 │ -51836 │ 350717 │ 341123 │ -9594 (+38 -9632)
 types   │ 137104 │ 118309 │ -18795 │ 83597  │ 82275  │ -1322 (+5 -1327)
 classes │ 77570  │ 76270  │ -1300  │ 77570  │ 76270  │ -1300 (+5 -1305)
 methods │ 360705 │ 326537 │ -34168 │ 299032 │ 283073 │ -15959 (+4698 -20657)
 fields  │ 317562 │ 287614 │ -29948 │ 250846 │ 250831 │ -15 (+3176 -3191)
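The DebugInfo growth stands out more clearly when the two segment dumps are diffed directly. Here is a minimal Python sketch, assuming the dump format shown above (`- Name: size / items`). `parse_segments` and `diff_segments` are hypothetical helper names, not part of Redex or Diffuse, and only three segments from the dumps are included to keep it short.

```python
import re

# Matches lines such as "- DebugInfo: 1746499 / 74076".
SEGMENT_RE = re.compile(r"-\s*(\w+):\s*(\d+)\s*/\s*(\d+)")


def parse_segments(text):
    """Return {segment name: (size in bytes, item count)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in SEGMENT_RE.finditer(text)
    }


def diff_segments(old, new):
    """Return (name, size delta, item delta) rows, largest growth first."""
    rows = [
        (name, new[name][0] - old[name][0], new[name][1] - old[name][1])
        for name in old.keys() & new.keys()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Values copied from the Baseline and Redex dumps above.
BASELINE = """
- DebugInfo: 1746499 / 74076
- StringData: 16857063 / 474030
- Code: 27480789 / 265780
"""
REDEX = """
- DebugInfo: 4801880 / 193772
- StringData: 16085190 / 451095
- Code: 27208538 / 260361
"""

for name, size_delta, item_delta in diff_segments(
    parse_segments(BASELINE), parse_segments(REDEX)
):
    print(f"{name:12} {size_delta:+10d} bytes {item_delta:+8d} items")
```

For these three segments, DebugInfo is the only one that grows: by 3,055,381 bytes and 119,696 items.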
Problem
Running Redex on an R8'd APK with no passes enabled will produce an APK with substantially larger DebugInfo.
Before
smali
.method public size()I
.registers 3
.line 1
iget-object v0, p0, Lcom/google/common/collect/Synchronized$SynchronizedCollection;->mutex:Ljava/lang/Object;
monitor-enter v0
.line 2
:try_start_3
move-object v1, p0
check-cast v1, Lcom/google/common/collect/Synchronized$SynchronizedQueue;
.line 3
invoke-virtual {v1}, Lcom/google/common/collect/Synchronized$SynchronizedQueue;->delegate()Ljava/util/Queue;
move-result-object v1
.line 4
invoke-interface {v1}, Ljava/util/Collection;->size()I
move-result v1
monitor-exit v0
return v1
:catchall_10
move-exception v1
.line 5
monitor-exit v0
:try_end_12
.catchall {:try_start_3 .. :try_end_12} :catchall_10
throw v1
.end method
DebugInfo
line_start=1(1) param_size=0(1) param_name=[]
SPECIAL_OPCODE(14) (addr_offset=0, line_offset=0)
SPECIAL_OPCODE(60) (addr_offset=3, line_offset=1)
SPECIAL_OPCODE(60) (addr_offset=3, line_offset=1)
SPECIAL_OPCODE(75) (addr_offset=4, line_offset=1)
SPECIAL_OPCODE(120) (addr_offset=7, line_offset=1)
END_SEQUENCE
Mapping
1:2:int size():187:188 -> size
3:3:java.util.Collection com.google.common.collect.Synchronized$SynchronizedQueue.delegate():1651:1651 -> size
3:3:int size():188 -> size
4:5:int size():188:189 -> size
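To show how these mapping entries tie the normalized line numbers back to the original ones, here is a minimal Python sketch. It assumes each entry has the form `start:end:returnType method(args):origStart[:origEnd] -> name`, and that entries sharing the same obfuscated range list the inlined (innermost) frame first. `resolve` is a hypothetical helper, not R8's retrace implementation.

```python
import re

ENTRY_RE = re.compile(
    r"(\d+):(\d+):(\S+) ([^\s(]+)\(([^)]*)\)(?::(\d+)(?::(\d+))?)? -> (\S+)"
)

# The mapping entries from the example above.
MAPPING = """\
1:2:int size():187:188 -> size
3:3:java.util.Collection com.google.common.collect.Synchronized$SynchronizedQueue.delegate():1651:1651 -> size
3:3:int size():188 -> size
4:5:int size():188:189 -> size
"""


def resolve(mapping, obf_line):
    """Return [(original method, original line)] frames, innermost first."""
    frames = []
    for m in ENTRY_RE.finditer(mapping):
        start, end = int(m.group(1)), int(m.group(2))
        if not start <= obf_line <= end or m.group(6) is None:
            continue
        orig_start = int(m.group(6))
        orig_end = int(m.group(7)) if m.group(7) else None
        if orig_end is None or orig_end == orig_start:
            # A single original line: every line in the range maps to it.
            orig = orig_start
        else:
            # A range: lines map one-to-one by offset from the range start.
            orig = orig_start + (obf_line - start)
        frames.append((m.group(4), orig))
    return frames


for line in range(1, 6):
    print(line, resolve(MAPPING, line))
```

Under these assumptions, lines 1 to 5 resolve to innermost original lines 187, 188, 1651, 188 and 189, and line 3 carries a second frame for the caller, size() at line 188.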
After
Smali
.method public size()I
.registers 3
.line 187
iget-object v0, p0, Lcom/google/common/collect/Synchronized$SynchronizedCollection;->mutex:Ljava/lang/Object;
monitor-enter v0
.line 188
:try_start_3
move-object v1, p0
check-cast v1, Lcom/google/common/collect/Synchronized$SynchronizedQueue;
.line 1651
invoke-virtual {v1}, Lcom/google/common/collect/Synchronized$SynchronizedQueue;->delegate()Ljava/util/Queue;
move-result-object v1
.line 188
invoke-interface {v1}, Ljava/util/Collection;->size()I
move-result v1
monitor-exit v0
return v1
:catchall_10
move-exception v1
.line 189
monitor-exit v0
:try_end_12
.catchall {:try_start_3 .. :try_end_12} :catchall_10
throw v1
.end method
DebugInfo
line_start=187(2) param_size=0(1) param_name=[]
SPECIAL_OPCODE(14) (addr_offset=0, line_offset=0)
SPECIAL_OPCODE(60) (addr_offset=3, line_offset=1)
ADVANCE_LINE(1463)
SPECIAL_OPCODE(59) (addr_offset=3, line_offset=0)
ADVANCE_LINE(-1463)
SPECIAL_OPCODE(74) (addr_offset=4, line_offset=0)
SPECIAL_OPCODE(120) (addr_offset=7, line_offset=1)
END_SEQUENCE
Explanation
In the example code, the call to delegate() has been inlined from another method. That method has a much larger original line number than the surrounding code. To reduce the number of instructions needed to represent the debug info, R8 normalizes line numbers to start at 1 and increase by 1. These line number mappings are then stored in the proguard map file.
When Redex reads in a mappings file and APK, it deobfuscates line numbers in apply_deobfuscated_positions; however, when writing line numbers back out, it does not perform a similar line number optimization. This increases the number of debug instructions, as shown in the "after" DebugInfo. Specifically, there are ADVANCE_LINE instructions to increase and then decrease the line number for the inlined code.
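To make the size difference concrete, here is a minimal Python sketch of how a position table turns into DEX debug opcodes. It assumes the special-opcode formula from the DEX format, opcode = 0x0a + (line_delta + 4) + 15 * addr_delta, valid for line deltas in [-4, 10]. `encode_positions` and the two position lists are illustrative names, with addresses taken from the offsets in the dumps above. This is not Redex's or R8's actual encoder.

```python
def uleb128_len(value):
    """Number of bytes needed to encode a non-negative int as ULEB128."""
    n = 1
    while value >= 0x80:
        value >>= 7
        n += 1
    return n


def sleb128_len(value):
    """Number of bytes needed to encode a signed int as SLEB128."""
    n = 0
    while True:
        byte = value & 0x7F
        value >>= 7
        n += 1
        sign_bit_set = bool(byte & 0x40)
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            return n


def encode_positions(positions):
    """Encode [(address, line)] and return (opcode names, size in bytes)."""
    line_start = positions[0][1]
    # Header: line_start plus parameters_size (0 parameters in this example).
    size = uleb128_len(line_start) + uleb128_len(0)
    ops = []
    addr, line = 0, line_start
    for new_addr, new_line in positions:
        addr_delta, line_delta = new_addr - addr, new_line - line
        if not -4 <= line_delta <= 10:
            # Line jump too large for a special opcode.
            ops.append(f"ADVANCE_LINE({line_delta})")
            size += 1 + sleb128_len(line_delta)
            line_delta = 0
        opcode = 0x0A + (line_delta + 4) + 15 * addr_delta
        if opcode > 0xFF:
            # Address jump too large for a special opcode.
            ops.append(f"ADVANCE_PC({addr_delta})")
            size += 1 + uleb128_len(addr_delta)
            opcode = 0x0A + (line_delta + 4)
        ops.append(f"SPECIAL_OPCODE({opcode})")
        size += 1
        addr, line = new_addr, new_line
    ops.append("END_SEQUENCE")
    return ops, size + 1


# Same code addresses, with R8-normalized lines and with deobfuscated lines.
NORMALIZED = [(0, 1), (3, 2), (6, 3), (10, 4), (17, 5)]
DEOBFUSCATED = [(0, 187), (3, 188), (6, 1651), (10, 188), (17, 189)]

for name, table in (("normalized", NORMALIZED), ("deobfuscated", DEOBFUSCATED)):
    ops, size = encode_positions(table)
    print(name, size, "bytes:", " ".join(ops))
```

For this method the normalized table encodes in 8 bytes and the deobfuscated one in 15, because each jump into and out of the inlined frame costs an ADVANCE_LINE opcode plus a 2-byte SLEB128 operand.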
Workaround
By disabling apply_deobfuscated_positions, the DebugInfo generated by Redex shrinks by about 1.9 MB (from 4,801,880 to 2,879,288 bytes). This results in debug instructions that are more consistent with R8's.
However, the DebugInfo is still 1.1 MB larger than R8's output. This difference can be largely resolved by running the Redex-generated DEX files through D8. I suspect this is due to R8 using a better cross-dex minification heuristic for our app. It's possible we could fine-tune InterDexPass's cross_dex_minification parameters to produce comparable results.
My proposal to fix the initial issue would be to allow apply_deobfuscated_positions to be toggleable through a deobfuscate_line_numbers configuration.
Redex with apply_deobfuscated_positions disabled
Segments in dex application (name: size / items):
- EncodedArrays: 6096 / 468
- Header: 672 / 6
- DebugInfo: 2879288 / 193771
- Fields: 2648568 / 331071
- AnnotationSetRefs: 1704 / 102
- Strings: 2033900 / 508475
- Maps: 1296 / 6
- Protos: 1093872 / 91156
- Methods: 2780864 / 347608
- Types: 569888 / 142472
- StringData: 18209662 / 508475
- ClassData: 3052861 / 75119
- TypeLists: 763278 / 65508
- AnnotationsDirectory: 1193352 / 36436
- Annotation: 1387697 / 58955
- AnnotationSets: 477576 / 58481
- ClassDefs: 2440640 / 76270
- Code: 27211190 / 260360
With apply_deobfuscated_positions enabled, and then post-processed with D8
Segments in dex application (name: size / items):
- EncodedArrays: 5814 / 443
- Header: 672 / 6
- DebugInfo: 1929896 / 75778
- Fields: 2494824 / 311853
- AnnotationSetRefs: 1776 / 109
- Strings: 1841856 / 460464
- Maps: 1308 / 6
- Protos: 1092900 / 91075
- Methods: 2837168 / 354646
- Types: 524544 / 131136
- StringData: 16434762 / 460464
- ClassData: 3122292 / 75236
- TypeLists: 745622 / 64069
- AnnotationsDirectory: 1304416 / 40188
- Annotation: 1572466 / 66512
- AnnotationSets: 547724 / 64855
- ClassDefs: 2482240 / 77570
- Code: 28590736 / 273080
Wow, amazing investigation @benjaminRomano - thank you!! Redex has an optimization similar to R8's, where it tries to reduce the ADVANCE_POSITION instructions by remapping line numbers. I'll look for that option and sync up with you offline to see if we can try to use that.
Ahead of time I want to mention there's one caveat: the line remappings are emitted in a separate file from the output proguard mapping. We can probably create a tool to merge the debug info or teach redex to output the proguard map like R8 does.
@benjaminRomano, were you able to resolve this issue? I am noticing a similar increase in APK size with no optimizations enabled.
Can you try this patch? https://github.com/benjaminRomano/redex/pull/1
You can enable it by adding deobfuscate_positions: false to your config. This disables de-obfuscation of line numbers so that R8's line number optimizations are preserved. With this, you would use your original mappings file instead of the one output by Redex. I can't recall whether this flag can be used with optimizations that affect line numbers. My recollection of this patch is a bit hazy. We didn't end up using this patch.
As for the other DEX size increase, we post-process DEX files after Redex with the following command:
# main dex list only needed if targeting SDK <=19
# If targeting SDK <=19, --debug may make your DEX size smaller
# --pg-map is needed to take advantage of R8's class distribution
r8 d8 --min-api <...> --output <...> --pg-map <...> --main-dex-list <...> <DEX files>
I think this produces smaller files because R8's class distribution heuristic is better than Redex's default InterDex configuration. However, I didn't dig deeply into this, so that may not be the reason. The Redex maintainers may have a better idea.
@benjaminRomano, I need some more input on how you use R8. I will try to connect with you offline via LinkedIn for more details. Thanks.