
DebugInfo substantially larger after Redex

Open benjaminRomano opened this issue 4 years ago • 5 comments

Background

DebugInfo is substantially larger after running Redex. For context, our ProGuard configuration file does not explicitly keep source file or line number information. The only line numbers R8 preserves are those related to method inlining. Let me know if there is any other info I can provide.

Config

   "passes": [
     "AccessMarkingPass",
     "MethodDevirtualizationPass",
     "ReBindRefsPass",
     "BridgePass",
     "ResultPropagationPass",
     "SynthPass",
     "FinalInlinePassV2",
     "DelSuperPass",
     "UnreferencedInterfacesPass",
     "SingleImplPass",
     "CommonSubexpressionEliminationPass",
     "MethodInlinePass",
     "PeepholePass",
     "ConstantPropagationPass",
     "LocalDcePass",
     "AnnoKillPass",
     "ReorderInterfacesPass",
     "RemoveEmptyClassesPass",
     "SingleImplPass",
     "InterDexPass",
     "CommonSubexpressionEliminationPass",
     "RemoveGotosPass",
     "DedupBlocksPass",
     "RemoveRedundantCheckCastsPass",
     "UpCodeMotionPass",
     "RegAllocPass",
     "MakePublicPass",
     "CopyPropagationPass",
     "LocalDcePass",
     "DedupBlocksPass",
     "ReduceGotosPass"
   ]

Stats

Baseline:

Segments in dex application (name: size / items):
 - EncodedArrays: 5728 / 441
 - Header: 672 / 6
 - DebugInfo: 1746499 / 74076
 - Fields: 2525928 / 315741
 - AnnotationSetRefs: 1768 / 109
 - Strings: 1896120 / 474030
 - Maps: 1296 / 6
 - Protos: 1099932 / 91661
 - Methods: 2808992 / 351124
 - Types: 541136 / 135284
 - StringData: 16857063 / 474030
 - ClassData: 3058295 / 74116
 - TypeLists: 755694 / 64783
 - AnnotationsDirectory: 1287424 / 39606
 - Annotation: 1578028 / 66547
 - AnnotationSets: 547820 / 64919
 - ClassDefs: 2446080 / 76440
 - Code: 27480789 / 265780

Redex:

Segments in dex application (name: size / items):
 - EncodedArrays: 6064 / 465
 - Header: 672 / 6
 - DebugInfo: 4801880 / 193772
 - Fields: 2300912 / 287614
 - AnnotationSetRefs: 1596 / 96
 - Strings: 1804380 / 451095
 - Maps: 1296 / 6
 - Protos: 982704 / 81892
 - Methods: 2612296 / 326537
 - Types: 473236 / 118309
 - StringData: 16085190 / 451095
 - ClassData: 3044484 / 75119
 - TypeLists: 686458 / 58662
 - AnnotationsDirectory: 1187784 / 36088
 - Annotation: 1376034 / 58818
 - AnnotationSets: 476420 / 58343
 - ClassDefs: 2440640 / 76270
 - Code: 27208538 / 260361

Diff with Diffuse:

          │           compressed            │          uncompressed
          ├──────────┬──────────┬───────────┼──────────┬───────────┬──────────
 APK      │ old      │ new      │ diff      │ old      │ new       │ diff
──────────┼──────────┼──────────┼───────────┼──────────┼───────────┼──────────
      dex │   24 MiB │ 25.6 MiB │  +1.7 MiB │ 64.1 MiB │  62.5 MiB │ -1.6 MiB
     arsc │   17 MiB │   17 MiB │     -10 B │   17 MiB │    17 MiB │      0 B
 manifest │ 10.6 KiB │ 10.8 KiB │    +145 B │ 53.3 KiB │  53.3 KiB │      0 B
      res │  9.6 MiB │  9.5 MiB │ -33.6 KiB │ 11.2 MiB │  11.2 MiB │      0 B
   native │ 18.6 MiB │ 18.6 MiB │ +20.6 KiB │ 34.7 MiB │  34.7 MiB │      0 B
    asset │  2.2 MiB │  2.2 MiB │  +3.6 KiB │  3.3 MiB │   3.3 MiB │      0 B
    other │  1.2 MiB │  1.2 MiB │ -32.5 KiB │  2.7 MiB │   2.7 MiB │  -14 KiB
──────────┼──────────┼──────────┼───────────┼──────────┼───────────┼──────────
    total │ 72.5 MiB │ 74.2 MiB │  +1.6 MiB │  133 MiB │ 131.4 MiB │ -1.7 MiB


         │           raw            │                 unique
         ├────────┬────────┬────────┼────────┬────────┬───────────────────────
 DEX     │ old    │ new    │ diff   │ old    │ new    │ diff
─────────┼────────┼────────┼────────┼────────┼────────┼───────────────────────
   count │      6 │      6 │      0 │        │        │
 strings │ 502931 │ 451095 │ -51836 │ 350717 │ 341123 │  -9594 (+38 -9632)
   types │ 137104 │ 118309 │ -18795 │  83597 │  82275 │  -1322 (+5 -1327)
 classes │  77570 │  76270 │  -1300 │  77570 │  76270 │  -1300 (+5 -1305)
 methods │ 360705 │ 326537 │ -34168 │ 299032 │ 283073 │ -15959 (+4698 -20657)
  fields │ 317562 │ 287614 │ -29948 │ 250846 │ 250831 │    -15 (+3176 -3191)

benjaminRomano · Aug 05 '20 00:08

Problem

Running Redex on an R8-processed APK, even with no passes enabled, produces an APK with substantially larger DebugInfo.

Before

Smali

.method public size()I
    .registers 3

    .line 1
    iget-object v0, p0, Lcom/google/common/collect/Synchronized$SynchronizedCollection;->mutex:Ljava/lang/Object;

    monitor-enter v0

    .line 2
    :try_start_3
    move-object v1, p0

    check-cast v1, Lcom/google/common/collect/Synchronized$SynchronizedQueue;

    .line 3
    invoke-virtual {v1}, Lcom/google/common/collect/Synchronized$SynchronizedQueue;->delegate()Ljava/util/Queue;

    move-result-object v1

    .line 4
    invoke-interface {v1}, Ljava/util/Collection;->size()I

    move-result v1

    monitor-exit v0

    return v1

    :catchall_10
    move-exception v1

    .line 5
    monitor-exit v0
    :try_end_12
    .catchall {:try_start_3 .. :try_end_12} :catchall_10

    throw v1
.end method

DebugInfo

    line_start=1(1) param_size=0(1) param_name=[]
    SPECIAL_OPCODE(14) (addr_offset=0, line_offset=0)
    SPECIAL_OPCODE(60) (addr_offset=3, line_offset=1)
    SPECIAL_OPCODE(60) (addr_offset=3, line_offset=1)
    SPECIAL_OPCODE(75) (addr_offset=4, line_offset=1)
    SPECIAL_OPCODE(120) (addr_offset=7, line_offset=1)
    END_SEQUENCE

Mapping

    1:2:int size():187:188 -> size
    3:3:java.util.Collection com.google.common.collect.Synchronized$SynchronizedQueue.delegate():1651:1651 -> size
    3:3:int size():188 -> size
    4:5:int size():188:189 -> size

After

Smali

.method public size()I
    .registers 3

    .line 187
    iget-object v0, p0, Lcom/google/common/collect/Synchronized$SynchronizedCollection;->mutex:Ljava/lang/Object;

    monitor-enter v0

    .line 188
    :try_start_3
    move-object v1, p0

    check-cast v1, Lcom/google/common/collect/Synchronized$SynchronizedQueue;

    .line 1651
    invoke-virtual {v1}, Lcom/google/common/collect/Synchronized$SynchronizedQueue;->delegate()Ljava/util/Queue;

    move-result-object v1

    .line 188
    invoke-interface {v1}, Ljava/util/Collection;->size()I

    move-result v1

    monitor-exit v0

    return v1

    :catchall_10
    move-exception v1

    .line 189
    monitor-exit v0
    :try_end_12
    .catchall {:try_start_3 .. :try_end_12} :catchall_10

    throw v1
.end method

DebugInfo

    line_start=187(2) param_size=0(1) param_name=[]
    SPECIAL_OPCODE(14) (addr_offset=0, line_offset=0)
    SPECIAL_OPCODE(60) (addr_offset=3, line_offset=1)
    ADVANCE_LINE(1463)
    SPECIAL_OPCODE(59) (addr_offset=3, line_offset=0)
    ADVANCE_LINE(-1463)
    SPECIAL_OPCODE(74) (addr_offset=4, line_offset=0)
    SPECIAL_OPCODE(120) (addr_offset=7, line_offset=1)
    END_SEQUENCE
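Both dumps are made of DEX special opcodes, each packing an address step and a small line step into one byte. The sketch below decodes them with the constants from the DEX format's debug_info_item description (DBG_FIRST_SPECIAL = 0x0a, DBG_LINE_BASE = -4, DBG_LINE_RANGE = 15); the function names are illustrative and not part of any Redex or R8 API. It reproduces the (addr_offset, line_offset) pairs shown in the dumps and shows why a line step of 1463 needs a separate ADVANCE_LINE: a special opcode can only express line steps from -4 to 10.

```python
from typing import Tuple

# Constants from the DEX debug_info_item state machine.
DBG_FIRST_SPECIAL = 0x0A  # lowest special opcode value
DBG_LINE_BASE = -4        # smallest line step a special opcode can express
DBG_LINE_RANGE = 15       # number of distinct line steps per address step


def decode_special(opcode: int) -> Tuple[int, int]:
    """Return the (addr_offset, line_offset) that a special opcode applies."""
    if not DBG_FIRST_SPECIAL <= opcode <= 0xFF:
        raise ValueError(f"not a special opcode: {opcode:#x}")
    adjusted = opcode - DBG_FIRST_SPECIAL
    addr_offset = adjusted // DBG_LINE_RANGE
    line_offset = DBG_LINE_BASE + adjusted % DBG_LINE_RANGE
    return addr_offset, line_offset


def fits_in_special(line_offset: int) -> bool:
    """True if a line step can be folded into a special opcode (-4..10)."""
    return DBG_LINE_BASE <= line_offset < DBG_LINE_BASE + DBG_LINE_RANGE


# Opcodes taken from the "Before" and "After" dumps.
for op in (14, 60, 75, 120, 59, 74):
    print(op, decode_special(op))
# prints: 14 (0, 0) / 60 (3, 1) / 75 (4, 1) / 120 (7, 1) / 59 (3, 0) / 74 (4, 0)

print(fits_in_special(1), fits_in_special(1463))  # True False
```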

Explanation

In this example, the call to delegate() has been inlined from another method whose original line number (1651) is far larger than those of the surrounding code (187 to 189). To reduce the number of instructions needed to represent the debug info, R8 renumbers lines so that they start at 1 and increase by 1, and stores the mapping back to the original line numbers in the ProGuard map file.
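A much-simplified sketch of the renumbering idea, assuming a method's positions are given in instruction order as (method, original line) pairs: each position gets the next compact line number, and a side table records the original position so a retracer can restore it (the role the map file plays). This is not R8's actual implementation; for instance, the real map file also records the caller line (188) for the inlined delegate() frame, which this sketch omits. The names are hypothetical.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (original method name, original source line)


def renumber_positions(positions: List[Position]) -> Tuple[List[int], Dict[int, Position]]:
    """Give each position the next line number starting at 1; keep a reverse table."""
    new_lines: List[int] = []
    mapping: Dict[int, Position] = {}
    for new_line, position in enumerate(positions, start=1):
        new_lines.append(new_line)
        mapping[new_line] = position  # what the map file lets a retracer recover
    return new_lines, mapping


def line_deltas(lines: List[int]) -> List[int]:
    """Line steps between consecutive positions (what the debug program must encode)."""
    return [b - a for a, b in zip(lines, lines[1:])]


# Positions of size() from the example, with delegate() inlined at the third entry.
positions = [("size", 187), ("size", 188), ("delegate", 1651), ("size", 188), ("size", 189)]
new_lines, mapping = renumber_positions(positions)

print(line_deltas([line for _, line in positions]))  # [1, 1463, -1463, 1]
print(line_deltas(new_lines))                       # [1, 1, 1, 1]
print(mapping[3])                                   # ('delegate', 1651)
```

With compact numbering every step is +1, which always fits in a one-byte special opcode; with the original numbers the +1463 and -1463 steps do not.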

When Redex reads an APK along with its mapping file, apply_deobfuscated_positions restores the original line numbers. However, when writing line numbers back out, Redex does not apply a similar line-number optimization. This produces the larger debug program shown in the "After" DebugInfo: an ADVANCE_LINE(1463) to jump up to the inlined code's original line and an ADVANCE_LINE(-1463) to jump back down, plus a line_start of 187 that takes two bytes to encode instead of one.
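To put a number on the extra instructions, here is a rough encoder sketch, again assuming the DEX debug_info_item encoding (uleb128 line_start and parameters_size, DBG_ADVANCE_LINE = 0x02 followed by an sleb128 step, special opcodes, DBG_END_SEQUENCE = 0x00). It only handles what this method needs: no parameter names, no local-variable opcodes, and address steps small enough for a special opcode. The (address, line) entries are read off the two dumps, with addresses in 16-bit code units; the function names are hypothetical.

```python
from typing import List, Tuple

DBG_END_SEQUENCE = 0x00
DBG_ADVANCE_LINE = 0x02
DBG_FIRST_SPECIAL = 0x0A
DBG_LINE_BASE = -4
DBG_LINE_RANGE = 15


def uleb128(value: int) -> bytes:
    """Unsigned LEB128 (value must be non-negative)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def sleb128(value: int) -> bytes:
    """Signed LEB128; Python's >> is an arithmetic shift, so negatives work."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        done = (value == 0 and not byte & 0x40) or (value == -1 and byte & 0x40)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def encode_positions(entries: List[Tuple[int, int]]) -> bytes:
    """Encode (address, line) entries as a simplified debug program."""
    line_start = entries[0][1]
    out = bytearray(uleb128(line_start))
    out += uleb128(0)  # parameters_size: no parameter names recorded
    addr, line = 0, line_start
    for new_addr, new_line in entries:
        addr_step, line_step = new_addr - addr, new_line - line
        if not DBG_LINE_BASE <= line_step < DBG_LINE_BASE + DBG_LINE_RANGE:
            # Line jump too large for a special opcode: spend an ADVANCE_LINE.
            out.append(DBG_ADVANCE_LINE)
            out += sleb128(line_step)
            line_step = 0
        opcode = (line_step - DBG_LINE_BASE) + DBG_LINE_RANGE * addr_step + DBG_FIRST_SPECIAL
        if opcode > 0xFF:
            raise ValueError("address step needs DBG_ADVANCE_PC, not handled here")
        out.append(opcode)
        addr, line = new_addr, new_line
    out.append(DBG_END_SEQUENCE)
    return bytes(out)


r8_entries = [(0, 1), (3, 2), (6, 3), (10, 4), (17, 5)]
redex_entries = [(0, 187), (3, 188), (6, 1651), (10, 188), (17, 189)]

before = encode_positions(r8_entries)
after = encode_positions(redex_entries)
print(len(before), list(before))  # 8 [1, 0, 14, 60, 60, 75, 120, 0]
print(len(after))                 # 15
```

Under these assumptions the deobfuscated program is almost twice the size of R8's for this one method, which is consistent in direction with the DebugInfo growth in the stats above.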

Workaround

Disabling apply_deobfuscated_positions reduces the DebugInfo generated by Redex by about 2 MB (from 4,801,880 to 2,879,288 bytes), and the resulting debug instructions are more consistent with R8's.

However, the DebugInfo is still about 1.1 MB larger than R8's output (2,879,288 vs. 1,746,499 bytes). Running the Redex-generated DEX files through D8 brings it down to 1,929,896 bytes, close to R8's. I suspect this is because R8 uses a better cross-dex minification heuristic for our app. It's possible we could fine-tune InterDexPass's cross_dex_minification parameters to produce comparable results.

To fix the initial issue, I propose making apply_deobfuscated_positions toggleable through a deobfuscate_line_numbers config option.

Redex with apply_deobfuscated_positions disabled

Segments in dex application (name: size / items):
 - EncodedArrays: 6096 / 468
 - Header: 672 / 6
 - DebugInfo: 2879288 / 193771
 - Fields: 2648568 / 331071
 - AnnotationSetRefs: 1704 / 102
 - Strings: 2033900 / 508475
 - Maps: 1296 / 6
 - Protos: 1093872 / 91156
 - Methods: 2780864 / 347608
 - Types: 569888 / 142472
 - StringData: 18209662 / 508475
 - ClassData: 3052861 / 75119
 - TypeLists: 763278 / 65508
 - AnnotationsDirectory: 1193352 / 36436
 - Annotation: 1387697 / 58955
 - AnnotationSets: 477576 / 58481
 - ClassDefs: 2440640 / 76270
 - Code: 27211190 / 260360

With apply_deobfuscated_positions enabled, and then post-processed with D8

Segments in dex application (name: size / items):
 - EncodedArrays: 5814 / 443
 - Header: 672 / 6
 - DebugInfo: 1929896 / 75778
 - Fields: 2494824 / 311853
 - AnnotationSetRefs: 1776 / 109
 - Strings: 1841856 / 460464
 - Maps: 1308 / 6
 - Protos: 1092900 / 91075
 - Methods: 2837168 / 354646
 - Types: 524544 / 131136
 - StringData: 16434762 / 460464
 - ClassData: 3122292 / 75236
 - TypeLists: 745622 / 64069
 - AnnotationsDirectory: 1304416 / 40188
 - Annotation: 1572466 / 66512
 - AnnotationSets: 547724 / 64855
 - ClassDefs: 2482240 / 77570
 - Code: 28590736 / 273080

benjaminRomano · Aug 18 '20 07:08

Wow, amazing investigation @benjaminRomano - thank you!! Redex has an optimization similar to R8's that tries to reduce ADVANCE_LINE instructions by remapping line numbers. I'll look for that option and sync up with you offline to see if we can try using it.

One caveat up front: the line remappings are emitted in a file separate from the output ProGuard mapping. We could probably write a tool to merge the two, or teach Redex to emit the line mappings in the ProGuard map the way R8 does.

danzimm · Aug 19 '20 15:08

@benjaminRomano, were you able to resolve this issue? I'm seeing a similar increase in APK size with no optimizations enabled.

kamalgsharma · Sep 15 '21 04:09

Can you try this patch? https://github.com/benjaminRomano/redex/pull/1

You can enable it by adding deobfuscate_positions: false to your config. This disables deobfuscation of line numbers so that R8's line-number optimization is preserved. With this flag, you would use your original mapping file instead of the one output by Redex. I can't recall whether this flag can be combined with optimizations that affect line numbers; my memory of this patch is a bit hazy, and we didn't end up using it.
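If you want to try the patch, the flag can be added to an existing JSON config by hand or with a short script like the sketch below. The key name deobfuscate_positions comes from the comment above; treating it as a top-level key is an assumption, and only the patched build from that pull request is expected to understand it. The function names and the config path in the usage note are hypothetical.

```python
import json
from pathlib import Path


def with_deobfuscation_disabled(config: dict) -> dict:
    """Return a copy of a Redex config with the (patch-only) flag set to false.

    Assumption: the patched Redex reads deobfuscate_positions from the top
    level of the config; a stock Redex build may not recognize it.
    """
    updated = dict(config)
    updated["deobfuscate_positions"] = False
    return updated


def disable_position_deobfuscation(config_path: Path) -> dict:
    """Rewrite a JSON config file in place with the flag set to false."""
    config = json.loads(config_path.read_text())
    updated = with_deobfuscation_disabled(config)
    config_path.write_text(json.dumps(updated, indent=2) + "\n")
    return updated
```

For example, `disable_position_deobfuscation(Path("redex-config.json"))` rewrites the file in place. With the flag off, stack traces would be retraced with the original R8 mapping file rather than the one Redex writes.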

As for the other DEX size increase, we post-process DEX files after Redex with the following command:

# --main-dex-list is only needed when targeting SDK <= 19
# If targeting SDK <= 19, --debug may make your DEX size smaller
# --pg-map is needed to take advantage of R8's class distribution
r8 d8 --min-api <...> --output <...> --pg-map <...> --main-dex-list <...> <DEX files>
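If you script this post-processing step, a small helper can keep the SDK-dependent flag in one place. This is a sketch: the `r8 d8` launcher and the flags are copied from the command above, while the function name and the example paths are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def build_d8_command(min_api: int, output_dir: str, pg_map: str,
                     dex_files: Sequence[str],
                     main_dex_list: Optional[str] = None) -> List[str]:
    """Assemble the D8 post-processing command described above."""
    # "r8 d8" is the launcher used in the comment; your install may differ.
    cmd = ["r8", "d8", "--min-api", str(min_api), "--output", output_dir]
    # Per the comment, --pg-map is what enables R8's class distribution.
    cmd += ["--pg-map", pg_map]
    # A main dex list only matters when targeting SDK 19 or lower.
    if main_dex_list is not None and min_api <= 19:
        cmd += ["--main-dex-list", main_dex_list]
    return cmd + list(dex_files)


print(shlex.join(build_d8_command(21, "out", "mapping.txt", ["classes.dex", "classes2.dex"])))
# r8 d8 --min-api 21 --output out --pg-map mapping.txt classes.dex classes2.dex
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a string avoids shell-quoting problems with paths.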

I think this produces smaller files because R8's class distribution heuristic works better than Redex's default InterDex configuration. However, I didn't dig into this deeply, so that may not be the reason. The Redex maintainers may have a better idea.

benjaminRomano · Sep 20 '21 19:09

@benjaminRomano, I need some more input on how you use R8. I'll try to connect with you offline via LinkedIn for more details. Thanks.

kamalgsharma · Sep 21 '21 22:09