jadx icon indicating copy to clipboard operation
jadx copied to clipboard

mapping-io import support

Open NebelNidas opened this issue 2 years ago • 13 comments

The follow-up to #1505, so you can now import your previously exported mappings. Partly addresses #1531.

Currently working:

  • [x] Passing mapping file location via command line argument
  • [x] Adding last opened mapping file to the project data
  • [x] Actually applying the mappings:
    • [x] Classes
    • [x] Fields
    • [x] Methods
    • [x] Method args
    • ~~[ ] Method vars~~ not feasible

NebelNidas avatar Jun 13 '22 00:06 NebelNidas

This pull request introduces 1 alert when merging ce2b021e2a405bceb276fca255b1474336ce2bbe into 1533b7fe6e0c1e21097a84b1e69ff2ed499c6c8b - view on LGTM.com

new alerts:

  • 1 for Dereferenced variable may be null

lgtm-com[bot] avatar Jun 14 '22 14:06 lgtm-com[bot]

@skylot Is there any way to determine during the decompilation's rename phase whether a local variable is present in bytecode or generated by the decompiler? And are the SSAVars' reg nums different to the VarNode ones?

I can't use the collectMethodVars trick from the exporter here, since VarNodes are only available after decompilation has finished

NebelNidas avatar Jun 25 '22 14:06 NebelNidas

Is there any way to determine during the decompilation's rename phase whether a local variable is present in bytecode or generated by the decompiler?

I don't think it is possible. In short: bytecode only have registers next jadx split these registers into SSA variables (add version number for each register) and next in ProcessVariables merge variables which used in PHI and have same types (sets of SSA vars that can be merged available in CodeVar). Also, some SSA vars not generated because expressions can be inlined. You can apply name for all SSA vars or CodeVar only by register number, because you don't have any additional info :(

And are the SSAVars' reg nums different to the VarNode ones?

Yes, those numbers are same

skylot avatar Jun 25 '22 20:06 skylot

This pull request introduces 2 alerts when merging ee37c96936da04fe7004d1206b8b6678384d9b17 into ab4b6f9e540a23004156393ad249479cde9d03a0 - view on LGTM.com

new alerts:

  • 2 for Dereferenced variable may be null

lgtm-com[bot] avatar Jul 26 '22 06:07 lgtm-com[bot]

The PR is almost ready now! A few things I wanted to note:

  • As you suggested earlier, I added a mappingsPath property to the JADX project file, so your last opened mappings get automatically loaded on next startup. Clicking the Close mappings menu option in jadx-gui removes the entry from your project file.
  • In addition to mappingsPath, I added a mappingsLastModified property. This is to verify that the mappings didn't change on disk since you had them last open; and if they did, JADX's cache gets invalidated. I wanted to use a hash for that purpose initially, but since you can save mappings in a folder structure which can have thousands of files, I went for the filesystem's last modified date instead.
  • I deprecated the CLI args regarding the classic .jobf deobfuscation files, as your plan was to move them into the cache folder at a later date anyway.
  • Terminology: Internally, I renamed all DeobfuscationMapFile related things to GeneratedRenamesMappingFile, so you immediately know these are the auto-generated renames. User-provided renames are named UserRenamesMappings. No File appended to the name, since mappings can be loaded from directories. If you aren't happy with the new names, please provide an alternative! But we should definitely remove the ambiguity of auto-generated renames and user-provided renames in other places of the project, too, like all the other --deobf- CLI args. I'd rename them all to --auto-deobf-, so it's clear these args only regard automatic name generation.
  • Method var importing isn't implemented as of now, because that's near to impossible with JADX's SSAVar structure and the impossibility of accessing code generated before the decompiler added its own, not-present-in-bytecode vars (which mess up the determination of lv-indices, lvt-indices and offsets).
  • There's some rather ugly fallback-related code here. I wasn't able to clean it up any further, but I at least added some comments.
  • Importing mappings is partly handled in jadx-core. Yes, I know what you said about keeping jadx-core clean from non-decompilation related functionality. There were several reasons why I did it anyway:
    • If I wanted to use JADX' decompiler as a library, which I do, and had mappings I wanted JADX to use for that process, there's currently no other way than to have the code in jadx-core directly.
    • All the other rename-related stuff is also in jadx-core. Be it auto-deobfuscation or user renames and their visitors, they're now all in one place. Personally I think this will make a future extraction into plugins easier not harder, as you have everything next to each other rather than split into multiple projects.
    • Once JADX's plugin system becomes powerful enough, I promise you to help with the process of moving the mapping-related stuff out, so any additional work caused by this PR can be kept at a minimum.

NebelNidas avatar Jul 26 '22 08:07 NebelNidas

@NebelNidas thanks, looks fine. Some thoughts:

  1. You don't need additional code to watch for mapping files changes on load, you can merge your file with inputs list here: https://github.com/skylot/jadx/blob/f4b36454351c44b881ead0ac0c3234e9e10fe59d/jadx-gui/src/main/java/jadx/gui/utils/codecache/disk/DiskCodeCache.java#L196 I'm also thinking to make mappings files part of inputs files, so I have several questions: Does it make sense to load several mappings files? And is it possible to filter out mapping files by file extension?
  2. For easier to migrate your changes into separate plugin I am working on plugins API for visitors in jadx-script branch (still an early draft), also I'm preparing changes for simplify deobfuscator and renames (check wrapper and script for custom deobfuscator :rofl:). So considering most of your code already in visitors, this migration shouldn't be hard.

skylot avatar Jul 26 '22 12:07 skylot

You don't need additional code to watch for mapping files changes on load, you can merge your file with inputs list here: [...]

Thanks, that's a good suggestion! Cleans up the code a little.

Does it make sense to load several mappings files?

I guess you could layer them, but my PR doesn't support that and I don't really think anyone would use it. Enigma doesn't either afaik, and afaik no one has requested it in all these years. If someone does indeed open a feature request some day, I might look into it, but for now it's not really worth it.

And is it possible to filter out mapping files by file extension?

You can make a loop over MappingFormat's entries and check against their individual file extensions. You can also look at MappingReader.detectFormat(), but that's only useful if you're fairly certain already that you have a mappings file and just want to check for the format. For mapping directories it may be a bit harder to determine, although I suppose you can just check if it contains .mapping files - however deeply they might be nested in there.

For easier to migrate your changes into separate plugin I am working on plugins API for visitors

Perfect! :) Feel free to ping me once it's in a usable state and I can start migrating stuff 👍

NebelNidas avatar Jul 26 '22 13:07 NebelNidas

Btw, if you need a real-world example of JADX being used with mappings, please see my repo here: https://github.com/NebelNidas/yt-mappings

All you have to do to get it running is:

  1. Clone the repo
  2. Download the YouTube 17.14.35 APK from here, rename it to youtube.apk and place it inside the cloned repo's folder
  3. Run /gradlew jadx

NebelNidas avatar Jul 26 '22 13:07 NebelNidas

@skylot Can you look at why the test is failing? It works fine on my machine 🤔

NebelNidas avatar Jul 28 '22 08:07 NebelNidas

@NebelNidas This test executed in headless mode (without UI) but you are trying to use UI objects (i.e. new MainWindow instance), so you get a HeadlessException. Looks like you don't need these added constructor arguments (JadxProject and JadxSettings) at all, because these settings already added into JadxArgs. So just remove these new arguments and this will resolve exception.

skylot avatar Jul 28 '22 13:07 skylot

Also, I think I will migrate your change to plugin by myself, so I will take your change once it ready and apply needed refactoring.

skylot avatar Jul 28 '22 13:07 skylot

Looks like you don't need these added constructor arguments (JadxProject and JadxSettings) at all, because these settings already added into JadxArgs

Not really the case though. JadxArgs only contains the initial CLI start arguments, not necessarily the values that are used at a later point. JadxProject's mappings path may get changed later on, JadxArgs's stays the same: https://github.com/skylot/jadx/blob/c5ed9e21f9f76c206bacffed5fb4c9304d9b07fa/jadx-gui/src/main/java/jadx/gui/ui/MainWindow.java#L428-L431 Or JadxArgs' value is null (no launch arg set), but there is a mappings path stored in the current project's project file which only gets loaded into JadxProject. Same with JadxSettings: https://github.com/skylot/jadx/blob/c5ed9e21f9f76c206bacffed5fb4c9304d9b07fa/jadx-gui/src/main/java/jadx/gui/ui/MainWindow.java#L422-L425

Now, I could update the values stored in JadxArgs each time as well, but does that make sense? Why are JadxArgs and JadxSettings even split into two different files? I understand that one is jadx-core and one is jadx-gui, but why doesn't one extend the other?

NebelNidas avatar Jul 28 '22 16:07 NebelNidas

Well, I understand your confusion, here are some details:

  1. JadxArgs - storage for all jadx decompiler options (part of jadx API)
  2. JadxCLIArgs - definition and parsing of jadx-cli args, have convert method to get JadxArgs. Yes, some fields duplicate JadxArgs fields, but some can have different types, like input files are strings and need to be converted to File in JadxArgs.
  3. JadxSettings - jadx-gui global settings, extends JadxCLIArgs to allow override from cli, can be saved as json in java preferences.
  4. JadxProject - project specific settings, saved as json file, set/override global options in JadxArgs.

So JadxArgs purpose is to store all settings, other classes just wrappers to collect/store these settings. For jadx-gui JadxArgs options set and updated in JadxWrapper, you can copy/update settings from project if needed.

As for DiskCodeCache class, I don't want to tie it to jadx-gui, because it can be used independently, and I may move it to core or separate module later.

Also, you mention options update after initial loading: actually it is a big issue, because incremental code update is very error-prone, if you know what class code is changed you can use ClassNode.unloadCode() to remove it from cache and force decompilation on next load or ClassNode.reloadCode() to get updated code right now. But for now, I prefer full reload, as it is much easier to support and cause fewer errors, and anyway it should be implemented for correct first load. So the best approach is to put correct options into JadxArgs in JadxWrapper, and if these options updated, just call reload.

skylot avatar Jul 28 '22 17:07 skylot

@NebelNidas I merged this PR into jadx-next branch, I will refactor your changes into separate module/plugin. I plan to merge that branch after next jadx release (August 20). Thanks again for your contribution :+1:

skylot avatar Aug 10 '22 14:08 skylot