Translation unit detection

Open ghost opened this issue 3 years ago • 4 comments

Most of the unresearched code currently sits in a handful of large assembly blobs, each containing many unrelated pieces of code. We need to structure this better.

A basic improvement is to recover the original translation unit (TU) slices and generate a C file with inline assembly for each TU.

The CodeWarrior build system leaks some information on TU structure. Examples:

  • Data sections of a TU (especially small data) are aligned and padded. Hint: a run of padding (bytes with no xrefs) followed by an aligned piece of data suggests a TU boundary.
  • Strings and floating point literals are deduplicated within a TU. Hint: a TU boundary has to lie between two copies of the same data.
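
The dedup hint could be sketched roughly as below. This is only an illustration, not an existing script: it assumes literals are available as (address, raw bytes) pairs sorted by address, and that duplicates are compared by exact byte pattern. The function names are hypothetical.

```python
def boundary_windows(literals):
    """Return (lo, hi) address pairs for each repeated literal.

    literals: list of (address, raw_bytes) sorted by address.
    Each window means: at least one TU boundary lies after `lo`
    and at or before `hi`, since one TU would not hold both copies.
    """
    last_seen = {}
    windows = []
    for addr, value in literals:
        if value in last_seen:
            windows.append((last_seen[value], addr))
        last_seen[value] = addr
    return windows


def greedy_split(literals):
    """Group literal addresses into candidate TUs.

    A new unit starts whenever a value repeats inside the current
    unit. This gives the fewest units consistent with the dedup
    hint; real TUs may be split further.
    """
    units = []
    current = []
    seen = set()
    for addr, value in literals:
        if value in seen:
            units.append(current)
            current = []
            seen = set()
        current.append(addr)
        seen.add(value)
    if current:
        units.append(current)
    return units
```

The greedy split only yields a lower bound on the number of TUs, so its output is better read as "boundaries that must exist" than as a complete split.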

ghost avatar Jul 24 '21 21:07 ghost

Some more clues:

  • The majority of data is not shared across TUs
  • Non-SDA data loads are typically done as first_tu_data + (data - first_tu_data). Example:

      .rel.text1:806DD3A8 addi r30, r30, aMashballoongc@l # "MashBalloonGC"
      .rel.text1:806DD3AC addi r4, r30, (aHeyhoshipgba_0 - 0x808A0420) # "HeyhoShipGBA"
      .rel.text1:806DD3B0 bl strcmp
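
A rough sketch of how this base-plus-offset pattern might be turned into a data span. It assumes the base address and the raw 16-bit `addi` immediates have already been extracted from the disassembly. The helper names and the offsets in the usage note are made up for illustration; only the base address 0x808A0420 comes from the example above.

```python
def sign_extend_16(imm):
    """Interpret a raw 16-bit immediate as a signed value (PowerPC SIMM)."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def same_tu_span(base, raw_immediates):
    """Return the (lowest, highest) address reached from one base.

    All addresses computed relative to the same base are assumed
    to belong to the same TU, so the span is a minimum extent for
    that TU's data.
    """
    targets = [base + sign_extend_16(imm) for imm in raw_immediates]
    addresses = [base] + targets
    return min(addresses), max(addresses)
```

For instance, `same_tu_span(0x808A0420, [0x0010, 0x0154])` would report that the data from 0x808A0420 up to 0x808A0574 is addressed from a single base.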

riidefi avatar Jul 24 '21 21:07 riidefi

Resuming work on this. To begin with, I'm going to export all symbols, XREFs, etc. from @stblr's Ghidra using https://github.com/r0metheus/GhiDump. This should get us off the ground with the sdata2 float dedup heuristic.

riptl avatar Mar 19 '22 22:03 riptl

First attempt at translation unit detection using the sdata2 heuristic has been successful (well, kinda?).

File format is

<SDATA2_START>..<SDATA2_STOP> <TEXT_START>..<TEXT_STOP>

Please note that the detected text ranges are only the minimum span of each TU. The actual TUs are always larger in practice.
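
A line in this format could be read with something like the sketch below. It assumes the addresses are hexadecimal (with or without a `0x` prefix) and that the two ranges are separated by whitespace. The attachment's exact number formatting is not shown here, so treat both as assumptions. `parse_tu_line` is a hypothetical name.

```python
def parse_tu_line(line):
    """Parse '<SDATA2_START>..<SDATA2_STOP> <TEXT_START>..<TEXT_STOP>'.

    Returns ((sdata2_start, sdata2_stop), (text_start, text_stop)).
    Addresses are assumed to be hexadecimal.
    """
    fields = line.split()
    if len(fields) != 2:
        raise ValueError(f"expected two ranges, got {len(fields)}: {line!r}")

    def parse_range(field):
        start, sep, stop = field.partition("..")
        if not sep:
            raise ValueError(f"missing '..' in range: {field!r}")
        # int(..., 16) accepts both '80005000' and '0x80005000'
        return int(start, 16), int(stop, 16)

    return parse_range(fields[0]), parse_range(fields[1])
```

Since the text range is only a minimum span, a consumer of this data should treat `text_start` and `text_stop` as an inner bound on the TU, not as its exact extent.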

sdata_detect_attempt.txt

riptl avatar Mar 27 '22 09:03 riptl

Nice work! I think for the time being, we can fairly easily do .text splits using the symbol map. If the script could then autogenerate the data splits, that would be really convenient.
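
One possible shape for that, as a sketch only: given .text splits taken from the symbol map and a list of code-to-data xrefs, collect the minimum data span referenced from each TU. The input layout (tuples of name, start, stop, with stop exclusive) and the function name are assumptions, not an existing script.

```python
import bisect


def data_spans_from_text_splits(text_splits, xrefs):
    """Derive a minimum data span per TU from known .text splits.

    text_splits: list of (name, start, stop) sorted by start,
                 non-overlapping, with stop exclusive.
    xrefs:       iterable of (code_addr, data_addr) pairs.
    Returns {name: (lowest_data_addr, highest_data_addr)}.
    """
    starts = [start for _, start, _ in text_splits]
    spans = {}
    for code_addr, data_addr in xrefs:
        # Find the last split starting at or before code_addr.
        i = bisect.bisect_right(starts, code_addr) - 1
        if i < 0:
            continue
        name, start, stop = text_splits[i]
        if not (start <= code_addr < stop):
            continue  # xref comes from code outside any known split
        lo, hi = spans.get(name, (data_addr, data_addr))
        spans[name] = (min(lo, data_addr), max(hi, data_addr))
    return spans
```

Data shared across TUs would widen these spans incorrectly, so overlapping results are a sign that some xrefs need to be filtered out first.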

riidefi avatar Mar 28 '22 00:03 riidefi