Decompiler does not resolve symbols for 6502 zero-page RAM addresses

Open haleaux opened this issue 1 year ago • 8 comments

Context: I am reverse engineering an NES game to understand the inner workings of several game mechanics. I constructed the memory map and loaded the ROM/code manually, in line with the appropriate NES mapper (details irrelevant for this bug).

The 6502 RAM is separated into three regions: zero-page (0x00-0xFF), stack (0x100-0x1FF), and general-purpose RAM (0x200-0x7FF). Bank switched code exists from 0x8000 to 0xFFFF (bank switching implementation is irrelevant for this bug).
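
The memory map above can be restated as a small C helper. This is only an illustrative sketch: the enum and function names are hypothetical, and the range 0x0800-0x7FFF is lumped into an "other" category because this report does not describe it.

```c
#include <stdint.h>

/* Hypothetical names; the ranges are the ones listed in this report. */
typedef enum {
    REGION_ZERO_PAGE,    /* 0x0000-0x00FF */
    REGION_STACK,        /* 0x0100-0x01FF */
    REGION_GENERAL_RAM,  /* 0x0200-0x07FF */
    REGION_BANKED_CODE,  /* 0x8000-0xFFFF */
    REGION_OTHER         /* 0x0800-0x7FFF: not described in this report */
} mem_region;

/* Classify a 16-bit 6502 address into one of the regions above. */
static mem_region region_of(uint16_t addr)
{
    if (addr <= 0x00FF) return REGION_ZERO_PAGE;
    if (addr <= 0x01FF) return REGION_STACK;
    if (addr <= 0x07FF) return REGION_GENERAL_RAM;
    if (addr >= 0x8000) return REGION_BANKED_CODE;
    return REGION_OTHER;
}
```

With this helper, `region_of(0x0035)` falls in the zero page and `region_of(0x059b)` falls in general-purpose RAM, which are the two cases compared in this report.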

The game code regularly reads and writes byte arrays and basic structures (i.e., base + offset references) in both the zero-page and general-purpose areas of RAM. For example, a typical array write would look something like:

LDA #0x0 		; The value to be stored, in this case initialize an array location to zero
STA arrayBase, index 	; Store value in register A to arrayBase[index]
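
For reference, the address arithmetic behind such an indexed store can be sketched in C. The function names are hypothetical, and the sketch assumes the index is held in the X register. On the 6502, the zero-page indexed form wraps within page zero, while the absolute indexed form uses a full 16-bit address. Whether that difference has any bearing on the decompiler output is not established here.

```c
#include <stdint.h>

/* Absolute,X addressing: 16-bit base address plus the X register. */
static uint16_t ea_absolute_x(uint16_t base, uint8_t x)
{
    return (uint16_t)(base + x);
}

/* Zero-page,X addressing: 8-bit base plus X, wrapped so the
 * effective address always stays inside 0x0000-0x00FF. */
static uint16_t ea_zero_page_x(uint8_t base, uint8_t x)
{
    return (uint16_t)((base + x) & 0xFF);
}
```

For example, `ea_zero_page_x(0x35, 1)` and `ea_absolute_x(0x059b, 1)` give the addresses of the player 1 slots of the two arrays discussed in this report.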

Describe the bug: The decompiler does not identify and display array or structure references when the base address is in the zero-page section of RAM (address range 0x0000-0x00FF). Structure and array references appear to be properly resolved by the decompiler when the base address is at or above 0x0100 (i.e., in stack or general-purpose RAM).

To Reproduce: A simple memory initialization function illustrates this bug. The function initializes five memory locations, each a 2-byte array in which the first byte is for player 0 and the second byte is for player 1 (selected by the idxPlayer argument).

The disassembly of the initialization function is as follows: image

The 2-byte array in general-purpose RAM (address 0x059b) is defined as follows: image

One of the 2-byte arrays in zero-page RAM (address 0x0035) is defined as follows: image

The above screenshots show that there is no difference in the definitions of the arrays in general-purpose versus zero-page RAM.

The decompiler presents the following results, which for the zero-page addresses are not particularly human readable:

void b0_initSomePlayerVariables?(byte idxPlayer)
{
  somethingRelatedToPlayerState?_059b[idxPlayer] = 0;
  *(undefined *)(idxPlayer + 0x35) = 0;
  *(undefined *)(idxPlayer + 0x54) = 0;
  *(undefined *)(idxPlayer + 0x2a) = 0;
  *(undefined *)(idxPlayer + 0x52) = 0;
  return;
}

Expected behavior: The decompiled code should look like this, with the memory symbols and array indexing present:

void b0_initSomePlayerVariables?(byte idxPlayer)
{
  somethingRelatedToPlayerState?_059b[idxPlayer] = 0;
  playerUnknown_0035[idxPlayer] = 0;
  playerUnknown_0054[idxPlayer] = 0;
  playerAnimationType_002a[idxPlayer] = 0;
  playerUnknown_0052[idxPlayer] = 0;
  return;
}
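
The actual and expected listings describe the same stores; only the presentation differs. The following C sketch illustrates that equivalence under stated assumptions: `ram` is a hypothetical array standing in for the 0x0000-0x07FF RAM range, and `store_raw` / `store_named` are hypothetical helpers that mirror one line from each listing. It is not decompiler output.

```c
#include <stdint.h>

#define RAM_SIZE 0x0800u            /* covers 0x0000-0x07FF */

static uint8_t ram[RAM_SIZE];       /* hypothetical stand-in for RAM */

/* A named 2-byte array placed at zero-page address 0x0035. */
#define playerUnknown_0035 (&ram[0x0035])

/* Mirrors the current output: a raw pointer computed from the index. */
static void store_raw(uint8_t idxPlayer)
{
    *(uint8_t *)(ram + (idxPlayer + 0x35)) = 0;
}

/* Mirrors the expected output: the symbol with array indexing. */
static void store_named(uint8_t idxPlayer)
{
    playerUnknown_0035[idxPlayer] = 0;
}
```

Both helpers clear the same byte for a given `idxPlayer`, so the request in this issue concerns symbol resolution and readability, not the correctness of the decompiled stores.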

Screenshots: Screenshots were embedded in the “To Reproduce” section for clarity.

Attachments: The “Debug Function Decompilation” XML file is attached (with a .txt extension, since .xml is not allowed) in case it is of use. DebugFunctionDecomiplationXML_20230429.txt

Environment:

  • OS: Windows 10 Version 22H2
  • Java Version: 17.0.7
  • Ghidra Version: 10.2.3
  • Ghidra Origin: official GitHub distro

Additional context: I have searched for other related 6502 and/or decompiler bugs but couldn’t find one that matches this issue. The PCode appears to be correct and essentially identical for each of the array writes, so my best guess is that this is a decompiler bug. Here is the same function with the PCode field enabled: image

haleaux • Apr 29 '23 17:04

This behavior is confirmed present in the latest version of Ghidra (10.3).

Environment:

  • OS: Windows 10 Version 22H2
  • Java Version: 17.0.7
  • Ghidra Version: 10.3
  • Ghidra Origin: official GitHub distro

haleaux • May 13 '23 15:05

I have issues with this as well.

npe9 • May 26 '23 16:05

This behavior is confirmed present in the latest version of Ghidra (10.3.2).

Environment:

  • OS: Windows 10 Version 22H2
  • Java Version: 17.0.7
  • Ghidra Version: 10.3.2
  • Ghidra Origin: official GitHub distro

haleaux • Jul 30 '23 20:07

This behavior is confirmed present in the latest version of Ghidra (10.4).

Environment:

  • OS: Windows 10 Version 22H2
  • Java Version: 17.0.7
  • Ghidra Version: 10.4
  • Ghidra Origin: official GitHub distro

haleaux • Oct 20 '23 00:10

I would be so happy if this got fixed. Right now a lot of the reverse engineering work I'd like to be doing on old 6502 software is just annoying enough that I can't muster the effort to do it.

Where is the relevant part of the decompiler that does this? Does it have architecture-specific switches?

npe9 • Oct 20 '23 02:10

This behavior is confirmed present in the latest version of Ghidra (11.0).

Environment:

  • OS: Windows 10 Version 22H2
  • Java Version: 17.0.7
  • Ghidra Version: 11.0
  • Ghidra Origin: official GitHub distro

haleaux • Dec 26 '23 02:12

This behavior is confirmed present in the latest version of Ghidra (11.1).

Environment:

  • OS: Windows 10 Version 22H2
  • Java Version: 17.0.7
  • Ghidra Version: 11.1
  • Ghidra Origin: official GitHub distro

haleaux • Jun 08 '24 15:06