Decompiler does not resolve symbols for 6502 zero-page RAM addresses
Context
I am reverse engineering an NES game to understand the inner workings of several game mechanics. The memory map was constructed and ROM/code loaded manually in line with the appropriate NES mapper (details irrelevant for this bug).
The 6502 RAM is separated into three regions: zero-page (0x00-0xFF), stack (0x100-0x1FF), and general-purpose RAM (0x200-0x7FF). Bank switched code exists from 0x8000 to 0xFFFF (bank switching implementation is irrelevant for this bug).
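For reference, the layout above can be written down as a small C classifier. This is only a sketch of the regions as described in this report; the enum and function names are hypothetical, and `REGION_OTHER` is a placeholder for the range the report does not describe.

```c
#include <stdint.h>

/* Regions of the 6502 address space as described in this report.
 * REGION_OTHER covers the range the report does not mention
 * (0x0800-0x7FFF); it is a placeholder, not a claim about the hardware. */
typedef enum {
    REGION_ZERO_PAGE,   /* 0x0000-0x00FF */
    REGION_STACK,       /* 0x0100-0x01FF */
    REGION_GENERAL_RAM, /* 0x0200-0x07FF */
    REGION_BANKED_CODE, /* 0x8000-0xFFFF */
    REGION_OTHER
} mem_region;

/* Map a 16-bit address to the region it falls in. */
static mem_region classify_address(uint16_t addr)
{
    if (addr <= 0x00FF) return REGION_ZERO_PAGE;
    if (addr <= 0x01FF) return REGION_STACK;
    if (addr <= 0x07FF) return REGION_GENERAL_RAM;
    if (addr >= 0x8000) return REGION_BANKED_CODE;
    return REGION_OTHER;
}
```

With this layout, 0x0035 classifies as zero-page and 0x059b as general-purpose RAM, which are the two cases compared in the rest of the report.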
The game code regularly reads and writes byte arrays and basic structures (i.e., base + offset references) in both the zero-page and general-purpose areas of RAM. For example, a typical array write would look something like:
LDA #0x0 ; The value to be stored, in this case initialize an array location to zero
STA arrayBase, index ; Store value in register A to arrayBase[index]
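As a rough model of what the indexed store does, the C sketch below simulates the two encodings the 6502 offers for it, assuming the index is held in the X register. The `ram` array and both function names are hypothetical and exist only for illustration. The one semantic difference is that the zero-page form adds base and index in 8 bits, so the effective address wraps within page zero. Whether that difference has anything to do with the decompiler behavior reported here is not established.

```c
#include <stdint.h>

static uint8_t ram[0x10000]; /* simulated flat 64 KiB address space */

/* STA abs,X: 16-bit base address plus the 8-bit index register. */
static void sta_absolute_x(uint16_t base, uint8_t x, uint8_t a)
{
    ram[(uint16_t)(base + x)] = a;
}

/* STA zp,X: 8-bit base address plus X; the sum is truncated to
 * 8 bits, so the write always lands inside zero page. */
static void sta_zero_page_x(uint8_t base, uint8_t x, uint8_t a)
{
    ram[(uint8_t)(base + x)] = a;
}
```

For in-range indices such as 0 or 1, both forms behave as a plain `arrayBase[index] = value` store.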
Describe the bug
The decompiler does not identify and display array or structure references when the base address is in the zero-page section of RAM (address range 0x0000 – 0x00FF). Structure and array references appear to be resolved properly when the base address is at or above 0x0100 (i.e., in stack or general-purpose RAM).
To Reproduce
A simple memory initialization function illustrates this bug. The function initializes five memory locations, each a 2-byte array in which the first byte belongs to player 0 and the second byte to player 1; the idxPlayer argument selects the element.
The disassembly of the initialization function is as follows:
The 2-byte array in general-purpose RAM (address 0x059b) is defined as follows:
One of the 2-byte arrays in zero-page RAM (address 0x0035) is defined as follows:
The above screenshots show that there is no difference in the definitions of the arrays in general-purpose versus zero-page RAM.
The decompiler presents the following results, which for the zero-page addresses are not particularly human readable:
void b0_initSomePlayerVariables?(byte idxPlayer)
{
somethingRelatedToPlayerState?_059b[idxPlayer] = 0;
*(undefined *)(idxPlayer + 0x35) = 0;
*(undefined *)(idxPlayer + 0x54) = 0;
*(undefined *)(idxPlayer + 0x2a) = 0;
*(undefined *)(idxPlayer + 0x52) = 0;
return;
}
Expected behavior
The decompiled code is expected to look like this, where the memory symbols and array representations are present:
void b0_initSomePlayerVariables?(byte idxPlayer)
{
somethingRelatedToPlayerState?_059b[idxPlayer] = 0;
playerUnknown_0035[idxPlayer] = 0;
playerUnknown_0054[idxPlayer] = 0;
playerAnimationType_002a[idxPlayer] = 0;
playerUnknown_0052[idxPlayer] = 0;
return;
}
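The two listings describe the same writes, which can be checked with a small C model. This is a sketch against a simulated RAM array, not Ghidra output: `ram`, `reset_ram`, and the two function names are hypothetical, and the `?` characters are dropped from the symbol names because they are not valid in C identifiers.

```c
#include <stdint.h>

static uint8_t ram[0x800]; /* simulated 2 KiB of NES RAM */

/* Hypothetical symbol table: each label points at its address in ram. */
static uint8_t *const playerAnimationType_002a = &ram[0x2a];
static uint8_t *const playerUnknown_0035 = &ram[0x35];
static uint8_t *const playerUnknown_0052 = &ram[0x52];
static uint8_t *const playerUnknown_0054 = &ram[0x54];
static uint8_t *const somethingRelatedToPlayerState_059b = &ram[0x59b];

/* Fill the simulated RAM with a known value. */
static void reset_ram(uint8_t value)
{
    for (unsigned i = 0; i < sizeof ram; i++)
        ram[i] = value;
}

/* Shape of the current output: raw address arithmetic in zero page. */
static void init_as_decompiled(uint8_t idxPlayer)
{
    somethingRelatedToPlayerState_059b[idxPlayer] = 0;
    *(ram + idxPlayer + 0x35) = 0;
    *(ram + idxPlayer + 0x54) = 0;
    *(ram + idxPlayer + 0x2a) = 0;
    *(ram + idxPlayer + 0x52) = 0;
}

/* Shape of the expected output: every write goes through a symbol. */
static void init_as_expected(uint8_t idxPlayer)
{
    somethingRelatedToPlayerState_059b[idxPlayer] = 0;
    playerUnknown_0035[idxPlayer] = 0;
    playerUnknown_0054[idxPlayer] = 0;
    playerAnimationType_002a[idxPlayer] = 0;
    playerUnknown_0052[idxPlayer] = 0;
}
```

Calling either function with the same `idxPlayer` clears the same five bytes, so the requested change affects only how the decompiler presents the code, not what the code does.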
Screenshots
Screenshots were embedded in the “To Reproduce” section for clarity.
Attachments
The “Debug Function Decompilation” XML file has been attached (with txt extension, since xml is not allowed) in case that would be of use. DebugFunctionDecomiplationXML_20230429.txt
Environment:
- OS: Windows 10 Version 22H2
- Java Version: 17.0.7
- Ghidra Version: 10.2.3
- Ghidra Origin: official GitHub distro
Additional context
I have searched for other related 6502 and/or decompiler bugs but couldn’t find one that matches this issue. The PCode appears to be correct and essentially identical for each of the array writes, so my best guess is that this is a decompiler bug. Here is the same function with the PCode field enabled:
This behavior is confirmed present in the latest version of Ghidra (10.3).
Environment:
- OS: Windows 10 Version 22H2
- Java Version: 17.0.7
- Ghidra Version: 10.3
- Ghidra Origin: official GitHub distro
I have issues with this as well.
This behavior is confirmed present in the latest version of Ghidra (10.3.2).
Environment:
- OS: Windows 10 Version 22H2
- Java Version: 17.0.7
- Ghidra Version: 10.3.2
- Ghidra Origin: official GitHub distro
This behavior is confirmed present in the latest version of Ghidra (10.4).
Environment:
- OS: Windows 10 Version 22H2
- Java Version: 17.0.7
- Ghidra Version: 10.4
- Ghidra Origin: official GitHub distro
I would be so happy if this got fixed. Right now a lot of the reverse engineering work I'd like to be doing on old 6502 software is just annoying enough that I can't muster the effort to do it.
Where is the relevant part of the decompiler that does this? Does it have architecture specific switches?
This behavior is confirmed present in the latest version of Ghidra (11.0).
Environment:
- OS: Windows 10 Version 22H2
- Java Version: 17.0.7
- Ghidra Version: 11.0
- Ghidra Origin: official GitHub distro
This behavior is confirmed present in the latest version of Ghidra (11.1).
Environment:
- OS: Windows 10 Version 22H2
- Java Version: 17.0.7
- Ghidra Version: 11.1
- Ghidra Origin: official GitHub distro