ghidra
ghidra copied to clipboard
PIC-16 Processor/Analyser not propagating RP flags
Issue When reverse engineering a PIC-16 processor binary most SFR/GPR references do not get adjusted based on the RP bits in STATUS, and do not get interpreted as address references - even if the address is correct (i.e. RP0/1 are both 0 anyway)
All code in this issue is based on the attached example PIC-16 binary.
To Reproduce Steps to reproduce the behavior:
- Import PIC-16 binary (see example attached)
- Run automatic analysis
- Look for register file 'F' instructions that do not reflect previous setting/clearing of RP0/1
Expected output
CODE:000e 83 12 BCF STATUS,RP0
CODE:000f 03 13 BCF STATUS,RP1
CODE:0010 8d 1e BTFSS DAT_DATA_000d,#0x5
With labelling:
CODE:000e 83 12 BCF STATUS,RP0
CODE:000f 03 13 BCF STATUS,RP1
CODE:0010 8d 1e BTFSS PIR2,#0x5
Actual output
CODE:000e 83 12 BCF STATUS,RP0
CODE:000f 03 13 BCF STATUS,RP1
CODE:0010 8d 1e BTFSS 0xd,#0x5
Attempted fix There are two components to this issue - the PIC-16 processor SLEIGH spec and the PIC-16 Constant Propagation analyser.
I've tried debugging the Constant Propagation analyser and found that in this code:
RegisterValue rValue = context.getRegisterValue(reg);
if (rValue == null) {
return;
}
'rValue' seems to be null on every instruction of interest (both with my modifications described below and with the original PIC-16 SLEIGH spec).
While investigating the SLEIGH spec I attempted to fix the issue by switching the RP bits into a SLEIGH context variable (see attached pic16_instructions.diff). This produced a dramatic improvement - allowing Ghidra to create references for all file register based addresses. However the propagation of RP bits breaks down in some flow situations like the following example where RP is set to 2 at CODE:0d31-2 yet at the target of the GOTO (CODE:0d34 -> CODE:0d40) the RP bits are not taken into account.
Inspecting the instruction info at CODE:0d40 shows that the rp variable is not even included as input:
Instruction Summary
-------------------
Mnemonic : BTFSS
Number of Operands: 2
Address : CODE:0d40
Flow Type : FALL_THROUGH
Fallthrough : CODE:0d41
Delay slot depth : 0
Hash : 55ba551d
Input Objects:
DATA:0057, const:0x0, const:0x1
Result Objects:
SkipNext
Constructor Line #'s:
BTFSS(pic16_instructions.sinc:551), bit(pic16_instructions.sinc:241),
srcREG(pic16_instructions.sinc:145)
Byte Length : 2
Instr Bytes : 11010111 00011100
Mask : 00000000 00111100
Masked Bytes: 00000000 00011100
Instr Context:
doPseudo(00,00) == 0x0
I attempted to solve this by specifically setting the globalcontext on the target of the GOTO with [ globalset(absAddr11,rp); ]
but this had no effect.
Expected output
assume PCLATH = 0xd
LAB_CODE_0d31 XREF[1]: CODE:0d2d(j)
CODE:0d31 83 12 BCF STATUS,RP0
assume PCLATH = <UNKNOWN>
CODE:0d32 03 17 BSF STATUS,RP1
CODE:0d33 57 19 BTFSC DAT_DATA_0157,#0x2 = ??
CODE:0d34 40 2d GOTO offset LAB_CODE_0d40
CODE:0d35 d7 1d BTFSS DAT_DATA_0157,#0x3 = ??
CODE:0d36 4c 2d GOTO offset LAB_CODE_0d4c
CODE:0d37 d7 11 BCF DAT_DATA_0157,#0x3 = ??
CODE:0d38 83 12 BCF STATUS,RP0
CODE:0d39 03 13 BCF STATUS,RP1
CODE:0d3a 13 08 MOVF SSPBUF,w = ??
CODE:0d3b 83 12 BCF STATUS,RP0
CODE:0d3c 03 17 BSF STATUS,RP1
CODE:0d3d db 00 MOVWF i2c_rx_buffer[3] = null
CODE:0d3e d6 07 ADDWF i2c_rx_checksum,f = ??
CODE:0d3f 4c 2d GOTO offset LAB_CODE_0d4c
-------------------------------------------------------------
assume PCLATH = 0xd
LAB_CODE_0d40 XREF[1]: CODE:0d34(j)
CODE:0d40 d7 1c BTFSS DAT_DATA_0157,#0x1 = ??
Actual output
assume PCLATH = 0xd
LAB_CODE_0d31 XREF[1]: CODE:0d2d(j)
CODE:0d31 83 12 BCF STATUS,RP0
assume PCLATH = <UNKNOWN>
CODE:0d32 03 17 BSF STATUS,RP1
CODE:0d33 57 19 BTFSC DAT_DATA_0157,#0x2 = ??
CODE:0d34 40 2d GOTO offset LAB_CODE_0d40
CODE:0d35 d7 1d BTFSS DAT_DATA_0157,#0x3 = ??
CODE:0d36 4c 2d GOTO offset LAB_CODE_0d4c
CODE:0d37 d7 11 BCF DAT_DATA_0157,#0x3 = ??
CODE:0d38 83 12 BCF STATUS,RP0
CODE:0d39 03 13 BCF STATUS,RP1
CODE:0d3a 13 08 MOVF SSPBUF,w = ??
CODE:0d3b 83 12 BCF STATUS,RP0
CODE:0d3c 03 17 BSF STATUS,RP1
CODE:0d3d db 00 MOVWF i2c_rx_buffer[3] = null
CODE:0d3e d6 07 ADDWF i2c_rx_checksum,f = ??
CODE:0d3f 4c 2d GOTO offset LAB_CODE_0d4c
-------------------------------------------------------------
assume PCLATH = 0xd
LAB_CODE_0d40 XREF[1]: CODE:0d34(j)
CODE:0d40 d7 1c BTFSS DAT_DATA_0057,#0x1 = ??
Attachments pic16_instructions.diff.txt PSU PIC16F886.txt
Environment:
- OS: Windows 10 Pro 21H1 Build 19043.1110
- Java Version: 11.0.8
- Ghidra Version: 10.0.1
- Ghidra Origin: Github release package
In general the context bits shouldn't be used to track state from one instruction to the next as the context and state of the register/bit gets locked into the following instruction. Normally the context is used for things like thumb-mode that directly affect the disassembly. You can't always depend on the disassembly flowing status state correctly. Registers can also be set to an initial state at the start of a function.
The main issue is setting the bits in the STATUS register when it's state is unknown isn't currently handled very well. We could potentially add a more careful tracking for certain registers, as is done in the decompiler, for certain registers like STATUS. This is probably a better solution in the long-run.
A simpler solution is to break out the RP0/1 bits into a separate fake register that can be tracked without knowing the state of the other status bits. This is essentially what you were doing anyway, only in registers not disassembly context. The C bit will also have this issue, but may not matter. Each language does it differently. Some pack the status bits in, others don't. If all the instructions that load/store the status register and bits can be known, then this is easy to do. Whenever something grabs the status register the fake register bits are packed in. Some for setting the status register. There is actually a macro to do this already in the PIC16, but it is unused. The X86 is done with separate flag registers, and the PIC16 looks like it may have done it this way at some point in the past.
A simpler solution is to break out the RP0/1 bits into a separate fake register that can be tracked without knowing the state of the other status bits.
This makes things a bit clearer, as it did seem strange that the RP bits were split out into a register.
My attempts at getting it working with the processor context were mostly experimental - but it made such a significant difference I thought I should include it.
The issue remains why the original RP register does not appear to be working, I'm going to do some more investigation stepping through the PIC16Analyzer.
I discovered there is a bug in the constant propagation looking up registers. I'm working on a fix to it. I hadn't realized the RP register was set separately as well as the STATUS register when I debugged the issue before. I had jumped to the conclusion incorrectly that the load of the RP for the memory load came from the RP0/1 bits in the STATUS memory. It may be that there is a workaround in the meantime. If I think of one, I'll let you know.
Thank you.
I got a bit stuck on a decompiler issue on FUN_CODE_092d in my example file, as it seems to be going off into invalid memory ranges (CODE:7xxx etc).
I found there was potential in some of the instructions to get an incorrect PC address as there weren't explicit bit masks in place. The attached patch adds these bit masks and at least solves the invalid memory range errors.
I'm still investigating the remaining decompiler errors: Unable to resolve constructor at CODE:1ad2. I'll leave an update as to what I find. pic16_pc_explicit_bitmasks.patch.txt
Turns out, it isn't exactly a bug. There is an assumption that any Memory Mapped register will be accessed directly by the instruction, and won't need a reference to it. If it were, every access to STATUS, PCLATH, PCL, INTCON, etc... would get a reference to that memory location.
The memory mapped registers in this case are defined by the lines:
define DATA offset=0x0000 size=1 [
INDF TMR0 PCL STATUS FSR PORTA PORTB PORTC PORTD PORTE PCLATH INTCON PIR1 PIR2 TMR1L TMR1H
];
To fix the issue, I removed certain memory mapped registers, and moved them to the .pspec file. This should fix most if not all of your accesses. There were special scrReg lines for special registers that ignore the RP bank register.
I also initialized the RP register in the .pspec file, which will assume a value of zero at the start of a function.
diff --git a/Ghidra/Processors/PIC/data/languages/pic16.pspec b/Ghidra/Processors/PIC/data/languages/pic16.pspec
index 4133f95..ea6677c 100644
--- a/Ghidra/Processors/PIC/data/languages/pic16.pspec
+++ b/Ghidra/Processors/PIC/data/languages/pic16.pspec
@@ -10,6 +10,9 @@
<tracked_set space="CODE">
<set name="SkipNext" val="0"/>
</tracked_set>
+ <tracked_set space="CODE">
+ <set name="RP" val="0"/>
+ </tracked_set>
<tracked_set space="CODE" first="0x0" last="0x1ff">
<set name="PCLATH" val="0"/>
</tracked_set>
@@ -40,6 +43,11 @@
<default_symbols>
<symbol name="Reset" address="CODE:0000" entry="true"/>
<symbol name="Interrupt" address="CODE:0004" entry="true"/>
+ <symbol name="PORTA" address="DATA:05" entry="false"/>
+ <symbol name="PORTB" address="DATA:06" entry="false"/>
+ <symbol name="PORTC" address="DATA:07" entry="false"/>
+ <symbol name="PORTD" address="DATA:08" entry="false"/>
+ <symbol name="PORTE" address="DATA:09" entry="false"/>
<symbol name="PIR1" address="DATA:0C" entry="false"/>
<symbol name="PIR2" address="DATA:0D" entry="false"/>
<symbol name="TMR1L" address="DATA:0E" entry="false"/>
diff --git a/Ghidra/Processors/PIC/data/languages/pic16.sinc b/Ghidra/Processors/PIC/data/languages/pic16.sinc
index d7144da..e24a6a3 100644
--- a/Ghidra/Processors/PIC/data/languages/pic16.sinc
+++ b/Ghidra/Processors/PIC/data/languages/pic16.sinc
@@ -80,7 +80,8 @@
#
@if PROCESSOR == "PIC_16"
define DATA offset=0x0000 size=1 [
- INDF TMR0 PCL STATUS FSR PORTA PORTB PORTC PORTD PORTE PCLATH INTCON PIR1 PIR2 TMR1L TMR1H
+# INDF TMR0 PCL STATUS FSR PORTA PORTB PORTC PORTD PORTE PCLATH INTCON PIR1 PIR2 TMR1L TMR1H
+ INDF _ PCL STATUS FSR _ _ _ _ _ PCLATH INTCON _ _ _ _
];
@elif PROCESSOR == "PIC_16F"
diff --git a/Ghidra/Processors/PIC/data/languages/pic16_instructions.sinc b/Ghidra/Processors/PIC/data/languages/pic16_instructions.sinc
index 5dda8d7..72db4a5 100644
--- a/Ghidra/Processors/PIC/data/languages/pic16_instructions.sinc
+++ b/Ghidra/Processors/PIC/data/languages/pic16_instructions.sinc
@@ -44,7 +44,8 @@
@if PROCESSOR == "PIC_16"
attach variables [ fregCore ] [
- INDF TMR0 PCL STATUS FSR PORTA PORTB PORTC PORTD PORTE PCLATH INTCON _ _ _ _
+# INDF TMR0 PCL STATUS FSR PORTA PORTB PORTC PORTD PORTE PCLATH INTCON _ _ _ _
+ _ _ PCL STATUS FSR _ _ _ _ _ PCLATH INTCON _ _ _ _
];
@elif PROCESSOR == "PIC_16F"
@@ -178,12 +179,12 @@
# File register index (f7=0): INDF use implies indirect data access using FSR value and IRP bit in STATUS reg
@if PROCESSOR == "PIC_16"
-srcREG: fregCore is f7=0 & fregCore {
+srcREG: INDF is f7=0 & INDF {
addr:2 = (zext(IRP) << 8) + zext(FSR);
export *[DATA]:1 addr;
}
-srcREG: fregCore is f7=1 & fregCore {
- rpval:2 = zext(RP == 1) + zext(RP == 2);
+srcREG: lf7 is f7=1 & lf7 {
+ rpval:2 = zext(RP == 1) + zext(RP == 3);
addr:2 = (zext(rpval) << 7) + 1;
export *[DATA]:1 addr;
}
I recently saw the mask issue with another processor. I hadn't investigated why the decompiler is sign extending the value. I suspect it may be a bug, and this is a workaround for it.
Does this affect the PIC18 also?