Ghidra incorrectly analyzes Borland C++ generated switches
Describe the bug
Borland C++ generates switches in form of JMP word ptr CS:[BX + CaseTable],
where CaseTable is placed at the end of a function, containing the switch, and holds CS relative offsets (1 for each case) inside the function. Ghidra ignores that table and generates completely unrelated references far away from the function.
To Reproduce Steps to reproduce the behavior:
- Analyze the attached st.exe with the default settings.
- Press G and enter 4b0a:09e5
Other switches there follow the same scheme and are incorrectly analyzed too.
Expected behavior Ghidra uses the table properly to generate navigate around the switch.
Attachments st.zip
Environment (please complete the following information):
- OS: Windows 11 23H2
- Java Version: java 17.0.7 2023-04-18 LTS
- Ghidra Version: 11.1.1
- Ghidra Origin: https://github.com/NationalSecurityAgency/ghidra/releases/tag/Ghidra_11.1.1_build
Additional context The work around is clearing existing references by the jump's address, and then manually adding a set of COMPUTED_JUMP references. The one runs SwitchOverride.java at the jump's address.
It would be nice if Ghidra did some sanity checks, that the jump target lies inside the function or anywhere near it. And if anything is suspicious, just asking user for further input, mentioning that SwitchOverride.java script.
This bug occurs because the address of the jump table is not determined correctly.
This happens because there is a mismatch between how the segment pcode op is implemented, and the pcode for the JMP instruction. The segment(a, b) pcode op returns the address (a << 4) + b. However, the pcode for the JMP word ptr CS:[BX + 0xb69] instruction is:
4b0a:09e5 2e ff a7 69 0b JMP word ptr CS:[BX + 0xb69]
$U3e00:4 = INT_RIGHT 0x4ba8a:4, 4:4
$U3f00:4 = INT_AND $U3e00:4, 0xf000:4
CS = SUBPIECE $U3f00:4, 0:4
$U2e80:2 = INT_ADD BX, 0xb69:2
$U3e00:4 = INT_RIGHT 0x4ba8a:4, 4:4
$U3f00:4 = INT_AND $U3e00:4, 0xf000:4
CS = SUBPIECE $U3f00:4, 0:4
$U4a80:4 = CALLOTHER "segment", CS, $U2e80:2
$U9200:2 = LOAD ram($U4a80:4)
$U29a80:4 = CALLOTHER "segment", CS, $U9200:2
BRANCHIND $U29a80:4
We would expect CS to have the value 0x4b0a, but here, it's updated (twice?!). The root cause seems to be the definition of currentCS in ia.sinc. This macro uses the 32-bit address of the next instruction to calculate the current value of CS. Since inst_next == (CS << 4) + offset, this macro sets CS to (inst_next >> 4) & 0xF000 == (((CS << 4) + offset) >> 4) & 0xF000. However, without knowing offset, it's impossible to determine what the real value of CS is. However, CS is set to the correct value at the start of the function. As such, it seems best to not change it, and have the currentCS macro just export the CS varnode.
https://github.com/NationalSecurityAgency/ghidra/blob/a1db2dac166973a381e7a98630bc11901f47d2d2/Ghidra/Processors/x86/data/languages/ia.sinc#L1049-L1050
The only other location the CS varnode is changed, is in the ptr1616 and ptr1632 macros. I don't really understand how real-mode 16-bit x86 works, so I assumed these updates to CS are necessary. These macros are used for the CALLF and JMPF instruction. In the CALLF instruction, the control flow will return from the call, so we should restore the CS value afterwards. The JMPF will not return, and as such, we don't need to restore the CS value.
https://github.com/NationalSecurityAgency/ghidra/blob/a1db2dac166973a381e7a98630bc11901f47d2d2/Ghidra/Processors/x86/data/languages/ia.sinc#L1379-L1381
As said before, I barely have any experience with real-mode 16-bit x86, so I'm not sure how well these changes actually model reality, or if it's just an ugly hack that happens to work decently well for this one sample. Anyway, I pasted the patch I applied below.
diff --git a/Ghidra/Processors/x86/data/languages/ia.sinc b/Ghidra/Processors/x86/data/languages/ia.sinc
index dd8c841131..1bf5513a0c 100644
--- a/Ghidra/Processors/x86/data/languages/ia.sinc
+++ b/Ghidra/Processors/x86/data/languages/ia.sinc
@@ -1046,8 +1046,8 @@ addr64: [Base64 + Index64*ss + simm32_64] is mod=2 & r_m=4; Index64 & Base64 & s
addr64: [Base64 + Index64*ss] is mod=2 & r_m=4; Index64 & Base64 & ss; imm32=0 { local tmp=Base64+Index64*ss; export tmp; }
@endif
-currentCS: CS is protectedMode=0 & CS { tmp:4 = (inst_next >> 4) & 0xf000; CS = tmp:2; export CS; }
-currentCS: CS is protectedMode=1 & CS { tmp:4 = (inst_next >> 16) & 0xffff; CS = tmp:2; export CS; }
+currentCS: CS is protectedMode=0 & CS { export CS; }
+currentCS: CS is protectedMode=1 & CS { export CS; }
segWide: is segover=0 { export 0:$(SIZE); }
segWide: CS: is segover=1 & CS { export 0:$(SIZE); }
@@ -2526,10 +2526,10 @@ with : lockprefx=0 {
@endif
# direct far calls generate an opcode undefined exception in x86-64
-:CALLF ptr1616 is vexMode=0 & addrsize=0 & opsize=0 & byte=0x9a; ptr1616 { push22(CS); build ptr1616; push22(&:2 inst_next); call ptr1616; }
-:CALLF ptr1616 is vexMode=0 & addrsize=1 & opsize=0 & byte=0x9a; ptr1616 { push42(CS); build ptr1616; push42(&:2 inst_next); call ptr1616; }
-:CALLF ptr1632 is vexMode=0 & addrsize=0 & opsize=1 & byte=0x9a; ptr1632 { push22(CS); build ptr1632; push24(&:4 inst_next); call ptr1632; }
-:CALLF ptr1632 is vexMode=0 & addrsize=1 & opsize=1 & byte=0x9a; ptr1632 { pushseg44(CS); build ptr1632; push44(&:4 inst_next); call ptr1632; }
+:CALLF ptr1616 is vexMode=0 & addrsize=0 & opsize=0 & byte=0x9a; ptr1616 { tmp:2 = CS; push22(CS); build ptr1616; push22(&:2 inst_next); call ptr1616; CS = tmp; }
+:CALLF ptr1616 is vexMode=0 & addrsize=1 & opsize=0 & byte=0x9a; ptr1616 { tmp:2 = CS; push42(CS); build ptr1616; push42(&:2 inst_next); call ptr1616; CS = tmp; }
+:CALLF ptr1632 is vexMode=0 & addrsize=0 & opsize=1 & byte=0x9a; ptr1632 { tmp:2 = CS; push22(CS); build ptr1632; push24(&:4 inst_next); call ptr1632; CS = tmp; }
+:CALLF ptr1632 is vexMode=0 & addrsize=1 & opsize=1 & byte=0x9a; ptr1632 { tmp:2 = CS; pushseg44(CS); build ptr1632; push44(&:4 inst_next); call ptr1632; CS = tmp; }
:CALLF addr16 is vexMode=0 & addrsize=0 & opsize=0 & byte=0xff; addr16 & reg_opcode=3 ... { local ptr:$(SIZE) = segment(DS,addr16); local addrptr:$(SIZE) = segment(*:2 (ptr+2),*:2 ptr);
push22(CS); push22(&:2 inst_next); call [addrptr]; }
:CALLF addr32 is vexMode=0 & addrsize=1 & opsize=0 & byte=0xff; addr32 & reg_opcode=3 ... { local dest:4 = addr32; push42(CS); push42(&:2 inst_next); call [dest]; }
With this patch applied, the pcode for the JMP instruction looks much more like I expected.
4b0a:09e5 2e ff a7 69 0b JMP word ptr CS:[BX + 0xb69]
$U2e80:2 = INT_ADD BX, 0xb69:2
$U4680:4 = CALLOTHER "segment", CS, $U2e80:2
$U8e00:2 = LOAD ram($U4680:4)
$U29880:4 = CALLOTHER "segment", CS, $U8e00:2
BRANCHIND $U29880:4
Additionally, Ghidra is now able to correctly identify the jumptable, and it also correctly identifies the code for the switch cases. As such, the listing view also looks better:
And the decompiler is also able to correctly recover the switch cases. The decompiled code looks correct at a first glance.
Yeah! looks much better than the SwitchOverride.java Many thanks! But there is a strange thing... that function with the switch (see below) saves a graphics file, switching on its type (1st byte in the stg_t structure). So I don't really get why it now rshifts param_1 by 0x10, since for the ES:BX far_t pointer such shift returns the segment.
The late 16bit x86 was all about using ES:BX and DS:DI as pointers to data structures or arrays, while using DX:AX and CX:BX to store 32bit integers, and bank-switching the EMS/XMS into the 640k (i.e. it was about processing inherently 32bit data with 16bits). For that st.exe, the Borland C++ compiler assigned each *.C file a separate CS segment, so calls inside a C file were near and calls into another .C were far, and therefore it was possible to segment-swap/overlay each C file (overlay manager actually walked the stack frames, patching returns), freeing large portion of memory for data. Additionally, the entire C library was usually combined into a single segment. Otherwise it is similar to 32bit x86.
stg_t * stshStoreStg(stg_t *gfx)
{
uint uVar1;
undefined2 uVar2;
uint16_t ofs_lo;
cte_t *psVar4;
bool bVar4;
far_t fVar5;
far_t ptr_base;
cte_t *pcVar6;
ulong b;
DW sz_00;
DW sz;
if (5 < gfx->type) {
/* WARNING: Subroutine does not return */
stExit(0x18);
}
/* WARNING: Switch is manually overridden */
switch((undefined4)*(undefined2 *)((uint)gfx->type * 2 + 0xb69)) {
case 0x4ba94:
return gfx;
case 0x4ba9d:
sz.lo = *(uint *)&((stg_t *)gfx)->sz + 9;
sz.hi = *(int *)((int)&((stg_t *)gfx)->sz + 2) + (uint)(0xfff6 < *(uint *)&((stg_t *)gfx)->sz);
break;
case 0x4bab6:
uVar1 = *(uint *)((int)&((stg_t *)gfx)->w + 1);
sz.lo = uVar1 + 0xe;
sz.hi = *(int *)((int)&((stg_t *)gfx)->h + 1) + (uint)(0xfff1 < uVar1);
break;
case 0x4bacf:
sz.lo = ((stg_t *)gfx)->w + 0xd;
sz.hi = 0;
break;
case 0x4bae3:
uVar1 = *(uint *)((int)&((stg_t *)gfx)[1].sz + 2);
sz.lo = uVar1 + 0x14;
sz.hi = ((stg_t *)gfx)[1].w + (uint)(0xffeb < uVar1);
}
if (StshCount < 0xd93) {
b = (ulong)StshCount;
StshCount = StshCount + 1;
fVar5 = (far_t)LXMUL@(0x15,b);
ptr_base.seg = StshCtPtr.seg;
ptr_base.ofs = StshCtPtr.ofs;
fVar5 = PADD@(ptr_base,fVar5);
pcVar6 = (cte_t *)mkp(fVar5.ofs,fVar5.seg);
psVar4 = (cte_t *)pcVar6;
uVar2 = (undefined2)((ulong)pcVar6 >> 0x10);
(psVar4->sz).hi = sz.hi;
(&psVar4->sz)->lo = sz.lo;
pcVar6->state = 0x0;
ofs_lo = StshCtOfs.lo;
(psVar4->ofs).hi = StshCtOfs.hi;
(&psVar4->ofs)->lo = ofs_lo;
*(undefined2 *)(psVar4->info + 10) = 0;
*(undefined2 *)(psVar4->info + 8) = 0;
bVar4 = CARRY2(StshCtOfs.lo,sz.lo);
StshCtOfs.lo = StshCtOfs.lo + sz.lo;
StshCtOfs.hi = StshCtOfs.hi + sz.hi + (uint)bVar4;
*(undefined2 *)(psVar4->info + 2) = 0;
*(undefined2 *)psVar4->info = 0;
*(undefined2 *)(psVar4->info + 6) = 0;
*(undefined2 *)(psVar4->info + 4) = 0;
if (DebugVerbose != '\0') {
printf((char *)"type %d len %ld \r",(uint)gfx->type);
biosReadKey();
}
sz_00.hi = sz.hi;
sz_00.lo = sz.lo;
stshWriteFile(StshFile,(far_t)gfx,sz_00);
stshFree((far_t)gfx);
/* return the contents table entry pointer */
return (stg_t *)pcVar6;
}
/* WARNING: Subroutine does not return */
stExit(0x24);
}
@LukeSerne You are my hero for this patch. I was beating my head into a wall trying to figure out why it wasn't analyzing the address table correctly. I made your patches and it worked well.
I'm not new to x86 real mode but I am newer to Ghidra analysis. So this has been a savior. The app I'm analyzing was coded in Turbo C++ 1.0.
@LukeSerne having made these changes to my copy of Ghidra I iniitially thought "great", no issues! I hadn't been looking in the right place! After stumbling across something I'd analysed a while ago, I couldn't see why the whole switch statement had disappeard! I found it to be this change.
What I've subsequently discovered is that the pcode for my Borland JMP is similar to the the example https://github.com/NationalSecurityAgency/ghidra/issues/6694#issuecomment-2211349992 except the assignment to CS use :2 instead of :4 (see below)
1048:0473 2e ff a7 83 02e JMP word ptr CS:[BX + 0xb83]
0b
$U4300:4 = INT_RIGHT 0x10480478:4, 16:4
$U4400:4 = INT_AND $U4300:4, 0xffff:4
CS = COPY $U4400:2
$U3200:2 = INT_ADD BX, 0xb83:2
$U4300:4 = INT_RIGHT 0x10480478:4, 16:4
$U4400:4 = INT_AND $U4300:4, 0xffff:4
CS = COPY $U4400:2
$U4d00:4 = CALLOTHER "segment", CS, $U3200:2
$Ub880:2 = LOAD ram($U4d00:4)
$U3f100:4 = CALLOTHER "segment", CS, $Ub880:2
BRANCHIND $U3f100:4
It seems like your sample is using protected mode instead of real mode. CS is set to (0x10480478:4 >> 16:4) & 0xFFFF:4, which is 0x1048. This seems correct to me. I don't think the copy into a two-byte wide varnode causes a problem, since you only need 2 bytes to store 0x1048. I don't see anything wrong with the p-code. CS seems to have the correct value when it's used.
Does Ghidra find any references at this instruction? If so, to what addresses? What would the correct addresses be? Can you share the sample that contains this function where the patch causes the switch to no longer be recovered correctly?
@LukeSerne my DLL is using protected mode and I was just trying to say that I found the problem ONLY after applying the changes recommened earlier.
I made just the changes above to a local build of Ghidra''s master, commit 1ca9e32a5712bc48a603f9e60d8f692220071eb7 and have included my function that I extracted using theDecompile:Panel's Debug Function Decompilation menu. (I hope that'll work for you.)
loadStates_6694.zip
NB: my decompilation shows the whole function but the actual switch is missing, and its replacement appears at the bottom of the function and looks like:
pDVar8 = (DgnStateManager_0x2e_t *)(*(code *)*(undefined2 *)((nType - 1) * 2 + 0xb83))( );
Thanks for the extra information. However, it seems like the debug xml generated by the decompiler does not contain all the relevant memory regions, because I get this error when trying to load it:
[decomp]> restore /tmp/tmp.rvXfJqQcYf/dragon_loadStates_6694.xml
/tmp/tmp.rvXfJqQcYf/dragon_loadStates_6694.xml successfully loaded: Intel/AMD 16-bit x86 Protected Mode
[decomp]> lo fu LoadStates
Low-level ERROR: Bytes at 0x10480000 are not mapped
Unable to proceed with function: LoadStates
Note that there's also an "invalid type" in the xml, named -MISSING-SegmentCodeAddress with its size set to -1. Not sure what this means, but you might want to fix that as well. I don't think it's related to the switch-statement issue though.
Could you send me the whole program (in .gzf for example)?
Below is the missing bytes that represent the switch table (I forgot that Ghidra fails to include these in its export).
8e048e048e048e048e04
Just add them to the end of the <bytechunk> in the xml (you probably already know that, sorry).
As for the -MISSING-SegmentCodeAddress, that's weird. It should be Ghidra's built-in SegmentCodeAddress.
Failing that, there's an old zipped copy in https://github.com/NationalSecurityAgency/ghidra/issues/6004#issue-2022636212 and this function starts at address 1048:03fa.
The main difference between the example shared earlier and your sample is that the auto analysis in the st.exe sample automatically defines values for CS and DS at the start of the function:
Your problematic function only has an assume set for DS. This is probably done by the auto analyser, possibly because it's possible to unambiguously determine the value of CS based on the instruction pointer in protected mode.
Therefore, it seems the simplest fix to my patch is to restore the currentCS line for protectedMode=1. If I restore that line, the switch is analysed correctly.
Below is the updated patch. I manually edited the patch file I sent earlier, so the index hashes are incorrect now. However, that shouldn't be a problem when applying it.
diff --git a/Ghidra/Processors/x86/data/languages/ia.sinc b/Ghidra/Processors/x86/data/languages/ia.sinc
index dd8c841131..1bf5513a0c 100644
--- a/Ghidra/Processors/x86/data/languages/ia.sinc
+++ b/Ghidra/Processors/x86/data/languages/ia.sinc
@@ -1046,8 +1046,8 @@ addr64: [Base64 + Index64*ss + simm32_64] is mod=2 & r_m=4; Index64 & Base64 & s
addr64: [Base64 + Index64*ss] is mod=2 & r_m=4; Index64 & Base64 & ss; imm32=0 { local tmp=Base64+Index64*ss; export tmp; }
@endif
-currentCS: CS is protectedMode=0 & CS { tmp:4 = (inst_next >> 4) & 0xf000; CS = tmp:2; export CS; }
+currentCS: CS is protectedMode=0 & CS { export CS; }
currentCS: CS is protectedMode=1 & CS { tmp:4 = (inst_next >> 16) & 0xffff; CS = tmp:2; export CS; }
segWide: is segover=0 { export 0:$(SIZE); }
@@ -2526,10 +2526,10 @@ with : lockprefx=0 {
@endif
# direct far calls generate an opcode undefined exception in x86-64
-:CALLF ptr1616 is vexMode=0 & addrsize=0 & opsize=0 & byte=0x9a; ptr1616 { push22(CS); build ptr1616; push22(&:2 inst_next); call ptr1616; }
-:CALLF ptr1616 is vexMode=0 & addrsize=1 & opsize=0 & byte=0x9a; ptr1616 { push42(CS); build ptr1616; push42(&:2 inst_next); call ptr1616; }
-:CALLF ptr1632 is vexMode=0 & addrsize=0 & opsize=1 & byte=0x9a; ptr1632 { push22(CS); build ptr1632; push24(&:4 inst_next); call ptr1632; }
-:CALLF ptr1632 is vexMode=0 & addrsize=1 & opsize=1 & byte=0x9a; ptr1632 { pushseg44(CS); build ptr1632; push44(&:4 inst_next); call ptr1632; }
+:CALLF ptr1616 is vexMode=0 & addrsize=0 & opsize=0 & byte=0x9a; ptr1616 { tmp:2 = CS; push22(CS); build ptr1616; push22(&:2 inst_next); call ptr1616; CS = tmp; }
+:CALLF ptr1616 is vexMode=0 & addrsize=1 & opsize=0 & byte=0x9a; ptr1616 { tmp:2 = CS; push42(CS); build ptr1616; push42(&:2 inst_next); call ptr1616; CS = tmp; }
+:CALLF ptr1632 is vexMode=0 & addrsize=0 & opsize=1 & byte=0x9a; ptr1632 { tmp:2 = CS; push22(CS); build ptr1632; push24(&:4 inst_next); call ptr1632; CS = tmp; }
+:CALLF ptr1632 is vexMode=0 & addrsize=1 & opsize=1 & byte=0x9a; ptr1632 { tmp:2 = CS; pushseg44(CS); build ptr1632; push44(&:4 inst_next); call ptr1632; CS = tmp; }
:CALLF addr16 is vexMode=0 & addrsize=0 & opsize=0 & byte=0xff; addr16 & reg_opcode=3 ... { local ptr:$(SIZE) = segment(DS,addr16); local addrptr:$(SIZE) = segment(*:2 (ptr+2),*:2 ptr);
push22(CS); push22(&:2 inst_next); call [addrptr]; }
:CALLF addr32 is vexMode=0 & addrsize=1 & opsize=0 & byte=0xff; addr32 & reg_opcode=3 ... { local dest:4 = addr32; push42(CS); push42(&:2 inst_next); call [dest]; }
I should probably also make a PR out of this to make it easier for people to use this patch, and fix possible issues if they pop up later.
@LukeSerne leaving the currentCS for protectedMode=1 as is does make my example work, so thanks. I just woud like to understand this part of Ghidra better as I'd like to find a cure for the issues stemming from segmentation and the understanding of near vs far pointers and multi-register use for types that are too large for individual ones (like in 16-biit X86, a long or a far pointer is returned in the DX:AX combo etc)!
@Wall-AF re "understanding of near vs far pointers and multi-register use for types that are too large for individual ons (like in 16-biit X86, a long or a far pointer is returned in the DX:AX combo etc)!"
If you are referring to understanding DX:AX combo usage in Borland 16 bit programs then Agner Fog's Calling conventions for different C++ compilers and operating systems booklet may be of help.
Table 7 lists when 16 bit Borland returns structure, class and union objects using I = Returned in integer registers, or PF = Far pointer to temporary memory space passed on stack and returned in DX:AX.
calling_conventions.pdf downloaded from https://www.agner.org/optimize/#manuals
Suggest downloading all five documents as they are all excellent resources for decompiling.
@ABratovic thanks for that, interesting read. I was meaning I'd like to fix Ghidra so that it understood and could work better! I have added some stuff to my repo that helps with PASCAL calling convention and some parts of sized pointers, but I've no way solved everything in that area!