ghidra x86 66h operand size override interfering with decompilers output

    > Fixed by [a04f7fb](https://github.com/NationalSecurityAgency/ghidra/commit/a04f7fbb03e4e38c3f57228885a8387f8198e85c)

Just looked at the new decompilation provided by this fix, and although IT FIXES the first example in this ticket, it doesn't help with the second. In this decompilation, the last 3 lines

   iVar1 = (int)g_pMemStk_9684.m_pvData + g_pMemStk_9684.m_nAvailable;
   g_pMemStk_9684.m_nAvailable = g_pMemStk_9684.m_nAvailable + nReqSize;
   return (LPVOID)CONCAT22(g_pMemStk_9684.m_pvData._2_2_,iVar1);

become

   iVar2 = (int)g_pMemStk_9684.m_pvData + g_pMemStk_9684.m_nUsed;
   uVar1 = (ulong)g_pMemStk_9684._4_4_ & 0xffff;
   g_pMemStk_9684._4_4_ = (LPSTR)(uVar1 | (ulong)(g_pMemStk_9684.m_nUsed + nReqSize) << 0x10);
   return (BYTE *)CONCAT22((int)uVar1,iVar2);

which is incorrect!

Here is the zipped xml for the function MemStk::Alloc(...).

Originally posted by @Wall-AF in https://github.com/NationalSecurityAgency/ghidra/issues/4533#issuecomment-1356246939

Dec 21 '22 14:12 Wall-AF

I believe this to be a combination of issues: i) relating to Ghidra's current inability to correctly fully understand far pointers (ie being able to combine two addresses/registers/(regiser, address combo) and then treat that as a pointer); and ii) the understanding to ignore the high word after loading a 32-bit register followed by using only the 16-bit part (maybe!!).

Dec 21 '22 15:12 Wall-AF

I believe 19a63531c3e6d0062fe788b0e4151d33a4a88c18 fixes the problem you are seeing.

Mar 08 '23 01:03 caheckman

@caheckman Just double-checked and the result now is:

   iVar1 = (int)g_pMemStk_9684.m_pvData + g_pMemStk_9684.m_nUsed;
   g_pMemStk_9684.m_nUsed = g_pMemStk_9684.m_nUsed + nReqSize;
   return (BYTE *)CONCAT22(g_pMemStk_9684.m_pvData._2_2_,iVar1);

better, but the sized pointer is still not fixed.

How do we gt Ghidra to produce output like:

   g_pMemStk_9684.m_nUsed = g_pMemStk_9684.m_nUsed + nReqSize;
   return &g_pMemStk_9684.m_pvData[g_pMemStk_9684.m_nUsed];

Mar 09 '23 15:03 Wall-AF

I'm getting strange results with this too: Assembly:

                             DgnPicTable_t*32 __cdecl16far FUN_10f0_169a(void)
                               assume DS = 0x1210
             DgnPicTable_t     DX:2,AX:2      <RETURN>
                             FUN_10f0_169a                                   XREF[1]:     FUN_1160_0341:1160:0342(c)  
   10f0:169a 66 8b 16       0           MOV        EDX,dword ptr DS:[g_pPicTable_96ca+2]
             cc 96
   10f0:169f a1 ca 96       0           MOV        AX,DS:[g_pPicTable_96ca]                         = 0000:0000
   10f0:16a2 cb             0           RETF

Decompiles to:

/* WARNING: Globals starting with '_' overlap smaller symbols at the same address */

DgnPicTable_t * __cdecl16far FUN_10f0_169a(void)

{
   return (DgnPicTable_t *)CONCAT22((int)ram0x121096cc,g_pPicTable_96ca._0_2_);
}

The code bytes are: 66 8b 16 cc 96 a1 ca 96 cb

If I replace the first byte (66) with NOP (90) I get this:

DgnPicTable_t * __cdecl16far FUN_10f0_169a(void)

{
   return (DgnPicTable_t *)CONCAT22(g_pPicTable_96ca._2_2_,g_pPicTable_96ca._0_2_);
}

better, but still issues with far pointer addresses (a different problem!).

May 27 '23 17:05 Wall-AF

ghidra ghidra copied to clipboard

x86 66h operand size override interfering with decompilers output

ghidra
ghidra copied to clipboard