ghidra
ghidra copied to clipboard
Decompiler failure: Incorrect calling convention, likely trash, it's complicated
Describe the bug The decompiler overzealously applies dead code.
To Reproduce
- Open attached executable. Don't load the PDB.
- Decompile 0x411770
- Obtain
/* WARNING: Function: __RTC_CheckEsp replaced with injection: __RTC_CheckEsp */
void * FUN_00411770(uint param_1)
{
int iVar1;
undefined4 *puVar2;
undefined4 local_d0 [49];
void *local_c;
puVar2 = local_d0;
for (iVar1 = 0x33; iVar1 != 0; iVar1 += -1) {
*puVar2 = 0xcccccccc;
puVar2 = puVar2 + 1;
}
thunk_FUN_00411650();
if ((param_1 & 1) != 0) {
operator_delete(local_c);
}
return local_c;
}
- Observe that
local_c
is not defined. - Observe in the below disassembly image that
local_c
at 0x4117a3 should probably be live.
Expected behavior
I expected the decompilation to contain return param_1;
, or local_c = param_1;
.
Screenshots
Attachments (don't use the pdb, it's just there for reference) ooex0.zip
Environment (please complete the following information):
- OS: Ubuntu 22.04.3 LTS
- Java Version: Zulu 17.0.6
- Ghidra Version: 10.4
- Ghidra Origin: Official GitHub Distro
Additional context Some debugger debug messages:
~/g/g/G/F/D/s/d/cpp $ SLEIGHHOME=/home/ed/ghidra/ghidra_10.4_PUBLIC/ ./decomp_dbg 10:14:25
[decomp]> restore /home/ed/Dropbox/temp/2023-11-27 - 411770.xml
Decoding ERROR: Unable to open xml document /home/ed/Dropbox/temp/2023-11-27
[decomp]> restore "/home/ed/Dropbox/temp/2023-11-27 - 411770.xml"
Decoding ERROR: Unable to open xml document "/home/ed/Dropbox/temp/2023-11-27
[decomp]> restore /tmp/411770.xml
/tmp/411770.xml successfully loaded: Intel/AMD 32-bit x86
[decomp]> load function 0x411770
Execution error: Unknown function name: 0x411770
[decomp]> load function FUN_00411770
Function FUN_00411770: 0x00411770
[decomp]> trace address 0x411793
OK (1 ranges)
[decomp]> decompile
Decompiling FUN_00411770
DEBUG 0: heritage
0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(free) + #0xfffffff8
0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(0x00411771:3) + #0xfffffff8
0x00411793:2e: u0x00007a80(0x00411793:2e) = *(ram,u0x00001d00(free))
0x00411793:2e: u0x00007a80(0x00411793:2e) = *(ram,u0x00001d00(0x00411793:2d))
0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(free)
0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(0x00411793:2e)
DEBUG 1: deadcode
0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(0x00411771:3) + #0xfffffff8
0x00411793:2d: **
0x00411793:2e: u0x00007a80(0x00411793:2e) = *(ram,u0x00001d00(0x00411793:2d))
0x00411793:2e: **
0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(0x00411793:2e)
0x00411793:2f: **
I am in OPACTION_DEBUG.
rule_debug.
after entries.
Scope FUN_00411770
param_1 : s0x00000004:4 uint : all
flags = 8.
after s.
Decompilation complete
[decomp]> trace address 0x4117a3
OK (2 ranges)
[decomp]> decompile
Clearing old decompilation
Decompiling FUN_00411770
DEBUG 0: start
0x00411793:2d: **
0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(free) + #0xfffffff8
0x00411793:2e: **
0x00411793:2e: u0x00007a80(0x00411793:2e) = *(ram,u0x00001d00(free))
0x00411793:2f: **
0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(free)
0x004117a3:40: **
0x004117a3:40: u0x00001d00(0x004117a3:40) = EBP(free) + #0xfffffff8
0x004117a3:41: **
0x004117a3:41: u0x00007a80(0x004117a3:41) = *(ram,u0x00001d00(free))
0x004117a3:42: **
0x004117a3:42: EAX(0x004117a3:42) = u0x00007a80(free)
DEBUG 1: heritage
0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(free) + #0xfffffff8
0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(0x00411771:3) + #0xfffffff8
0x00411793:2e: u0x00007a80(0x00411793:2e) = *(ram,u0x00001d00(free))
0x00411793:2e: u0x00007a80(0x00411793:2e) = *(ram,u0x00001d00(0x00411793:2d))
0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(free)
0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(0x00411793:2e)
0x004117a3:40: u0x00001d00(0x004117a3:40) = EBP(free) + #0xfffffff8
0x004117a3:40: u0x00001d00(0x004117a3:40) = EBP(0x00411771:3) + #0xfffffff8
0x004117a3:41: u0x00007a80(0x004117a3:41) = *(ram,u0x00001d00(free))
0x004117a3:41: u0x00007a80(0x004117a3:41) = *(ram,u0x00001d00(0x004117a3:40))
0x004117a3:42: EAX(0x004117a3:42) = u0x00007a80(free)
0x004117a3:42: EAX(0x004117a3:42) = u0x00007a80(0x004117a3:41)
DEBUG 2: deadcode
0x00411793:2d: u0x00001d00(0x00411793:2d) = EBP(0x00411771:3) + #0xfffffff8
0x00411793:2d: **
0x00411793:2e: u0x00007a80(0x00411793:2e) = *(ram,u0x00001d00(0x00411793:2d))
0x00411793:2e: **
0x00411793:2f: ECX(0x00411793:2f) = u0x00007a80(0x00411793:2e)
0x00411793:2f: **
DEBUG 3: propagatecopy
0x004117a3:40: u0x00001d00(0x004117a3:40) = EBP(0x00411771:3) + #0xfffffff8
0x004117a3:40: u0x00001d00(0x004117a3:40) = ESP(0x00411770:1) + #0xfffffff8
DEBUG 4: addmultcollapse
0x004117a3:40: u0x00001d00(0x004117a3:40) = ESP(0x00411770:1) + #0xfffffff8
0x004117a3:40: u0x00001d00(0x004117a3:40) = ESP(i) + #0xfffffff4
DEBUG 5: earlyremoval
0x004117a3:42: EAX(0x004117a3:42) = u0x00007a80(0x004117a3:41)
0x004117a3:42: **
DEBUG 6: loadvarnode
0x004117a3:41: u0x00007a80(0x004117a3:41) = *(ram,u0x00001d00(0x004117a3:40))
0x004117a3:41: u0x00007a80(0x004117a3:41) = s0xfffffff4(free)
DEBUG 7: heritage
0x004117a3:41: u0x00007a80(0x004117a3:41) = s0xfffffff4(free)
0x004117a3:41: u0x00007a80(0x004117a3:41) = s0xfffffff4(0x00411796:d2)
DEBUG 8: deadcode
0x004117a3:40: u0x00001d00(0x004117a3:40) = ESP(i) + #0xfffffff4
0x004117a3:40: **
DEBUG 9: earlyremoval
0x004117a3:41: u0x00007a80(0x004117a3:41) = s0xfffffff4(0x00411796:d2)
0x004117a3:41: **
I am in OPACTION_DEBUG.
rule_debug.
after entries.
Scope FUN_00411770
param_1 : s0x00000004:4 uint : all
flags = 8.
after s.
Decompilation complete
I think perhaps this is the root cause:
DEBUG 28: likelytrash
0x00411796:d2: s0xfffffff4(0x00411796:d2) = ECX(i) [] i0x00411796:32:8(free)
0x00411796:d2: s0xfffffff4(0x00411796:d2) = [create] i0x00411796:32:8(free)
So likelytrash
seems to ignore inputs not compatible with the calling convention. So I suppose the root problem is that we don't detect the right calling convention (thiscall).
[decomp]> restore /home/ed/Dropbox/temp/2023-11-27 - 411770.xml
Decoding ERROR: Unable to open xml document /home/ed/Dropbox/temp/2023-11-27
[decomp]> restore "/home/ed/Dropbox/temp/2023-11-27 - 411770.xml"
Decoding ERROR: Unable to open xml document "/home/ed/Dropbox/temp/2023-11-27
[decomp]> restore /tmp/411770.xml
Human: Please open file. Computer: no Human: Please open "file" Computer: no lots of swearing, cp file /tmp/file Human: Please open /tmp/file Computer: sure :smile:
https://sourcegraph.com/github.com/NationalSecurityAgency/ghidra@c225fac124821d789b12e4c76f4a0f6eabe83544/-/blob/Ghidra/Features/Decompiler/src/decompile/cpp/coreaction.hh?L818:7
/// Register locations called \b likely \b trash are read as a side-effect of some instruction /// the compiler was using. The canonical example in x86 code is the /// PUSH ECX /// which compilers use to create space on the stack without caring about what's in ECX. /// Even though the decompiler can see that the read ECX value is never getting used directly /// by the function, because the value is getting copied to the stack, the decompiler frequently /// can't tell if the value has been aliased across sub-function calls. By marking the ECX register /// as \b likely \ trash the decompiler will assume that, unless there is a direct read of the /// incoming ECX, none of subfunctions alias the stack location where ECX was stored. This /// allows the spurious references to the register to be removed.
So I think what happens if this:
- We analyze 0x411770. By default, we assume that it is fastcall/thiscall/stdcall.
- We call thunk_FUN_00411650. Since we don't have a prototype for this function yet, we assume it is stdcall.
- Before this call, we push ECX to the stack. Our function does not directly access ECX. Therefore, it is assumed that none of the subfunctions alias the stack location where ECX was stored. Thus ECX and the memory write are considered dead.
So, it's kind of bad luck that ECX is likely trash for stdcall, but is also used for thiscall. This means that we'll have trouble detecting thiscall functions when the only real evidence of the thiscall function is in a callee. The end?
But something about this also doesn't sit right with me. We know that something is wrong (well, probably) because we see local_c read from without being defined. So maybe "unless there is a direct read of the incoming ECX" should be changed to "unless there is a direct read of the incoming ECX OR the stack location where it was stored"? This could have some false positives wrt. popping ecx in the function epilogue... but I suspect such pops would be optimized away anyway since they aren't (usually) returned from the function?
Human: Please open file. Computer: no Human: Please open "file" Computer: no lots of swearing, cp file /tmp/file Human: Please open /tmp/file Computer: sure 😄
New issue: supporting filenames with spaces in decomp_dbg :rofl: