ghidra
ghidra copied to clipboard
__thiscall function pointer confuses the decompiler
Describe the bug
__thiscall
function pointers in a virtual function table causes the decompiler to map the parameters incorrectly.
Example:
Assume a __thiscall
function with 2 int
arguments.
This means it expects this
in ECX
and 2 variables in Stack[0x4]
and Stack[0x8]
.
When calling this function, the decompiler maps the input ECX
to Stack[0x4]
and the input Stack[0x4]
to Stack[0x8]
.
Because this corrupts the stack of the decompiler, output code after that point is messed up.
To Reproduce Steps to reproduce the behavior:
- Download and analyze
LEGO1.DLL
- Create the following
__thiscall
(!!!) function pointer type:int MxDSSource_Read(byte *, uint)
- Create the following struct:
struct MxDSFile_vtable { pointer functions1[8]; MxDSSource_Read *Read; pointer function2[5]; };
- Modify the
MxDSFile
struct to:struct MxDSFile { MxDSFile_vtable *vfptr; byte data[0x78]; }; ```
- Go to
0x100c0280
(CreateStreamObject
) - See how the
Read
function pointer is called with 2 arguments instead of 3.
Expected behavior
The read function should get 3 arguments: this
, arg1
and arg2
.
Screenshots
The pMVar2
function pointer should accept 3 arguments, but only does 2.
The function call should be instead:
iVar4 = (*pMVar3)(param_1, &stack0xffffffe4, 8);
Attachments
LEGO1.DLL
Environment (please complete the following information):
- OS: Fedora 38
- Java Version: OpenJDK 20.0.1
- Ghidra Version: 10.3.1
- Ghidra Origin: official GitHub distro
Additional context
By marking the function __stdcall
instead of __thiscall
, the decompiler generates clean code.
But then, information is lost about this
/ECX
which is not okay.
I created a small reproducer:
class IReader {
public:
virtual int Read(char *buffer, unsigned int len) = 0;
};
__declspec(dllexport) int ReadSomethingInto(char *buffer, unsigned int len, IReader *reader) {
return reader->Read(buffer, len);
}
Compile this file with:
i686-w64-mingw32-g++ -shared -o a.dll ireader.c
Create the following types in Ghidra:
typedef int (__thiscall * read_fn)(char * , unsigned int );
struct IReader_vtable {
read_fn *Read;
};
struct IReader {
IReader_vtable *vfptr;
};
This results in the following incorrect code:
The arguments of Read
are incorrect.
Making read_fn
__stdcall creates the following code:
Currently, when creating Function Definition data-types, all parameters need to be provided explicitly in the dialog, including the this
pointer for the __thiscall
calling convention. So if in step 2 for the example, you provide
int MxDSSource_Read(MxDSFile *this,byte *, uint)
you should get the expected result.
Admittedly, the requirement to explicitly provide a this
pointer is not obvious, as the normal Edit Function dialog provides it automatically.
FWIW, I came here to find the existing report of this inconsistent behaviour, and discovered it.
I'd ... I suppose, I'd suggest that you remove or hide the calling convention from the "function pointer" type, save that it is really a "function definition" type — and is used to apply details including the calling convention to existing function definitions in code.
Perhaps the least worst choice would be to introduce a new type entirely, solely for declaring function pointers, which is (a) implicitly a pointer type, and (b) does not have features like "calling convention" that are not used when the decompiler (etc) use it to determine actions in code.
I say this because I'm curious to understand if there is some additional constraint that makes that difficult or undesirable for this use case; certainly, it'd be a hugely breaking change to start using the calling convention on the current function type when they are function pointers...