ghidra Support MSVC 32-bit "C decorated name" style symbol mangle (stdcall, fastcall, cdecl)

Is your feature request related to a problem? Please describe.

The MSVC compiler mangles C symbols, using a different style for each calling convention. According to the documentation, this applies only to 32-bit code.

Ghidra does not currently demangle these symbols. I have a workaround in a script here, but this should be fixed properly, likely in MDMang.

The decoration style is documented here; I reproduce it below:

Calling convention	Decoration
__cdecl	Leading underscore (_)
__stdcall	Leading underscore (_) and a trailing at sign (@) followed by the number of bytes in the parameter list in decimal
__fastcall	Leading and trailing at signs (@) followed by a decimal number representing the number of bytes in the parameter list
__vectorcall	Two trailing at signs (@@) followed by a decimal number of bytes in the parameter list
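The table above can be read as a small set of prefix and suffix rules. The following is a minimal sketch of those rules in C, not Ghidra's or MDMang's implementation; the names `decor_kind`, `decor_info`, and `classify_decorated` are hypothetical and chosen only for illustration. It assumes that names starting with `?` are C++ mangled names and leaves them alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not part of Ghidra or MDMang. */
typedef enum {
    DECOR_NONE,       /* no recognised C decoration */
    DECOR_CDECL,      /* _name    */
    DECOR_STDCALL,    /* _name@N  */
    DECOR_FASTCALL,   /* @name@N  */
    DECOR_VECTORCALL  /* name@@N  */
} decor_kind;

typedef struct {
    decor_kind kind;
    char name[256];   /* undecorated name (truncated if too long) */
    long arg_bytes;   /* N from the decoration, or -1 if there is none */
} decor_info;

/* Accept only a non-empty run of decimal digits up to end of string. */
static int parse_size(const char *s, long *out)
{
    const char *p;
    if (*s == '\0') return 0;
    for (p = s; *p != '\0'; ++p)
        if (!isdigit((unsigned char)*p)) return 0;
    *out = strtol(s, NULL, 10);
    return 1;
}

static void set_result(decor_info *info, decor_kind kind,
                       const char *start, size_t len, long arg_bytes)
{
    if (len >= sizeof info->name) len = sizeof info->name - 1;
    memcpy(info->name, start, len);
    info->name[len] = '\0';
    info->kind = kind;
    info->arg_bytes = arg_bytes;
}

decor_info classify_decorated(const char *sym)
{
    decor_info info;
    const char *at;
    long n = -1;

    assert(sym != NULL);
    set_result(&info, DECOR_NONE, sym, strlen(sym), -1);
    if (sym[0] == '?') return info;   /* assumed to be a C++ mangled name */

    at = strrchr(sym, '@');
    if (at != NULL && parse_size(at + 1, &n)) {
        if (at - sym >= 2 && at[-1] == '@')
            set_result(&info, DECOR_VECTORCALL, sym, (size_t)(at - 1 - sym), n);
        else if (sym[0] == '@' && at - sym >= 2)
            set_result(&info, DECOR_FASTCALL, sym + 1, (size_t)(at - sym - 1), n);
        else if (sym[0] == '_' && at - sym >= 2)
            set_result(&info, DECOR_STDCALL, sym + 1, (size_t)(at - sym - 1), n);
    } else if (at == NULL && sym[0] == '_' && sym[1] != '\0') {
        set_result(&info, DECOR_CDECL, sym + 1, strlen(sym + 1), -1);
    }
    return info;
}
```

Calling `classify_decorated("_func_stdcall@4")` under these rules yields the stdcall kind, the name `func_stdcall`, and a size of 4. Note that the cdecl rule matches any name with a leading underscore, so on its own it cannot tell a function from a data symbol.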

Here is a minimal example for testing, compiled with the MSVC compiler (32-bit, version 13.00.9466). When the object file is loaded into Ghidra, the symbols are recovered but not demangled properly.

int __stdcall func_stdcall(int x) {
	return x+1;
}

int __fastcall func_fastcall(int x) {
	return x+1;
}

int __cdecl func_cdecl(int x) {
	return x+1;
}

Compile with: cl.exe /c example.c

Symbol table:

$ strings example.obj | grep func
_func_stdcall@4
@func_fastcall@4
_func_cdecl

Download source and compiled object file here: example.zip

If you link to create a full .exe with the /map flag to generate a mapfile, the map file will also list symbols in this way:

 0001:00000000       _func_stdcall@4            00401000 f   example.obj
 0001:0000000d       @func_fastcall@4           0040100d f   example.obj
 0001:0000001e       _func_cdecl                0040101e f   example.obj

Describe the solution you'd like

Ideally Ghidra should support demangling these symbol names with the MicrosoftDemangler.

Describe alternatives you've considered

I've created a script to demangle these names, but I consider the script approach to be a second-class solution.

Additional context

Ghidra recovers the symbol names from the object file symbol table, but does not demangle them as expected:

Feb 05 '20 01:02 mborgerson

I extended your example with

int __stdcall func_stdcall_1(int x, char c) {
	return x+1+(int)c;
}

int __fastcall func_fastcall_1(int x, char c) {
	return x+1+(int)c;
}

int __cdecl func_cdecl_1(int x, char c) {
	return x+1+(int)c;
}

int __stdcall func_stdcall_2(int x, char c, float f) {
	return x+1+(int)c+((f > 0)? 1: 0);
}

int __fastcall func_fastcall_2(int x, char c, float f) {
	return x+1+(int)c+((f > 0)? 1: 0);
}

int __cdecl func_cdecl_2(int x, char c, float f) {
	return x+1+(int)c+((f > 0)? 1: 0);
}

The output was as I thought it might be (4 bytes of storage for char):

 0001:00000000       _func_stdcall@4            00401000 f   test.obj
 0001:00000010       @func_fastcall@4           00401010 f   test.obj
 0001:00000030       _func_cdecl                00401030 f   test.obj
 0001:00000040       _func_stdcall_1@8          00401040 f   test.obj
 0001:00000060       @func_fastcall_1@8         00401060 f   test.obj
 0001:00000080       _func_cdecl_1              00401080 f   test.obj
 0001:00000090       _func_stdcall_2@12         00401090 f   test.obj
 0001:000000d0       @func_fastcall_2@12        004010d0 f   test.obj
 0001:00000110       _func_cdecl_2              00401110 f   test.obj

I don't think there is anything we can do with the parameter storage anyway.

Regarding cdecl mangling, I don't see any way to distinguish the mangled symbol from any other non-mangled symbol. It could easily be a global integer variable.

Feb 05 '20 20:02 ghizard

@ghizard - Thank you for taking the time to look into this!

> I don't think there is anything we can do with the parameter storage anyway.

I'm not sure exactly, but I'm pretty sure the prototype could be checked or updated to make sure the correct number of parameters is present. The char param in your example will be implicitly sign-extended to an integer, so I do expect it to take 4 bytes, as you've shown.

> Regarding cdecl mangling, I don't see any way to distinguish the mangled symbol from any other non-mangled symbol. It could easily be a global integer variable.

Although it is a little vague, according to the documentation I linked previously, I think all C symbols compiled with MSVC should be decorated in this case (a 32-bit executable), unless some symbol is intentionally added that violates this convention.

As for globals, I actually wasn't sure what the compiler would do; so I've tested it and confirmed that global integer values are also decorated like cdecl, with a prefixed underscore. Compiled the same way as before:

int my_global;
int _my_global2;

$ strings example.obj | grep my_global
__my_global2
_my_global

I think this demangling policy (that is, leaving clean names untouched) would generally be safe to add, provided that the target executable being analyzed is 32-bit.

Feb 05 '20 21:02 mborgerson

Status?

Aug 27 '24 13:08 GXTX

@mborgerson @GXTX

There has been no progress on this other than more thoughts triggered by your query.

The issue in the past was a collision in the mangled name space. For example, if we find a symbol _name at an address and it is of an integer type, we don't want to create a function with the cdecl convention; but if there is or should be a function there, we might want to create a function with its signature. There could also be a collision between a mangled name in the older scheme and a non-mangled name in the newer scheme, where both are functions but have different calling conventions. The current demangler processing model does not use knowledge about the target architecture or about whether there is or will be a function at the address; supplying that knowledge is likely part of the change needed to make this work.
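To make the collision concrete, here is a hedged sketch in C of how extra context could break the tie. It is not how Ghidra's demangler works today; `symbol_context`, `demangle_action`, and `choose_action` are hypothetical names, and the two context flags stand in for whatever architecture and function-presence information a changed processing model might pass in.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical outcome of the decision; not a Ghidra type. */
typedef enum {
    ACTION_LEAVE,            /* keep the symbol name as-is */
    ACTION_STRIP_ONLY,       /* strip the decoration, create no function */
    ACTION_STRIP_AND_SET_CC  /* strip and apply the implied calling convention */
} demangle_action;

/* Hypothetical context the caller would have to supply. */
typedef struct {
    bool is_32bit_x86;  /* target is 32-bit x86 */
    bool is_function;   /* a function exists or should exist at the address */
} symbol_context;

/* True when the name ends in '@' followed by one or more decimal digits. */
static bool has_size_suffix(const char *sym)
{
    const char *at = strrchr(sym, '@');
    const char *p;
    if (at == NULL || at[1] == '\0') return false;
    for (p = at + 1; *p != '\0'; ++p)
        if (!isdigit((unsigned char)*p)) return false;
    return true;
}

demangle_action choose_action(const char *sym, const symbol_context *ctx)
{
    assert(sym != NULL && ctx != NULL);

    if (!ctx->is_32bit_x86) return ACTION_LEAVE;  /* scheme is 32-bit only */
    if (sym[0] == '?') return ACTION_LEAVE;       /* assumed C++ mangled name */

    /* A sized suffix only makes sense on a function. */
    if (has_size_suffix(sym))
        return ctx->is_function ? ACTION_STRIP_AND_SET_CC : ACTION_LEAVE;

    /* Leading underscore alone: cdecl function or plain data symbol. */
    if (sym[0] == '_' && sym[1] != '\0' && strchr(sym, '@') == NULL)
        return ctx->is_function ? ACTION_STRIP_AND_SET_CC : ACTION_STRIP_ONLY;

    return ACTION_LEAVE;
}
```

With this shape, `_name` at a data address has its underscore stripped without creating a function, while the same name at a function address also receives the cdecl convention. Whether stripping data symbols is desirable at all is a policy choice this sketch simply assumes.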

Assuming we change the processing model to pass in information to break the processing conflict, we then need to confirm that we can lock in the signature in an appropriate way that does not lock it so tightly that the Decompiler cannot hone the result. I believe that we have the new ability to set some parts of the signature (maybe the CC) without locking all... need to wait for @ghidra1 to consult. I also think we would be limited to setting arguments to undefined4 type, which would hopefully let them be reassigned by the Decompiler. What the triage of 4 years ago did was confirm that char, int, and float all used 4 bytes; so it would be an assumption that we would take the total size of parameters and divide by 4 to get the number of parameters. If this assumption is wrong, it could make things worse.
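The divide-by-4 idea can be sketched as follows. This is only an illustration of the assumption described above, with a hypothetical function name and illustrative `param_N` placeholder names. One known caveat: on 32-bit x86 a `double` or 64-bit integer argument occupies 8 bytes, so the result is a count of 4-byte slots and an upper bound on the number of parameters, not necessarily the exact count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SLOT_BYTES 4  /* assumed size of one parameter slot */

/*
 * Hypothetical helper: writes a placeholder parameter list such as
 * "undefined4 param_1, undefined4 param_2" into buf.
 * Returns the number of 4-byte slots, or -1 if arg_bytes is negative,
 * not a multiple of 4, or the buffer is too small.
 */
int format_placeholder_params(long arg_bytes, char *buf, size_t buflen)
{
    size_t used = 0;
    long slots, i;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    if (arg_bytes < 0 || arg_bytes % SLOT_BYTES != 0)
        return -1;  /* the 4-byte assumption does not hold; do nothing */

    slots = arg_bytes / SLOT_BYTES;
    if (slots == 0) {
        int n = snprintf(buf, buflen, "void");
        return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
    }

    for (i = 1; i <= slots; ++i) {
        int n = snprintf(buf + used, buflen - used, "%sundefined4 param_%ld",
                         i > 1 ? ", " : "", i);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;  /* buffer too small */
        used += (size_t)n;
    }
    return (int)slots;
}
```

For `_func_stdcall_2@12` this produces three `undefined4` placeholders, matching the three parameters in the test source. Refusing to guess when the size is not a multiple of 4 is one way to avoid making things worse when the assumption fails.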

Do you have thoughts on this?

Sep 04 '24 16:09 ghizard
