usercall.hpp

Example snippet returns incorrect output when compiled without optimizations (x86)

Open illnyang opened this issue 3 years ago • 4 comments

Macro-expanded snippet from README.md (code after the return statements removed):
#include <stdio.h>
void __stdcall example(__int32 arg3);
unsigned __int32 __cdecl example_trampoline(__int32 arg, __int32 arg2, __int32 arg3);
void __stdcall example2();
void __cdecl example2_trampoline();

int main()
{
    int a = 1;
    int b = 3;
    printf("a = %d, b = %d\n", a, b);
    a = example_trampoline(a, 0, b);
    printf("a = %d, b = %d\n", a, b);
    example2_trampoline();
    return a;
}

void __stdcall example(__int32 arg3)
{
    __asm { push EAX }
    __asm { push EBX }
    __asm { push ECX }
    __asm { push EDI }
    __asm { push EDX }
    __asm { push ESI }

    __int32 arg;
    __asm { mov arg, eax }

    __int32 arg2;
    __asm { mov arg2, ebx }

    printf("arg = %d, arg3 = %d\n", arg, arg3);
    arg = arg + arg3;
    printf("arg = %d, arg3 = %d\n", arg, arg3);

    __asm { pop ESI }
    __asm { pop EDX }
    __asm { pop EDI }
    __asm { pop ECX }
    __asm { pop EBX }
    __asm { pop EAX }

    unsigned __int32 _usercall_internal_return_ = arg;
    __asm { mov eax, _usercall_internal_return_ }
    return;
}

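unsigned __int32 __cdecl example_trampoline(__int32 arg, __int32 arg2, __int32 arg3) {
    // Load the register arguments: arg in EAX, arg2 in EBX.
    __asm { mov eax, arg }
    __asm { mov ebx, arg2 }

    // arg3 goes on the stack. MSVC does not keep register values alive between
    // separate __asm blocks and C statements, so it may use EAX as scratch while
    // pushing arg3 (which is what the /Od build does).
    example(arg3);

    // Read the usercall return value back out of EAX.
    unsigned __int32 _usercall_internal_return_;
    __asm { mov _usercall_internal_return_, eax }
    return _usercall_internal_return_;
}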

void __stdcall example2()
{
    __asm { push EAX }
    __asm { push EBX }
    __asm { push ECX }
    __asm { push EDI }
    __asm { push EDX }
    __asm { push ESI }
    printf("void function\n");
    __asm { pop ESI }
    __asm { pop EDX }
    __asm { pop EDI }
    __asm { pop ECX }
    __asm { pop EBX }
    __asm { pop EAX }
    return;
}

void __cdecl example2_trampoline()
{
    example2();
}
Compiled without optimizations (/DWIN32 /D_WINDOWS /EHsc /Zi /Ob0 /Od /RTC1 -MDd), incorrect output:

a = 1, b = 3
arg = 3, arg3 = 3
arg = 6, arg3 = 3
a = 6, b = 3
void function

disasm @ godbolt

Compiled with optimizations (/DWIN32 /D_WINDOWS /EHsc /O2 /Ob2 /DNDEBUG -MD), expected output:

a = 1, b = 3
arg = 1, arg3 = 3
arg = 4, arg3 = 3
a = 4, b = 3
void function

disasm @ godbolt

Microsoft (R) C/C++ Optimizing Compiler Version 19.30.30709 for x86

illnyang, Mar 06 '22 21:03

example_trampoline + 0x0C	mov	eax, [ebp + arg]	; all good
example_trampoline + 0x0F	mov	ebx, [ebp + arg2]	; ditto
example_trampoline + 0x12	mov	eax, [ebp + arg3]	; <--- EAX (holding arg) is overwritten with arg3, so example() computes 3+3=6
example_trampoline + 0x15	push	eax
example_trampoline + 0x16	call	example
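To make the failure mode concrete, here is a small C++ model of the register traffic. It is a simulation, not real codegen: the Regs struct and the *_model function names are hypothetical, and the optimized path assumes /O2 pushes arg3 without routing it through EAX, which is consistent with the correct output above but is not shown in the listing.

```cpp
#include <cstdint>

// Hypothetical model of the two registers the usercall trampoline relies on.
struct Regs {
    std::int32_t eax = 0;
    std::int32_t ebx = 0;
};

// Models example(): arg is read from EAX, arg3 comes from the stack.
std::int32_t example_model(const Regs& r, std::int32_t arg3_on_stack) {
    return r.eax + arg3_on_stack;
}

// Assumed /O2 behaviour: arg3 is pushed straight from memory, EAX is untouched.
std::int32_t trampoline_o2_model(std::int32_t arg, std::int32_t arg2, std::int32_t arg3) {
    Regs r;
    r.eax = arg;                    // __asm { mov eax, arg }
    r.ebx = arg2;                   // __asm { mov ebx, arg2 }
    return example_model(r, arg3);  // push [ebp + arg3]; call example
}

// /Od behaviour from the listing: arg3 is staged in EAX before the push.
std::int32_t trampoline_od_model(std::int32_t arg, std::int32_t arg2, std::int32_t arg3) {
    Regs r;
    r.eax = arg;                      // __asm { mov eax, arg }
    r.ebx = arg2;                     // __asm { mov ebx, arg2 }
    r.eax = arg3;                     // mov eax, [ebp + arg3]  <- overwrites arg
    std::int32_t pushed = r.eax;      // push eax
    return example_model(r, pushed);  // call example
}
```

With a = 1 and b = 3, trampoline_od_model(1, 0, 3) yields 6 and trampoline_o2_model(1, 0, 3) yields 4, matching the two outputs in the report.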

Probably won't-fix... I don't think there's much that can be done code-wise to alter MSVC codegen. Playing around with compilation flags may yield a workaround, which could then be mentioned in README.md and/or implemented in CMakeLists.txt.

Two questions:

  1. Imagine a better, alternate reality in which MSVC supports Extended Asm constructs. Would it be possible to incorporate those constructs into usercall.hpp?

  2. Clang already supports many MS extensions, including __pragma, as well as Extended Asm constructs. What other extensions or undocumented features would be necessary to make usercall.hpp compatible with Clang? It would be nice to have the MS-specific oddities listed somewhere in this repo. As it stands, Clang seems to reject the #define HASH() # trick.
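For comparison, this is roughly what Extended Asm offers: an operand constraint such as "+a" tells the compiler that a value lives in EAX for the duration of the asm statement, so it cannot silently reuse that register the way MSVC does between separate __asm blocks. A minimal sketch for GCC or Clang, assuming an x86 target (with a plain C++ fallback elsewhere); add_in_eax is a hypothetical name, and this does not attempt the harder job of emitting a whole usercall.

```cpp
#include <cstdint>

// Hypothetical helper: computes arg + arg3 with arg pinned to EAX.
std::int32_t add_in_eax(std::int32_t arg, std::int32_t arg3) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    // "+a": arg is read from and written back to EAX.
    // "r":  arg3 goes in a general-purpose register chosen by the compiler.
    // Because EAX is already claimed by operand 0, arg3 cannot be placed there.
    // "cc": the add modifies the flags.
    __asm__("addl %1, %0" : "+a"(arg) : "r"(arg3) : "cc");
    return arg;
#else
    // No GNU-style inline asm on this compiler/target; same result in plain C++.
    return arg + arg3;
#endif
}
```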

illnyang, Mar 06 '22 22:03

Unfortunately you are correct that there isn’t much that can be done here at the library level. usercall.hpp depends heavily on undocumented and undefined behavior in MSVC and the new preprocessor (the readme makes this pretty clear imo) which makes the library, and inline assembly in general, unpredictable at times. If you have a suggestion of what to add to the readme I would be happy to add it. My thinking was that adding any advice about compiler flags was futile since new versions of MSVC would change the code gen and my advice would be made obsolete.

  1. This would certainly allow improvements in usercall.hpp. Unfortunately, Microsoft doesn’t seem interested in improving inline assembly so I wouldn’t hold my breath.

  2. Since usercall.hpp is only possible due to the strange macro expansion rules in the new MSVC preprocessor, I doubt it could support Clang as a header-only macro library. The #define HASH() # trick is only scratching the surface.

That being said, I am currently working on a C++ language extension to implement the IDA Pro usercall syntax directly into a fork of Clang/LLVM. Once this is complete I am likely going to archive usercall.hpp and add a link to the Clang/LLVM fork at the top of the readme advising people to use that instead. This new solution has several advantages over usercall.hpp: it uses the actual IDA Pro syntax rather than an imitation of it; it can support all of the architectures that Clang supports rather than only x86_32 (since MSVC only supports x86_32 inline assembly); it is highly integrated with Clang’s diagnostics infrastructure for better error messages; it is far more portable, since Clang can run on and compile for pretty much every architecture; and much more. It’s a much heavier solution, since you would need to build the toolchain rather than just include a header, but it is far more robust. For this reason I am not interested in working on usercall.hpp anymore and would like to focus on the new project.

Thank you for using my library and writing such a comprehensive issue!

widberg, Mar 10 '22 06:03

I was looking into Clang internals as well, but my approach was less elegant: I tried to strip C99 6.10.3.4 (rescanning and further replacement) conformance out of the preprocessor lexer and related components:

clang/Lex/MacroArgs clang/lib/Lex/PPMacroExpansion.cpp clang/lib/Lex/PPDirectives.cpp clang/lib/Lex/Preprocessor.cpp

without any success.
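For readers unfamiliar with the clause: C99 6.10.3.4 (rescanning and further replacement) says that after a macro is expanded, the result is rescanned for more macros, but the name of the macro being replaced is not expanded again if it shows up during that rescan (paragraph 2). A small standard-conforming illustration, using a hypothetical function/macro pair named scaled:

```cpp
// A function and a function-like macro sharing the same name.
int scaled(int x) { return x * 10; }

// During rescanning, the inner "scaled" is not replaced again (6.10.3.4p2),
// so it refers to the function declared above instead of recursing.
#define scaled(x) (scaled(x) + 1)
```

Here scaled(2) expands to (scaled(2) + 1), which calls the function and evaluates to 21.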

Needless to say, your approach is the proper way to go. I will leave a link to C--, a project that is similar to IDA in terms of the low-level syntactic sugar it provides:

DWORD InitD3D()
{
	Direct3DCreate9(D3D_SDK_VERSION);
	if (EAX==0)
		return;
	
	D3D=EAX;

	EAX=#d3dpp;
	RtlZeroMemory(EAX, sizeof(D3DPRESENT_PARAMETERS));
	d3dpp.Windowed=TRUE;
	d3dpp.SwapEffect=D3DSWAPEFFECT_DISCARD;
	d3dpp.BackBufferFormat=D3DFMT_UNKNOWN;
	EAX=#d3dpp;

............

Your Clang plugin would be a game changer for people fixing or modding old games. Code-caving in particular can't be done without naked stubs. It would also make the "fastcall/thiscall" trick obsolete. Consider opening the repository mid-work; perhaps I and other people will be able to help with the plugin in some way.

Looking forward to your future projects!

illnyang, Mar 13 '22 03:03

Thanks for the C-- link! Heads up: Clang already allows __thiscall on non-member functions with no modifications needed, so the __fastcall trick isn't necessary if you are using Clang. I tweeted about this with a godbolt example recently: https://twitter.com/w1db3rg/status/1498481973366505475
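For context on the two approaches: on 32-bit x86, __thiscall passes this in ECX and the remaining arguments on the stack, while __fastcall passes the first two arguments in ECX and EDX. The common MSVC workaround for standing in for a member function with a free function is therefore a __fastcall function with an unused EDX parameter; with Clang, __thiscall can be written directly. A sketch assuming a hypothetical Player struct and hypothetical damage_* names; the conventions only take effect on 32-bit x86, and elsewhere the macros expand to nothing so the code still compiles.

```cpp
#include <cstdint>

// Hypothetical convention macros; only meaningful on 32-bit x86.
#if defined(_MSC_VER) && defined(_M_IX86)
#  define FASTCALL __fastcall
#  define THISCALL __thiscall
#elif defined(__i386__)
#  define FASTCALL __attribute__((fastcall))
#  define THISCALL __attribute__((thiscall))
#else
#  define FASTCALL
#  define THISCALL
#endif

struct Player { std::int32_t health; };  // hypothetical game object

// MSVC workaround: ECX = self, EDX = unused dummy, remaining args on the stack,
// which lines up with a __thiscall member function taking (amount).
std::int32_t FASTCALL damage_fastcall(Player* self, void* /*edx_unused*/, std::int32_t amount) {
    self->health -= amount;
    return self->health;
}

#if !defined(_MSC_VER) || defined(__clang__)
// Outside MSVC proper (e.g. Clang), thiscall can go on a free function directly,
// so no dummy parameter is needed.
std::int32_t THISCALL damage_thiscall(Player* self, std::int32_t amount) {
    self->health -= amount;
    return self->health;
}
#endif
```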

The Clang fork can be found at https://github.com/widberg/llvm-project-widberg-extensions

widberg, Mar 13 '22 05:03