tool-conventions

ABI for C functions without prototypes

Open dschuff opened this issue 8 years ago • 5 comments

If you have a K&R-style C function declaration, such as int foo(); currently LLVM will lower calls to it with the usual fixed-arg calling convention. Binaryen's s2wasm will leave the calls untouched, and generate foo's function section entry using the type of foo's implementation. If there is a mismatch, then the linked module fails validation.

We should specify in the ABI what is supposed to happen, and LLVM and lld should implement that. The wasm ABI is somewhat unique in that the vararg calling convention is incompatible with the fixed-arg calling convention. Based on previous discussion I think we are still happy with that decision, so at a high level there are two options here:

  1. All calls to functions with no prototype use the vararg calling convention.
    • As a result the signatures of the imports of such functions in the callers' object files all just take a single i32 (the vararg buffer).
    • Of course if the implementation of the function is not vararg, then there will be a link-time/validation failure or a runtime failure (if the wasm signature happens to match).
  2. Calls to such functions use the fixed-arg calling convention.
    • The signatures in a caller's object file will have whatever arguments are used at the callsite. (If multiple callsites in the object file have different arguments, that would be a compile-time error.)
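
To make the difference between the two options concrete, here is a minimal sketch in C of how the parameter list of an import could be derived from a callsite under each option. The names (ValType, ImportParams, import_params_option1, import_params_option2) are hypothetical and only model the idea; they are not taken from LLVM, lld, or Binaryen, and result types are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the wasm value types. */
typedef enum { VT_I32, VT_I64, VT_F32, VT_F64 } ValType;

#define MAX_PARAMS 8

/* Hypothetical model of the parameter list recorded for an import. */
typedef struct {
    size_t  nparams;
    ValType params[MAX_PARAMS];
} ImportParams;

/* Option 1: every call to an unprototyped function is treated as a vararg
   call, so the import takes a single i32 (the vararg buffer pointer),
   whatever the callsite actually passes. */
ImportParams import_params_option1(const ValType *args, size_t nargs) {
    ImportParams p = { 1, { VT_I32 } };
    (void)args;
    (void)nargs;
    return p;
}

/* Option 2: the call uses the fixed-arg convention, so the import takes
   exactly the (already lowered) argument types seen at the callsite. */
ImportParams import_params_option2(const ValType *args, size_t nargs) {
    ImportParams p = { 0, { VT_I32 } };
    size_t i;
    assert(nargs <= MAX_PARAMS);
    for (i = 0; i < nargs; i++) {
        p.params[i] = args[i];
    }
    p.nparams = nargs;
    return p;
}
```

For a callsite passing an i32 and an f64, option 1 would record one i32 parameter, while option 2 would record (i32, f64).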

In either case, the linker would check the signatures of all of the imports for a particular function against the implementation of that function (i.e. the function signature in the object that exports it), and would issue an error if there was a mismatch. Also in either case, a mismatch at the source level could accidentally match at the wasm level, and result in a more-difficult-to-debug runtime failure.
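
The link-time check described here can be sketched as a simple signature comparison. This is an illustrative model only: WasmType, WasmSig, sig_equal and first_mismatch are hypothetical names, not lld's actual data structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the wasm value types. */
typedef enum { WT_I32, WT_I64, WT_F32, WT_F64 } WasmType;

#define SIG_MAX_PARAMS 8

/* Hypothetical model of a wasm function signature. */
typedef struct {
    size_t   nparams;
    WasmType params[SIG_MAX_PARAMS];
    bool     has_result;
    WasmType result;
} WasmSig;

/* Two signatures match only if parameters and result agree exactly. */
bool sig_equal(const WasmSig *a, const WasmSig *b) {
    size_t i;
    if (a->nparams != b->nparams || a->has_result != b->has_result) {
        return false;
    }
    if (a->has_result && a->result != b->result) {
        return false;
    }
    for (i = 0; i < a->nparams; i++) {
        if (a->params[i] != b->params[i]) {
            return false;
        }
    }
    return true;
}

/* Compare every import of a function against the signature of its
   definition. Returns the index of the first mismatching import, or -1
   if all imports agree with the definition. */
int first_mismatch(const WasmSig *def, const WasmSig *imports, size_t n) {
    size_t i;
    assert(def != NULL);
    for (i = 0; i < n; i++) {
        if (!sig_equal(def, &imports[i])) {
            return (int)i;
        }
    }
    return -1;
}
```

A check like this only sees wasm-level types. On wasm32, a definition taking a pointer and a callsite passing an int both lower to an i32 parameter, so that source-level mismatch would pass the check and only show up at runtime.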

I had thought that option 1 would be easier to specify in an ABI, but now I'm not so sure. Basically everything that needs to be specified (e.g. how does a C function signature lower to a wasm signature, what signature gets put in the import section of an object file, how the linker resolves mismatches, etc) has to be specified either way, and furthermore most of the behaviors will be the same either way. As far as I can see, the only difference is whether we use the vararg or fixed-arg convention. Given that, I'm actually more inclined just to use the fixed-arg convention on the grounds that it's almost always what people actually want.

Thoughts?

dschuff, Aug 28 '17 20:08

Agreed w/ 2 -- I think some folks who come from C++ are surprised to find out about the weird old C behavior. I'd be surprised if there is a significant amount of code using functions without prototypes as vararg (but would love to be proved wrong w/ examples!)

binji, Aug 28 '17 20:08

@binji good point about the footgun of declaring int foo(); in a header when you mean int foo(void);, that probably makes this way more common.

Also there's probably nothing sane we can actually do here, because how should we lower foo to vararg? Legal C vararg functions require a sentinel value, so for any prototypeless function that we assume is vararg we would need to guess what type that sentinel has. We could do that by inspecting the callsite, but then why not use that information to assume non-vararg?

jgravelle-google, Aug 28 '17 21:08

The footgun with () vs (void) is already mostly avoided today. Clang produces clever LLVM IR like this:

  %t = call i32 bitcast (i32 (...)* @foo to i32 ()*)()

because clang Just Knows that even though foo needs to have a variadic declaration in LLVM IR, a callsite with no arguments can use the non-variadic convention. This is done in pre-existing target-independent logic. It's an optimization on other targets, but it's effectively part of the ABI on WebAssembly as things currently stand, and it happens to Just Work and be just what we need to avoid the footgun. That's admittedly awkward to have in an ABI, but I think it's within justifiability.
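
For reference, the kind of C source that would lead to a call like the one quoted above is roughly the following. This is a minimal sketch: call_foo is a hypothetical caller name, and the exact IR depends on the clang version and target.

```c
/* Declaration without a prototype: before C23 this says nothing about the
   parameters of foo. (In C23 it is equivalent to int foo(void).) */
int foo();

/* A callsite with no arguments. Per the comment above, clang can call foo
   here with the non-variadic convention even though the LLVM IR
   declaration of foo is variadic. */
int call_foo(void) {
    return foo();
}

/* The actual implementation takes no arguments, so the zero-argument call
   matches it under the fixed-arg convention. */
int foo(void) {
    return 42;
}
```

Because the callsite passes no arguments, the call and the definition agree at the wasm level, which is why the int foo(); versus int foo(void); footgun is mostly harmless in this particular case.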

The EM_ASM implementation currently uses an unprototyped function, though perhaps we can find other ways to implement that.

Are there any other known users who care about unprototyped functions? It's difficult to decide whether to do more here without the benefit of users reporting actual problems.

sunfishcode, Aug 28 '17 22:08

I think -Werror=implicit-function-declaration helps avoid some of these issues. emcc.py currently turns it on by default. Maybe we could do that at the clang level?

kripken, Aug 28 '17 22:08

See also https://bugs.llvm.org/show_bug.cgi?id=35385

NWilson, Nov 28 '17 18:11