rust-bindgen Generate C code to export static inline functions

https://github.com/intelxed/xed has a bunch of inline functions as part of its interface. In order to make the xed bindings usable (https://github.com/valarauca/xed-sys/issues/1) we need to export them somehow. If bindgen could generate C code that just wraps the static inline functions we would be able to use bindgen to generate bindings to that code which would solve the problem.

Oct 21 '17 14:10 jrmuizel

For now, you can also compile it with -fno-inline-functions, I think.

Oct 21 '17 14:10 emilio

Won't that prevent inlining inside xed?

Oct 21 '17 14:10 jrmuizel

Further, -fno-inline-functions and .generate_inline_functions(true) doesn't actually seem to generate bindings for those functions. Perhaps because they are static inline?

Oct 21 '17 15:10 jrmuizel

Won't that prevent inlining inside xed?

Yup, I think so, which is unfortunate, I agree.

Further, -fno-inline-functions and .generate_inline_functions(true) doesn't actually seem to generate bindings for those functions. Perhaps because they are static inline?

Yeah, if they aren't visible in other translation units we can't generate symbols for them... I guess wrapping them is the best solution then

Oct 21 '17 17:10 emilio

What's the best approach to implement this?

Oct 22 '17 19:10 jrmuizel

It looks like currently static inline functions end up in the ir with a name of None.

Oct 22 '17 21:10 jrmuizel

-fkeep-inline-functions should allow the functions to remain inlined at call sites, but still give us something to link to, IIUC.

This feature is something I've wanted for a while, but haven't had the need to actually implement. The biggest open questions I have are:

How are we going to integrate this feature with our test suite? Emit tests/expectations/tests/whatever.c next to tests/expectations/tests/whatever.rs?
What will the user ergonomics be like? Can we compile the emitted C for the user automatically from build.rs? As much as possible, I'd like to avoid users having to manually add new dependencies on things like gcc or cmake crates and add even more stuff to their build.rs.

What's the best approach to implement this?

A new pass over the final IR, probably called from inside ir::context::BindgenContext::gen right before or after our existing codegen. This should probably live in a new module next to codegen and ir named cgen or inlgen something like that but better :-p

It would be awesome if there was a c_quote! { ... } macro... but that's a pretty tall order if one doesn't exist already.

It looks like currently static inline functions end up in the ir with a name of None.

Unfortunately, I don't have any insight into why this might be off the top of my head.

Oct 23 '17 16:10 fitzgen

It looks like currently static inline functions end up in the ir with a name of None.

Unfortunately, I don't have any insight into why this might be off the top of my head.

This is fixed with https://github.com/rust-lang-nursery/rust-bindgen/pull/1091

I'm probably not going to look at this again anytime soon as I've switched from using https://github.com/valarauca/xed-sys/ to https://github.com/zyantific/zydis-rs which doesn't have this problem.

Oct 24 '17 15:10 jrmuizel

static inline functions are still not generated with the option enabled when I tested on mbedTLS.

Feb 15 '18 03:02 kmod-midori

We have this trouble on YJIT quite extensively - commenting and subscribing here :-)

May 18 '22 10:05 noahgibbs

With the recent advancements in C2Rust, the #1344 route for this is becoming more viable again. (Or maybe even going all C2Rust -- in a sense, bindgen is "just" what C2Rust should be when let lose on a .h files, albeit with many more bells and whistles).

May 18 '22 10:05 chrysn

Just using C2Rust might not work well for us - one thing we need is more support for complex preprocessor defines than bindgen gives (e.g. operators on constants in a define.) In general, we use a lot of C preprocessor constants. Could work well for somebody else, though! Some method of integrating the two tools could be interesting. But it also sounds like it doesn't really exist yet. So we'd need to do a significant amount of tools work on our large existing codebase to make it happen. Not impossible, but not on our short-term roadmap, I don't think.

May 18 '22 11:05 noahgibbs

I was thinking about tackling this issue. However, I'd like to understand a bit better what needs to be done first. Let's say we have a header file header.h like this:

static inline int foo();

In my head what would happen is that we would generate an intermediate c file:

#include <header.h>

int foo_wo_inline() {
    return foo();
}

and then this file would be processed by bindgen and produce something like this:

extern "C" {
    fn foo_no_inline() -> ::std::os::raw::c_int;
}

pub unsafe extern "C" fn foo() -> ::std::os::raw::c_int {
    foo_no_inline()
}

Despite of the name juggling, is this a good way to approach this issue?

Aug 11 '22 21:08 pvdrz

@pvdrz That sounds basically like what we are currently doing manually in our project. Some comments:

It might be better to generate a C file and a header file. Bindgen just needs to process the new header file.
The generated C file should be put into a configurable location, since it needs to be integrated into the C build system - otherwise the new symbol will not be present.

On the Rust side I think using link_name like this might be an improvement:

extern "C" {
    #[link_name = "foo_no_inline"]
    fn foo() -> ::std::os::raw::c_int;
}

Aug 15 '22 03:08 jschwe

I started taking a deeper look at this today and I have a couple questions (@emilio, @fitzgen):

I was thinking on implementing this c_quote/cgen functionality at least to be able to generate the headers and extra C code we need. Would it be OK for you if this was done via a trait implementation for the elements of the IR we need? It would be something like:
```
trait Emit {
    fn emit<W: io::Write>(&self, ctx: &BindgenContext, buf: W);
}
```
If we generate this extra C code in BindgenContext::gen how do we get bindgen to process these new files and merge the output with the already existing output from the original header file? Should we create a new BindgenContext or even a new Builder, process the extra code and then merge the resulting items?

Update: I have a very incomplete but working code generator for C and is good enough to emit the wrapper functions we need. The only thing I need to move forward is knowing how to handle these "two stages" where we first find all the static inline functions and then run bindgen again over the generated files.

Aug 16 '22 20:08 pvdrz

I've done some progress this week.

Right now when a header file has inline functions like this one:

typedef int num;

static inline num foo(num arg) { return arg + 42; }
inline num bar(num arg) { return arg + 0x45; }
num baz(num arg);

the following is printed to stdout:

[bindgen-tutorial 0.1.0] cargo:rerun-if-changed=new_wrapper.h
[bindgen-tutorial 0.1.0] FOUND INLINED FUNCTION: foo
[bindgen-tutorial 0.1.0] FOUND INLINED FUNCTION: bar
[bindgen-tutorial 0.1.0] HEADERS
[bindgen-tutorial 0.1.0] headers: int foo__extern(int arg);
[bindgen-tutorial 0.1.0] headers: int bar__extern(int arg);
[bindgen-tutorial 0.1.0] WRAPPERS
[bindgen-tutorial 0.1.0] wrappers: int foo__extern(int arg) { return foo(arg); }
[bindgen-tutorial 0.1.0] wrappers: int bar__extern(int arg) { return bar(arg); }

Aug 26 '22 20:08 pvdrz

@emilio I have a question before going any further on this issue.

It seems the most robust way of generating the necessary rust code for inlined functions is using bindgen to process the generated c code that wraps the inlined functions.

Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen over it respecting all the BindgenOptions that were set by the user.

The only way to do this that I can think of is making BindgenOptions: Clone, clone the options and reuse them to "re-run" bindgen. But that's not possible right now because BindgenOptions hold non-clonable state that isn't actually provided by the user a options.

So the question is: Would it make sense to split BindgenOptions into the actual user input and the internal state, clone this user input and re-run bindgen to process the new header files?

Aug 30 '22 20:08 pvdrz

Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen over it respecting all the BindgenOptions that were set by the user.

I'd be careful about compiling code ourselves (because the CFLAGs we see might not be fully accurate for compilation, eg. because C code is compiled by GCC, and the options were stripped down to support LLVM). At least in my use case (riot-sys), it would be prefereable to have the C file exported somewhere where the remaining build system could pick it up and build it together with the rest of the C code.

Aug 31 '22 08:08 chrysn

Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen over it respecting all the BindgenOptions that were set by the user. I'd be careful about compiling code ourselves (because the CFLAGs we see might not be fully accurate for compilation, eg. because C code is compiled by GCC, and the options were stripped down to support LLVM). At least in my use case (riot-sys), it would be prefereable to have the C file exported somewhere where the remaining build system could pick it up and build it together with the rest of the C code.

I think that one way to solve this would be to add some sort of callback so the user can specify "how" to compile this generated c code and then bindgen do the final step of generating the bindings including the generated header file. Does that sound like a good solution?

Aug 31 '22 17:08 pvdrz

Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen

Why does the C-code need to be compiled before bindgen can be run? Why can't bindgen just generate the new header files first, and then generate the bindings? I'm a bit confused why compilation suddenly is necessary.

Aug 31 '22 18:08 jschwe

Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen

Why does the C-code need to be compiled before bindgen can be run? Why can't bindgen just generate the new header files first, and then generate the bindings? I'm a bit confused why compilation suddenly is necessary.

Yeah you're right. I thought that compiling before would serve as a "sanity check" for the generated code but in principle bindgen should be able to generate bindings for the new header files without compiling the new source code beforehand.

Aug 31 '22 18:08 pvdrz

Ok I have a very hacky PoC here. The pipeline is be the following:

Save the builder original state in case we need to rerun it later (this is done in a hacky way using the command line arguments :grimacing:)
If the header files contain inlined function declarations and generate_inline_functions is enabled:
- A new header and source files are created with wrappers for all the inlined functions (if the function is foo it is wrapped as foo_extern).
- The builder original state is recreated but the generate_inline_functions option is disabled and the header inputs are updated to include generated header file.
- Binding generation is run again. If a function contains the __extern suffix, the canonical name is modified to remove it but the link name is respected.

From here the user can compile the generated source files into a library and use the println!("cargo:...") options to link the library.

I think this is a good approach as it can be done incrementally without a huge PR:

Split BindgenOptions so we can keep the actual user input immutably and we can use it later to rerun binding generation.
Add options to specify the directory and name of the generated header and source files and the suffix we use for the wrappers.
Add a generator for the header and source files. Mine is pretty naive but it only relies on the bindgen IR and has no extra dependencies. I think this can be handled incrementally if it is OK to bail out when code cannot be generated. I suppose some code generation is better than no code generation.
Modify the generate methods so we can re-run bindgen with the new headers. Also use the suffix to fix the generated function's names.
Update the user manual with an example on how to compile and link against the generated library.

There are alternatives here like:

Try to use something like clang_reparseTranslationUnit and include the new headers here to avoid parsing everything from scratch but I'm not sure if that would work.
Use an existing code generator or try to use the clang AST directly instead of the bindgen's IR.

I'd love to have some input before committing to any path in particular :)

Sep 02 '22 22:09 pvdrz