Generate C code to export static inline functions
https://github.com/intelxed/xed has a bunch of inline functions as part of its interface. In order to make the xed bindings usable (https://github.com/valarauca/xed-sys/issues/1) we need to export them somehow. If bindgen could generate C code that just wraps the static inline functions we would be able to use bindgen to generate bindings to that code which would solve the problem.
For now, you can also compile it with -fno-inline-functions, I think.
Won't that prevent inlining inside xed?
Further, -fno-inline-functions and .generate_inline_functions(true) doesn't actually seem to generate bindings for those functions. Perhaps because they are static inline?
Won't that prevent inlining inside xed?
Yup, I think so, which is unfortunate, I agree.
Further, -fno-inline-functions and .generate_inline_functions(true) doesn't actually seem to generate bindings for those functions. Perhaps because they are static inline?
Yeah, if they aren't visible in other translation units we can't generate symbols for them... I guess wrapping them is the best solution then
What's the best approach to implement this?
It looks like currently static inline functions end up in the ir with a name of None.
-fkeep-inline-functions should allow the functions to remain inlined at call sites, but still give us something to link to, IIUC.
This feature is something I've wanted for a while, but haven't had the need to actually implement. The biggest open questions I have are:
-
How are we going to integrate this feature with our test suite? Emit
tests/expectations/tests/whatever.cnext totests/expectations/tests/whatever.rs? -
What will the user ergonomics be like? Can we compile the emitted C for the user automatically from
build.rs? As much as possible, I'd like to avoid users having to manually add new dependencies on things likegccorcmakecrates and add even more stuff to theirbuild.rs.
What's the best approach to implement this?
A new pass over the final IR, probably called from inside ir::context::BindgenContext::gen right before or after our existing codegen. This should probably live in a new module next to codegen and ir named cgen or inlgen something like that but better :-p
It would be awesome if there was a c_quote! { ... } macro... but that's a pretty tall order if one doesn't exist already.
It looks like currently static inline functions end up in the ir with a name of None.
Unfortunately, I don't have any insight into why this might be off the top of my head.
It looks like currently static inline functions end up in the ir with a name of None.
Unfortunately, I don't have any insight into why this might be off the top of my head.
This is fixed with https://github.com/rust-lang-nursery/rust-bindgen/pull/1091
I'm probably not going to look at this again anytime soon as I've switched from using https://github.com/valarauca/xed-sys/ to https://github.com/zyantific/zydis-rs which doesn't have this problem.
static inline functions are still not generated with the option enabled when I tested on mbedTLS.
We have this trouble on YJIT quite extensively - commenting and subscribing here :-)
With the recent advancements in C2Rust, the #1344 route for this
is becoming more viable again. (Or maybe even going all C2Rust -- in a
sense, bindgen is "just" what C2Rust should be when let lose on a .h
files, albeit with many more bells and whistles).
Just using C2Rust might not work well for us - one thing we need is more support for complex preprocessor defines than bindgen gives (e.g. operators on constants in a define.) In general, we use a lot of C preprocessor constants. Could work well for somebody else, though! Some method of integrating the two tools could be interesting. But it also sounds like it doesn't really exist yet. So we'd need to do a significant amount of tools work on our large existing codebase to make it happen. Not impossible, but not on our short-term roadmap, I don't think.
I was thinking about tackling this issue. However, I'd like to understand a bit better what needs to be done first. Let's say we have a header file header.h like this:
static inline int foo();
In my head what would happen is that we would generate an intermediate c file:
#include <header.h>
int foo_wo_inline() {
return foo();
}
and then this file would be processed by bindgen and produce something like this:
extern "C" {
fn foo_no_inline() -> ::std::os::raw::c_int;
}
pub unsafe extern "C" fn foo() -> ::std::os::raw::c_int {
foo_no_inline()
}
Despite of the name juggling, is this a good way to approach this issue?
@pvdrz That sounds basically like what we are currently doing manually in our project. Some comments:
- It might be better to generate a C file and a header file. Bindgen just needs to process the new header file.
- The generated C file should be put into a configurable location, since it needs to be integrated into the C build system - otherwise the new symbol will not be present.
On the Rust side I think using link_name like this might be an improvement:
extern "C" {
#[link_name = "foo_no_inline"]
fn foo() -> ::std::os::raw::c_int;
}
I started taking a deeper look at this today and I have a couple questions (@emilio, @fitzgen):
-
I was thinking on implementing this
c_quote/cgenfunctionality at least to be able to generate the headers and extra C code we need. Would it be OK for you if this was done via a trait implementation for the elements of the IR we need? It would be something like:trait Emit { fn emit<W: io::Write>(&self, ctx: &BindgenContext, buf: W); } -
If we generate this extra C code in
BindgenContext::genhow do we getbindgento process these new files and merge the output with the already existing output from the original header file? Should we create a newBindgenContextor even a newBuilder, process the extra code and then merge the resulting items?
Update: I have a very incomplete but working code generator for C and is good enough to emit the wrapper functions we need. The only thing I need to move forward is knowing how to handle these "two stages" where we first find all the static inline functions and then run bindgen again over the generated files.
I've done some progress this week.
Right now when a header file has inline functions like this one:
typedef int num;
static inline num foo(num arg) { return arg + 42; }
inline num bar(num arg) { return arg + 0x45; }
num baz(num arg);
the following is printed to stdout:
[bindgen-tutorial 0.1.0] cargo:rerun-if-changed=new_wrapper.h
[bindgen-tutorial 0.1.0] FOUND INLINED FUNCTION: foo
[bindgen-tutorial 0.1.0] FOUND INLINED FUNCTION: bar
[bindgen-tutorial 0.1.0] HEADERS
[bindgen-tutorial 0.1.0] headers: int foo__extern(int arg);
[bindgen-tutorial 0.1.0] headers: int bar__extern(int arg);
[bindgen-tutorial 0.1.0] WRAPPERS
[bindgen-tutorial 0.1.0] wrappers: int foo__extern(int arg) { return foo(arg); }
[bindgen-tutorial 0.1.0] wrappers: int bar__extern(int arg) { return bar(arg); }
@emilio I have a question before going any further on this issue.
It seems the most robust way of generating the necessary rust code for inlined functions is using bindgen to process the generated c code that wraps the inlined functions.
Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen over it respecting all the BindgenOptions that were set by the user.
The only way to do this that I can think of is making BindgenOptions: Clone, clone the options and reuse them to "re-run" bindgen. But that's not possible right now because BindgenOptions hold non-clonable state that isn't actually provided by the user a options.
So the question is: Would it make sense to split BindgenOptions into the actual user input and the internal state, clone this user input and re-run bindgen to process the new header files?
Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen over it respecting all the
BindgenOptionsthat were set by the user.
I'd be careful about compiling code ourselves (because the CFLAGs we see might not be fully accurate for compilation, eg. because C code is compiled by GCC, and the options were stripped down to support LLVM). At least in my use case (riot-sys), it would be prefereable to have the C file exported somewhere where the remaining build system could pick it up and build it together with the rest of the C code.
Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen over it respecting all the
BindgenOptionsthat were set by the user. I'd be careful about compiling code ourselves (because the CFLAGs we see might not be fully accurate for compilation, eg. because C code is compiled by GCC, and the options were stripped down to support LLVM). At least in my use case (riot-sys), it would be prefereable to have the C file exported somewhere where the remaining build system could pick it up and build it together with the rest of the C code.
I think that one way to solve this would be to add some sort of callback so the user can specify "how" to compile this generated c code and then bindgen do the final step of generating the bindings including the generated header file. Does that sound like a good solution?
Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen
Why does the C-code need to be compiled before bindgen can be run? Why can't bindgen just generate the new header files first, and then generate the bindings? I'm a bit confused why compilation suddenly is necessary.
Regardless of how this c code was generated. We would need to compile the c code first and then "re-run" bindgen
Why does the C-code need to be compiled before bindgen can be run? Why can't bindgen just generate the new header files first, and then generate the bindings? I'm a bit confused why compilation suddenly is necessary.
Yeah you're right. I thought that compiling before would serve as a "sanity check" for the generated code but in principle bindgen should be able to generate bindings for the new header files without compiling the new source code beforehand.
Ok I have a very hacky PoC here. The pipeline is be the following:
- Save the builder original state in case we need to rerun it later (this is done in a hacky way using the command line arguments :grimacing:)
- If the header files contain inlined function declarations and
generate_inline_functionsis enabled:- A new header and source files are created with wrappers for all the inlined functions (if the function is
fooit is wrapped asfoo_extern). - The builder original state is recreated but the
generate_inline_functionsoption is disabled and the header inputs are updated to include generated header file. - Binding generation is run again. If a function contains the
__externsuffix, the canonical name is modified to remove it but the link name is respected.
- A new header and source files are created with wrappers for all the inlined functions (if the function is
From here the user can compile the generated source files into a library and use the println!("cargo:...") options to link the library.
I think this is a good approach as it can be done incrementally without a huge PR:
- Split
BindgenOptionsso we can keep the actual user input immutably and we can use it later to rerun binding generation. - Add options to specify the directory and name of the generated header and source files and the suffix we use for the wrappers.
- Add a generator for the header and source files. Mine is pretty naive but it only relies on the bindgen IR and has no extra dependencies. I think this can be handled incrementally if it is OK to bail out when code cannot be generated. I suppose some code generation is better than no code generation.
- Modify the
generatemethods so we can re-run bindgen with the new headers. Also use the suffix to fix the generated function's names. - Update the user manual with an example on how to compile and link against the generated library.
There are alternatives here like:
- Try to use something like
clang_reparseTranslationUnitand include the new headers here to avoid parsing everything from scratch but I'm not sure if that would work. - Use an existing code generator or try to use the
clangAST directly instead of the bindgen's IR.
I'd love to have some input before committing to any path in particular :)