How to avoid bindgen being out of sync with cc?
My understanding is that it is very common to wrap a C library in a Rust *-sys crate, using bindgen to generate the Rust bindings automatically and cc to compile the C code.
As far as I understand it, bindgen uses libclang to parse the C headers, while cc uses whatever compiler the user provides.
This can lead to problems, because the two toolchains do not have to agree on everything. When I cross-compiled for thumbv6m-none-eabi on Windows, bindgen generated u32 for enums, while the compiler invoked by cc used u8 where possible (-fshort-enums).
This is quite unfortunate and hard to catch. Is there a common workaround for this?
PS: If there is a better place for this issue, please tell me.
The easiest approach I can think of is to set the relevant environment variables for clang-sys and use the same paths when calling Build::compiler.
The docs for Build::compiler also say that the compiler is automatically detected from a number of environment variables, so setting CC (or a similar variable) consistently with the clang-sys environment variables might work as well.
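Independently of which compiler binary is used, one way to keep the two tools closer together is to build the layout-affecting flags once and hand the same list to both. The following is a minimal sketch, not an established recipe: `shared_abi_flags` is a hypothetical helper, the flag list is illustrative rather than exhaustive, and the commented usage assumes a build.rs that depends on the cc and bindgen crates and a header named wrapper.h.

```rust
/// Flags that change C data layout and therefore should be seen both by the
/// compiler that builds the C code and by the libclang instance that parses
/// the headers. Hypothetical helper; only the enum-size flag is covered.
pub fn shared_abi_flags(short_enums: bool) -> Vec<String> {
    let flag = if short_enums {
        "-fshort-enums"
    } else {
        "-fno-short-enums"
    };
    vec![flag.to_string()]
}

// Possible use inside build.rs (assumes the `cc` and `bindgen` crates and a
// header called `wrapper.h`, neither of which comes from the discussion):
//
// let flags = shared_abi_flags(false);
// let mut build = cc::Build::new();
// let mut builder = bindgen::Builder::default().header("wrapper.h");
// for f in &flags {
//     build.flag(f.as_str());                  // seen by the C compiler
//     builder = builder.clang_arg(f.as_str()); // seen by libclang
// }
```

This does not help when the C code is prebuilt elsewhere with different flags; in that case the flags passed to bindgen have to match whatever that build used.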
I also recently ran into generated bindings that were incorrect because of enum size differences, similarly on a thumbX-none-eabi project. In my case I wasn't using cc; I was linking with separately compiled code as a separate step from the cargo build.
I had this idea for a "this would've prevented my mistake" feature: when bindgen encounters an enum, if it detects that the target platform is one where common C compilers disagree on the size of enums[^1] and you haven't manually specified -fshort-enums or -fno-short-enums, it could issue a warning to double-check that the bindings are correct.
A warning seemed like the most useful UX here because, at least for my use case, bindgen couldn't know that I was planning to link with code compiled by gcc, but the warning would've prompted me to figure out that the bindings were wrong before I learned it the hard way.
[^1]: likely most critically: -none-eabi arm platforms
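The check proposed above could look roughly like the sketch below. This is not bindgen's behavior or API: `enum_size_needs_review` is a hypothetical function, and the target-matching rule (arm or thumb triples containing -none-eabi) is only an assumption based on the footnote.

```rust
/// Hypothetical heuristic: returns true when the enum size for `target` is
/// likely to differ between C compilers and the caller has not pinned it
/// with an explicit -fshort-enums / -fno-short-enums argument.
pub fn enum_size_needs_review(target: &str, clang_args: &[&str]) -> bool {
    // Assumption: the risky targets are bare-metal ARM EABI triples.
    let is_arm = target.starts_with("arm") || target.starts_with("thumb");
    let is_bare_metal_eabi = target.contains("-none-eabi");

    // An explicit choice in either direction silences the warning.
    let explicit_choice = clang_args
        .iter()
        .any(|a| matches!(*a, "-fshort-enums" | "-fno-short-enums"));

    is_arm && is_bare_metal_eabi && !explicit_choice
}
```

A build script could call such a function with the TARGET value cargo provides and print a cargo warning when it returns true.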
> likely most critically: -none-eabi arm platforms
ARM defers the enum ABI choice to the platform ABI, but of course *-none-* has no platform ABI. As @geeklint discovered (see: godbolt & Rust GameDev Discord Discussion), GCC chooses a 1-byte repr and Clang chooses a 4-byte repr for a simple:
typedef enum { Hello = 1 } Test;
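Seen from the Rust side, the two choices correspond to two different declarations of the same enum. The sketch below uses hypothetical type names (`TestOneByte`, `TestFourBytes`) to show why bindings generated for one representation are wrong for code compiled with the other.

```rust
use std::mem::size_of;

// Matches a C compiler that gives `Test` a 1-byte representation
// (reported above for GCC on this target).
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TestOneByte {
    Hello = 1,
}

// Matches a C compiler that gives `Test` a 4-byte representation
// (reported above for Clang, and what bindgen emitted as u32).
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TestFourBytes {
    Hello = 1,
}

// Compile-time checks: the two declarations really differ in size, so a
// struct or function signature using one cannot stand in for the other.
const _: () = assert!(size_of::<TestOneByte>() == 1);
const _: () = assert!(size_of::<TestFourBytes>() == 4);
```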
Solutions that have run through my mind include:
- Making bindgen (optionally?) generate a bindings.cpp to feed to cc, full of static_asserts validating alignment/size/signedness/???. Probably the best option for catching differences between bindgen and cc, although it doesn't necessarily do anything to fix caught issues. I've done similar for manually generated FFI and it's helped.
- For enums specifically, making a miserable pile of cc-driven types and using that to implement bindings.rs instead of core::ffi::c_int etc.
  - I implemented the first 90% of this as abienum, although the build.rs probably needs to export metadata to support #[repr(u32)] enum Test { ... } style enums. The second 90%, actually modifying bindgen to use something like abienum, is left as an exercise to the reader. The third 90% has yet to be identified. This also doesn't help if you don't specify ${CC} to tell cc to use GCC when linking prebuilt libs that were built with GCC.
- Trying to upstream core::ffi::c_enum_* somehow. This would require making rustc aware of ${CC}/${CFLAGS}/???, since compilers for the "same" target disagree on layout, which seems like something that would have a lot of pushback (although I could be wrong).
- Taint generated enums such that they're improper_ctypes somehow, at least on unknown/none style platforms.
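The first idea in the list above can be sketched without touching bindgen itself: a build script renders a small C++ file of static_asserts from the layouts the bindings assume, then compiles it with the same compiler as the C code. Everything named here (`ExpectedLayout`, `render_layout_checks`, the header name) is hypothetical, and the expected sizes would have to come from the generated bindings.

```rust
/// Layout of one C type as the Rust bindings assume it (hypothetical type).
pub struct ExpectedLayout<'a> {
    pub c_type: &'a str,
    pub size: usize,
    pub align: usize,
}

/// Renders a C++11 translation unit that fails to compile when the C/C++
/// compiler disagrees with the listed sizes or alignments.
pub fn render_layout_checks(header: &str, layouts: &[ExpectedLayout<'_>]) -> String {
    let mut out = format!("#include \"{header}\"\n\n");
    for l in layouts {
        out.push_str(&format!(
            "static_assert(sizeof({t}) == {s}, \"size of {t} differs from bindings\");\n",
            t = l.c_type,
            s = l.size
        ));
        out.push_str(&format!(
            "static_assert(alignof({t}) == {a}, \"alignment of {t} differs from bindings\");\n",
            t = l.c_type,
            a = l.align
        ));
    }
    out
}
```

In a build.rs, the returned string could be written to a file under OUT_DIR and compiled through cc::Build with cpp(true), so a mismatch surfaces as a build error rather than as memory corruption at runtime. As the comment above notes, this only detects the disagreement; it does not fix it.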
I just finished debugging an extremely nasty issue caused by GCC generating an enum of size 2 while bindgen generated one of size 4. The behavior isn't consistent either: in the same build, GCC picked size 4 for relatively similar enums. -fno-short-enums is a solution too.
That suggestion with static asserts compiled with GCC would've helped and prevented the issue.
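One possible explanation for the apparent inconsistency, assuming the C code was built with -fshort-enums semantics: GCC documents that flag as giving each enum the smallest integer type that can hold its declared values, so two similar-looking enums get different sizes as soon as their value ranges differ. The sketch below models that rule; `short_enum_size` is a hypothetical helper, not compiler output, and whether a given toolchain applies the rule by default depends on the target.

```rust
/// Approximates the size in bytes a compiler using -fshort-enums semantics
/// would pick for an enum whose enumerators span `min..=max`.
/// Hypothetical model for illustration only.
pub fn short_enum_size(min: i64, max: i64) -> usize {
    assert!(min <= max, "min must not exceed max");
    if min >= 0 {
        // No negative enumerators: an unsigned type is enough.
        match max as u64 {
            0..=0xFF => 1,
            0x100..=0xFFFF => 2,
            0x1_0000..=0xFFFF_FFFF => 4,
            _ => 8,
        }
    } else if min >= i64::from(i8::MIN) && max <= i64::from(i8::MAX) {
        1
    } else if min >= i64::from(i16::MIN) && max <= i64::from(i16::MAX) {
        2
    } else if min >= i64::from(i32::MIN) && max <= i64::from(i32::MAX) {
        4
    } else {
        8
    }
}
```

Under this model an enum whose largest value is 300 takes 2 bytes, while a sibling enum with a value of 0x10000 takes 4, which would match the mix of sizes described above.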