Support `repr(C)` for enums
Problem
Bindgen supports different enum representations which are either marked repr(transparent) or repr($ty) where $ty is the representation used for the enum (an integer type or an alias for one). However, for cross-language CFI to work correctly, it is sometimes desirable to use repr(C) instead as it causes the compiler to emit the right type ID for cross-language CFI to work.
Example:
enum handler_return {
INT_NO_RESCHEDULE = 0,
INT_RESCHEDULE,
};
gives us the following bindings (using --rustified-enum handler_return):
// in context of cross-language CFI, the return type is `core::ffi::c_uint` but
// the expected type is `enum handler_return`) thus an indirect function
// call using this type will abort
#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum handler_return {
INT_NO_RESCHEDULE = 0,
INT_RESCHEDULE = 1,
}
The following two variations work:
// this works but isn't supported by bindgen as far as I can tell
// (e.g. can't pass `repr(c)` using `--with-attribute-custom-enum ` flag)
#[repr(C)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum handler_return {
INT_NO_RESCHEDULE = 0,
INT_RESCHEDULE = 1,
}
// this also works but requires the user to know about the
// C++ Itanium ABI/cross-lang CFI internals - not ideal...
#[repr(u32)]
#[cfi_encoding = "14handler_return"]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum handler_return {
INT_NO_RESCHEDULE = 0,
INT_RESCHEDULE = 1,
}
Using --newtype-enum handler_return fails the CFI check too:
// fails cfi check
#[repr(transparent)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub struct handler_return(pub core::ffi::c_uint);
and repr(C) or cfi_encoding = "14handler_return" attributes make the CFI check pass again.
Potential Solutions
- Add new flags to request
repr(C)for enums, e.g.--rustified-enum-repr,--newtype-enum-repr, etc. - Allow
--with-attribute-custom-enum REGEX=#[repr(C)] - No action; require the user to use
#[cfi_encoding]or change the LLVM cross-language CFI implementation.
The first solution complicates the bindgen CLI but the functionality is easily discoverable. The second solution makes the inverse tradeoff: more can be done with the existing flags but few people will discover that this is the case.
I have prototyped the second approach and can put up a draft PR unless there's a feeling it is not the right way to go. Also very interested in hearing about potentially better solutions I just haven't thought of.
cc @rcvalle @maurer
(Edited to fix mistake pointed out by @emilio in this comment.)
gives us the following bindings (using
--rustified-enum handler_return):
That doesn't seem correct? That's what I get with that flag:
#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum handler_return {
INT_NO_RESCHEDULE = 0,
INT_RESCHEDULE = 1,
}
Which seems like it'd work?
// this works but isn't supported by bindgen as far as I can tell // (e.g. can't pass `repr(c)` using `--with-attribute-custom-enum ` flag) #[repr(C)] #[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)] pub struct handler_return(pub core::ffi::c_uint);
Hmm, that doesn't seem identical / correct ABI-wise... Structs and enums are passed in different ways, right?
Does the enum representation work for CFI? Or do we somehow need:
#[repr(C)]
enum handler_return { ... }
I guess not since according to https://doc.rust-lang.org/nomicon/other-reprs.html#reprc:
repr(C) is equivalent to one of repr(u*) (see the next section) for fieldless enums
Cc @rcvalle
gives us the following bindings (using
--rustified-enum handler_return):That doesn't seem correct? That's what I get with that flag:
#[repr(u32)] #[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)] pub enum handler_return { INT_NO_RESCHEDULE = 0, INT_RESCHEDULE = 1, }
I'm very sorry; I mixed things up. I've (hopefully) corrected my original comment. Thanks for the catch!
Which seems like it'd work?
Unfortunately, repr(u32) does not work in my testing.
Does the
enumrepresentation work for CFI? Or do we somehow need:#[repr(C)] enum handler_return { ... }
In my testing, we need repr(C) for the enum representation to work.
I guess not since according to https://doc.rust-lang.org/nomicon/other-reprs.html#reprc:
repr(C) is equivalent to one of repr(u*) (see the next section) for fieldless enums
I think there are multiple equivalences in play. One is layout equivalence, another is what is considered equivalent in the Itanium C++ ABI which in turn determines what pointers are equivalent for the purposes of LLVM control-flow integrity
I'm relying on this blog post by @rcvalle:
Clang uses the Itanium C++ ABI's virtual tables and RTTI typeinfo structure name as type metadata identifiers for function pointers.
Moreover, the Itanium C++ ABI encoding for Rust structs, enums, and unions have the same encoding: <length><name> (e.g. 14handler_return in my example). I assume that is why both the newtype and rustified-enum bindgen options pass CFI checks with repr(C).
Ok, I guess then the questions would be:
- Does CFI work if you use enum : uint32_t on the C++ side?
- If repr(C) works on the enum, not a struct, the nomicon should be fixed to reflect that subtle difference.
If there's a good way of getting the cfi_encoding from clang, I think the best approach to address this would be to get it from clang and explicitly specify it in the bindgen output.
Ok it seems #[repr(transparent)] structs can also specify cfi_encoding properly. So the way I'd go about fixing this would be:
- Adding a libclang function to expose the CFI encoding of a given type, just like there are functions for mangled names and so on.
- Consuming tha in bindgen, and add
#[cfi_encoding]based on that, probably behind an option, and ideally only when needed (though if that's hard to detect that seems fine.
Seems that'd work for all enum representations except the typedef ones, which should be documented but...
And as an interim step we could try to add #[repr(C)] on enums if there's an easy way to get the info from clang that this enum is not specifying its width.
The encoding is well defined in both sides (i.e., clang and rustc) and there is a specification for the encoding for cross-language CFI support: https://rcvalle.com/docs/rust-cfi-design-doc/ so adding a function for and calling into clang (or using tools such as c++filt and llvm-cxxfilt) shouldn't be necessary as it should just work. We could either:
Automatically add the cfi_encoding for enums following the spec as <length><name>, or add an option to add the #[repr(C)] attribute to enums. If we chose the later, we should probably also verify whether this should be amended in the Nomicon as well.
I implemented the #[repr(C)] approach in #3265 but I was unsure about this bit:
on enums if there's an easy way to get the info from clang that this enum is not specifying its width.
Do you mean C23 enums that declares its underlying type? e.g.
enum foo: uint32_t {
/* ... */
};
Yes, or c++ enums.
on enums if there's an easy way to get the info from clang that this enum is not specifying its width.
If the enum specifies its underlying integer type, getIntegerTypeSourceInfo returns non-null. However, it appears that method is not exposed by clang_sys; presumably because it is not part of the C API. Not sure how much work it is to have it exposed.
getIntegerTypeRange is another way to detect when the underlying type was specified but it too is missing from the C API.