rust-bindgen Add enum option `constified_enum + rustified

All ways of interop with C enums unfortunately have some downsides. constified-enum, constified-enum-module, newtype-enum, and bitfield-enum can't be matched exhaustively: using the below demo, for example:

match some_foo {
    foo_one => println!("1"),
    foo_too => println!("2"),
    foo_three => println!("3"),
    _ => unimplemented!()
}

If a new variant is added to the enum, it gets swallowed with the _. Or a variant may have accidentally be emitted in the first place.

rustified_enum and rustified-non-exhaustive-enum provide more ergonomic solutions and are easier to match, but they don't have good handling for if C provides value not covered by a variant - which is allowed in C. This leads to bugs that can be impossible to track down.

Proposal: allow creating both a constified enum and a Rust enum, and autogenerate three conversion methods between them:

Safe panicking with Into
Safe but with an error with TryInto
Unsafe, assume you never get an unnamed value

Input C/C++ Header

enum foo {
  one = 1,
  two = 2,
  three = 3
};

Bindgen Invocation

bindgen test.h
bindgen test.h --rustified-enum '.*'

Actual Results

pub const foo_one: foo = 1;
pub const foo_two: foo = 2;
pub const foo_three: foo = 3;
pub type foo = ::std::os::raw::c_uint;

#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum foo {
    one = 1,
    two = 2,
    three = 3,
}

Expected Results

Something like this:

pub const foo_one: foo = 1;
pub const foo_two: foo = 2;
pub const foo_three: foo = 3;
pub type foo_ctype = ::std::os::raw::c_uint;

#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum foo {
    one = 1,
    two = 2,
    three = 3,
}

impl From<foo_ctype> for foo {
    fn from(value: foo_ctype) -> foo {
        match value{
            1 => foo::one,
            2 => foo::two,
            3 => foo::three,
            _ => panic!("unrecognized option for `foo` {foo_ctype}"),
        }
    }
}

struct FooError(foo_ctype);

impl TryFrom<foo_ctype> for foo {
    type Error = FooError;
    fn try_from(value: foo_ctype) -> Result<foo, FooError> {
        match value{
            1 => Ok(foo::one),
            2 => Ok(foo::two),
            3 => Ok(foo::three),
            _ => Err(FooError(value)),
        }
    }
}

impl foo {
    const unsafe fn from_ctype_unchecked(value: foo_ctype) -> Self {
        std::mem::transmute(value)
    }
}

All bindings would use foo_ctype as the value type, but this would give an easy way to turn it into something exhaustive

Sep 23 '23 01:09 tgross35

cc @ojeda

Sep 23 '23 01:09 tgross35

I think a good compromise would be a subset of what you mention:

pub type foo_ctype = ::std::os::raw::c_uint;

#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum foo {
    one = 1,
    two = 2,
    three = 3,
}

impl TryFrom<foo_ctype> for foo {
    type Error = foo_ctype;
    fn try_from(value: foo_ctype) -> Result<foo, Self::Error> {
        match value {
            1 => Ok(foo::one),
            2 => Ok(foo::two),
            3 => Ok(foo::three),
            _ => Err(value),
        }
    }
}

Even though such thing could be generated by a procedural macro crate without issue.

The rest of it I wouldn't implement because:

Implementing From<foo_ctype> for foo is just not possible as TryFrom<T> has a blanket implementation for anything that implements From<T>. You can get the behavior you want by calling foo::try_from(value).unwrap() anyway.
The from_ctype_unchecked function is a wrapper over transmute with exactly the same safety preconditions and having it pollutes the module a bit. Also, you can use transmute directly anyway.
You can get the constants you included using the variants directly as foo::one as foo_ctype is a constant time operation.

Sep 23 '23 03:09 pvdrz

I forgot about the From/TryFrom conflict, thanks for pointing it out. Everything you said sounds reasonable to me, would you accept a PR to add that enum option?

A proc macro could indeed do this but I think it would have to be an attribute macro rather than derive since it needs the size context and has to add a separate trait. Is there a way to add attribute macros? I know know of adding derives

Sep 23 '23 07:09 tgross35

I forgot about the From/TryFrom conflict, thanks for pointing it out. Everything you said sounds reasonable to me, would you accept a PR to add that enum option?

I think we need to figure out the API for this first, but it sounds like a reasonable feature:

One option would be to have this as a completely independent option like --impl-try-from-for-enum (work is a name in progress:tm:) such that, when used in conjunction with --rustified-enum, will provide the TryFrom implementation,
Another option would be just literally add a new enum variation like --rustified-enum-with-try-from.

The issue I have with the second option is that potentially someone could want to implement this for newtype enums or non-exhaustive, rustified enums. So I'd favor the first option in that case but I'm open to discussion.

A proc macro could indeed do this but I think it would have to be an attribute macro rather than derive since it needs the size context and has to add a separate trait. Is there a way to add attribute macros? I know know of adding derives

I agree, i think an attribute macro would be more appropriate. Adding attribute macros is not that different from a derive macro iirc: https://doc.rust-lang.org/reference/procedural-macros.html#attribute-macros

This could be an option if you want to add this quickly to rust-for-linux instead of waiting for this feature to be implemented and released in bindgen and then update the bindgen version on your build scripts (as I know that y'all don't bump your bindgen version that often)

Sep 25 '23 04:09 pvdrz

Even if a procedural macro is able to do it, bindgen is already a code generator, so it seems better to just perform the generation. That simplifies the build process for users and should be faster to compile, no? Unless the idea is that users may customize that proc macro to do whatever they need, but that could be an independent feature.

On From vs. TryFrom: I guess it depends on the project and each type and whether they consider the type will always be perfectly convertible.

On implementing some of the methods or not: I agree TryFrom may be enough, especially if rustc/LLVM optimize it properly together with unwrap_unchecked and unwrap (though see e.g. https://github.com/llvm/llvm-project/issues/48319 for a case around unwrap_unchecked in the past).

Having said that, if the better way is to use transmute instead of unwrap_unchecked (e.g. if it does not optimize as well in some cases), then I think it is a good idea to provide at least the transmute wrapper, because I see transmute as a lower-level tool. Even if it is just calling transmute, the from_x_unchecked constructor has a constrained signature. For that reason, I also think the safety preconditions can be easier to understand.

Moreover, bindgen could generate the # Safety section in the documentation for the function, giving even the range of valid values and so on, even condensing them as ranges where possible. It could even write a few examples too (e.g. first variant, last variant, UB example...).

Sep 25 '23 16:09 ojeda

Even if a procedural macro is able to do it, bindgen is already a code generator, so it seems better to just perform the generation. That simplifies the build process for users and should be faster to compile, no? Unless the idea is that users may customize that proc macro to do whatever they need, but that could be an independent feature.

Yeah I think this is a good alternative if you need more fine tuning in the future.

On From vs. TryFrom: I guess it depends on the project and each type and whether they consider the type will always be perfectly convertible.

Hmm... I would consider a From implementation that can panic a bad practice in general and I wouldn't be comfortable providing such option. Specially since you can properly document such conversions by doing EnumType::TryFrom(c_value).expect("I know what I'm doing")

On implementing some of the methods or not: I agree TryFrom may be enough, especially if rustc/LLVM optimize it properly together with unwrap_unchecked and unwrap (though see e.g. llvm/llvm-project#48319 for a case around unwrap_unchecked in the past).

I suspect that this optimization is less prone to break for fieldless enums in comparison to the Option<T> example you mention in the issue itself but I might be wrong

Having said that, if the better way is to use transmute instead of unwrap_unchecked (e.g. if it does not optimize as well in some cases), then I think it is a good idea to provide at least the transmute wrapper, because I see transmute as a lower-level tool. Even if it is just calling transmute, the from_x_unchecked constructor has a constrained signature. For that reason, I also think the safety preconditions can be easier to understand.

I think adding this from_ctype_unchecked as a method of the enum would be OK. However, I'm not sure if I would add this under the same flag/option or not as some people might want the TryFrom but not the from_ctype_unchecked and vice versa.

Moreover, bindgen could generate the # Safety section in the documentation for the function, giving even the range of valid values and so on, even condensing them as ranges where possible. It could even write a few examples too (e.g. first variant, last variant, UB example...).

See my previous comment.

Sep 25 '23 17:09 pvdrz

I would consider a From implementation that can panic a bad practice in general

Yeah, it would be bad to do so unconditionally for everything (when I suggested the overall idea in Zulip, I was thinking of having a constructor, rather than implementing From), but it may depend on the case, e.g. there is From<&str> for String which aborts on oom, right? I guess some projects could similarly say for particular types "if we make a mistake here, we are happy to consider that a panic/abort/...".

However, I'm not sure if I would add this under the same flag/option or not as some people might want the TryFrom but not the from_ctype_unchecked and vice versa.

Agreed, having the option to choose here would be very nice. In fact, doing so per enum (like how the "kind" of the enum can already be chosen per enum) would be nice too.

Sep 25 '23 18:09 ojeda

I think the main "blocker" here is the CLI API for these two options. The library API is more or less clear:

pub enum EnumVariation {
    Rust {
        non_exhaustive: bool,
        try_from_raw: bool,
        from_raw_unchecked: bool,
    },
    NewType {
        is_bitfield: bool,
        is_global: bool,
    },
    Consts,
    ModuleConsts,
}

One option would be to copy the --override-abi flag syntax which is --override-abi=<REGEX>=<ABI> and allow several comma separated options as a suffix. Something like --rustified-enum=<REGEX>=(<OPTION>,)* where <OPTION> can be non_exhaustive, try_from_raw or from_raw_unchecked.

Oct 03 '23 17:10 pvdrz

@ojeda Taking a look at this, and I just want to make sure I'm understanding the final request after the discussion.

As I'm understanding it, we just want a TryFrom and a wrapper for transmute, is that correct? We should be able to enable both separately in the Rust enum cases rustified_enum and rustified-non-exhaustive-enum. I think you're also saying that it would be best to allow enabling per enum. Is that correct?

Aug 06 '24 17:08 jbaublitz

As I'm understanding it, we just want a TryFrom and a wrapper for transmute, is that correct?

A safe, fallible way may be enough, but it depends on the codegen when used with e.g. unwrap_unchecked(), and even if it always worked well, I think it is still nice to have the unsafe version, just like e.g. Result and Option have them too.

I think you're also saying that it would be best to allow enabling per enum. Is that correct?

It depends on what the maintainers want to do here, but since other things can be customized per-enum, it may make sense to allow that per-enum too, I am not sure.

My (possibly naïve) idea for the kernel is that, when we have this, we will just enable it for all enums there (with both the safe, fallible and the unsafe versions) and that's it. In other words, we would just have a single "kind" of enum (this one -- which I would argue should be the default in bindgen eventually), and we would only use the "both fallible, safe and unsafe versions" of its generation. So I am not sure if we will have a use case in the kernel for customization of either the "kind" or the methods to generate.

Aug 06 '24 18:08 ojeda

Okay! I'll start with an implementation of independent safe and unsafe wrappers and have the policy be applied to all enums and see what the maintainers think.

Aug 06 '24 19:08 jbaublitz

Add enum option `constified_enum + rustified_enum` with conversions

Input C/C++ Header

Bindgen Invocation

Actual Results

Expected Results