prost icon indicating copy to clipboard operation
prost copied to clipboard

`OpenEnum`: an enum to represent protobuf's enumeration field values

Open mzabaluev opened this issue 1 year ago • 22 comments

Resolves #276 in a way amenable to destructuring.

Introduce opt-in support for type-checked enumeration fields, selectively enabled for message fields matched by patterns given in Config::typed_enum_fields methods.

For the matching fields, the type of the generated Rust message field is not i32, but depends on the syntax of the enum definition:

For enums defined in files with proto2 syntax, the representation is a closed enum, that is, the generated Rust enum type. On decoding, unknown values of the field are ignored.

For enums defined in files with proto3 syntax, the representation is a generic OpenType wrapper, parameterized with the generated enum type:

pub enum OpenEnum<T> {
    /// A known value defined in the proto
    Known(T),
    /// Unknown value as decoded from the wire
    /// (uses an opaque i32 wrapper to prevent arbitrary construction)
    Unknown(Unknown),
}

When Protobuf editions are supported (see #1031), these representations will support the enum_type feature behavior for closed and open enums, respectively.

In an improvement over #1061, this allows convenient matching of enum field values as part of the message. There are also convenience methods to fallibly extract the known value in an Option or Result.

mzabaluev avatar May 28 '24 11:05 mzabaluev

Please add a description that explains your proposed change. That will make reviewing easier.

caspermeijn avatar May 28 '24 14:05 caspermeijn

Do you think that this logic could be applied to the oneof field ?

QuentinPerez avatar May 28 '24 23:05 QuentinPerez

Do you think that this logic could be applied to the oneof field ?

I believe it works differently for oneofs: if an unknown field number is encountered in a message and it's not a known oneof variant or a regular field, the field is ignored. So the oneof field that would get the value if the variant field were described in the proto would get None instead.

mzabaluev avatar May 29 '24 12:05 mzabaluev

Will this break compatibility with proto2/closed enums? Personally, I don't use proto2, so I am not sure whether it is properly supported at all.

caspermeijn avatar May 31 '24 08:05 caspermeijn

I think it makes sense to provide a migration guide.

caspermeijn avatar May 31 '24 08:05 caspermeijn

What is the error message for a i32 field with #[prost(enumeration)]? Can we detect that scenario and print a nice error message?

caspermeijn avatar May 31 '24 08:05 caspermeijn

I would like to see some tests for OpenEnum.

caspermeijn avatar May 31 '24 08:05 caspermeijn

Will this break compatibility with proto2/closed enums? Personally, I don't use proto2, so I am not sure whether it is properly supported at all.

I believe prost has never been conformant with closed enums: the raw values have been left in the message as is. I think we could try to support closed enums in two ways:

  1. Always represent an enum field as OpenEnum, but in the closed enum case, decode unknown values as Known(Default::default()). This leaves the possibility of producing unknown values on encoding.
  2. Represent closed enums with their generated Rust enum type. Decode unknown values as Default::default().

I prefer option 2, even though it requires more work. Protobuf edition 2023 has closed enums as a feature, so they need to be supported even if we ignore proto2.

mzabaluev avatar Jun 02 '24 06:06 mzabaluev

I think it makes sense to provide a migration guide.

What would be a good place for it? I can add a section to the README.

mzabaluev avatar Jun 02 '24 07:06 mzabaluev

I would like to see some tests for OpenEnum.

I'm going to add at least one good example/doctest on the type.

mzabaluev avatar Jun 02 '24 07:06 mzabaluev

Have thought some more about this PR: I don't want to break users in this way. At least not now.

I suggest making this an option in prost-build so that we can experiment with the API without breaking existing users. That way, interested users can opt in, and we don't have to create a perfect OpenEnum API on the first try.

Once we feel good about the new API, we can think about changing the default.

Please look at bytes for a good example of changing the generated data type.

caspermeijn avatar Jun 07 '24 11:06 caspermeijn

Instead of introducing a new type, a simple Result<T, i32> sounds more logical to me. enum-repr-derive follows this approach when parsing an enum from i32

ArjixWasTaken avatar Jun 28 '24 19:06 ArjixWasTaken

Instead of introducing a new type, a simple Result<T, i32> sounds more logical to me.

It's weird to have a Result as a struct field. The names of variants and methods of Result are less than intuitively applicable: it's not necessarily an error to receive an unknown enum value from the wire, so we should give the API users a speed bump to decide how to deal with them. There are convenience methods and the TryInto impl to convert the OpenEnum value to a Result if that's the chosen approach.

Another benefit of introducing a new type is for the add-on macros and code generators that derive something on structs generated by prost-build. These could deal with the OpenEnum type in some specific way, even without access to the proto descriptor data for the field. Doing the same (e.g. defining generic trait impls) for Result would feel like too much overloading.

mzabaluev avatar Jun 29 '24 13:06 mzabaluev

How are default values handles in this solution? Especially default values set for a specific field in the proto file.

caspermeijn avatar Aug 09 '24 06:08 caspermeijn

How are default values handles in this solution? Especially default values set for a specific field in the proto file.

Good question. Maybe there should be a const generic parameter for this, but it will then crop up everywhere.

Updated: explained below

mzabaluev avatar Nov 12 '24 03:11 mzabaluev

The generated getter methods handled per-field defaults for optional fields. For non-optional fields, the default is initialized before merging field values from the wire.

I'm not sure how best to proceed here. Suppose we have this generated message:

pub struct Msg {
    #[prost(enumeration = "AnEnum", optional, default = "AnEnum::B")]
    pub field: Option<OpenEnum<AnEnum>>
}

It would be nice to have API implementing the default value behavior for proto2 and edition 2023.

The protobuf guide recommends a getter method that returns the default value, but whether out-of-range values are exposed is up to the enum_type protobuf feature. The "hazzer" method can be generated to check presence, but in the currently produced code one can just use Option::is_some.

mzabaluev avatar Nov 13 '24 09:11 mzabaluev

I suggest making this an option in prost-build so that we can experiment with the API without breaking existing users. That way, interested users can opt in, and we don't have to create a perfect OpenEnum API on the first try.

I have addressed this and most of the other comments in recent commits. There is now a Config::typed_enum_fields method to opt into type-checked fields.

The proto2 behavior (and the closed enums feature in future editions) is also implemented for enums, albeit with some current limitations: state available to the code generator does not track which syntax the enum types were defined in, that is needed for proto2 ← proto3 includes. I'll try to resolve this.

Getter methods are retained for the optional fields for convenience and to implement per-field default values. Protobuf guidelines also insist on there being getters returning non-nullable values.

mzabaluev avatar Nov 17 '24 20:11 mzabaluev

@caspermeijn please remove the breaking change label as the revised implementation is purely opt-in.

mzabaluev avatar Nov 29 '24 08:11 mzabaluev

The proto2 behavior (and the closed enums feature in future editions) is also implemented for enums, albeit with some current limitations: state available to the code generator does not track which syntax the enum types were defined in, that is needed for proto2 ← proto3 includes. I'll try to resolve this.

This has been implemented in https://github.com/tokio-rs/prost/pull/1079/commits/b6e2b0beca62c5b9e7265e6a2bb54902374117a6 I should add a test for the enum type "stickiness" across includes soon.

mzabaluev avatar May 30 '25 20:05 mzabaluev

One troublesome thing in this design is the public integer in the Unknown variant, which can be assigned arbitrary values including those of known variants. This can be treated as garbage in, garbage out, but it may be cleaner to use something like UnknownEnumValue and make the wrapped integer private so that bogus values can't be constructed by mistake. This will make matching on unknown values more cumbersome, but that seems like an uncommon use case.

mzabaluev avatar Jun 07 '25 18:06 mzabaluev

@mzabaluev could you update the PR's description to be more detailed?

I see a distinction between Open/Closed/Int enums, and you've said that this entire feature is now opt-in, I believe such details need to be advertised at the description of the PR and not have people read all the comments.

e.g.

  1. How do you opt-in? 1.5 Can you mix different enum types together? (aka is it global or not)
  2. What enum types does this introduce?
  3. Is a "closed" enum w/o a wrapper? (does it throw an error when parsing if it's invalid)

And other stuff you may find important to point out

PS: Thanks for keeping this PR alive for so long, I really really appreciate that as a user of this library.

ArjixWasTaken avatar Jun 08 '25 22:06 ArjixWasTaken

@ArjixWasTaken I have updated the description, thank you for the suggestion.

mzabaluev avatar Jun 09 '25 11:06 mzabaluev