prost: Support for custom options and proto2 extensions

Hi, this PR adds support for extensions (proto2 only) and custom options (proto2 and 3).

Custom options are built on top of extensions, so I'll refer to them collectively as "extensions".

Please see the included documentation and tests for examples of how prost users would use these features.

Overview

This code is roughly patterned after how extensions are handled in the other languages that protoc itself generates code for.

Here's a board that explains the flow of custom options. Extensions are a little simpler, since they're just accessible in the final generated code rather than at parse time, but the overall picture is similar.

[Diagram: protobuf-extensions-discovery]

The core functionality lives in src/extension.rs:

Extendable
- Applied via code generation to proto messages that declare an extension range (extensions n to m;).
- Provides functions for working with the underlying ExtensionSet.
ExtensionSet
- Contains a set of decoded values in a type-opaque manner through ExtensionValue.
- Its methods take an ExtensionImpl<T>, which is used to manage the data of an ExtensionValue by downcasting it to an ExtensionValueImpl<T>.
Extension/ExtensionImpl
- const definitions are generated in Rust code from the proto extend MessageName {...} definitions.
- These definitions are used with the Extendable/ExtensionSet to provide type information.
ExtensionRegistry
- This is a temporary structure that you load the generated const Extensions into before parsing.
- It gives the parser the information it needs to decode values into the type-erased ExtensionSets.
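To make the type erasure concrete, here is a minimal sketch of how a typed handle can write and read values in a set that stores them as Box<dyn Any>. Everything in it is a hypothetical simplification and not the PR's actual API. That includes the ExtensionImpl<T> and ExtensionSet shapes, the set/get methods, and the MY_OPTION constant with field number 50000. It only shows the downcasting idea.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::marker::PhantomData;

/// Hypothetical typed handle, similar in spirit to the generated const
/// ExtensionImpl<T> definitions: a field number plus a value type.
pub struct ExtensionImpl<T> {
    pub field_number: u32,
    _value: PhantomData<T>,
}

impl<T> ExtensionImpl<T> {
    pub const fn new(field_number: u32) -> Self {
        ExtensionImpl { field_number, _value: PhantomData }
    }
}

/// Type-erased storage: values are boxed as dyn Any, keyed by field number.
#[derive(Default)]
pub struct ExtensionSet {
    values: HashMap<u32, Box<dyn Any>>,
}

impl ExtensionSet {
    /// Stores a value; the handle fixes its type at compile time.
    pub fn set<T: 'static>(&mut self, ext: &ExtensionImpl<T>, value: T) {
        self.values.insert(ext.field_number, Box::new(value));
    }

    /// Downcasts back to T; None if absent or stored under a different type.
    pub fn get<T: 'static>(&self, ext: &ExtensionImpl<T>) -> Option<&T> {
        self.values.get(&ext.field_number)?.downcast_ref::<T>()
    }
}

/// Hypothetical codegen output for `extend FileOptions { string my_option = 50000; }`.
pub const MY_OPTION: ExtensionImpl<String> = ExtensionImpl::new(50000);
```

Because the handle carries T, callers never name the erased type. A lookup through a handle of the wrong type simply returns None instead of panicking.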

The other major file, src/generic.rs, provides the traits Merge, MergeRepeated, Encode, and EncodeRepeated, which exist solely to map Rust types to the various functions in prost::encoding. This lets merging and encoding work for the ExtensionSet fields, which don't have prost derive attributes for each field they need to decode.
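The idea behind these traits can be sketched without prost. Each Rust type knows its own wire type and encoding, so a type-erased value can be encoded given only its field number. This hypothetical version hand-rolls varint and length-delimited encoding instead of forwarding to prost::encoding. The trait signatures are simplified assumptions, not the ones in src/generic.rs.

```rust
/// Hypothetical: maps a Rust type to its protobuf wire encoding so a
/// type-erased value can be encoded from just a field number.
pub trait Encode {
    fn encode(&self, tag: u32, buf: &mut Vec<u8>);
}

/// Hypothetical counterpart for repeated (unpacked) fields.
pub trait EncodeRepeated {
    fn encode_repeated(&self, tag: u32, buf: &mut Vec<u8>);
}

fn encode_varint(mut value: u64, buf: &mut Vec<u8>) {
    // 7 bits per byte, high bit set on every byte except the last.
    while value >= 0x80 {
        buf.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

fn encode_key(tag: u32, wire_type: u8, buf: &mut Vec<u8>) {
    encode_varint((u64::from(tag) << 3) | u64::from(wire_type), buf);
}

impl Encode for u64 {
    fn encode(&self, tag: u32, buf: &mut Vec<u8>) {
        encode_key(tag, 0, buf); // wire type 0: varint
        encode_varint(*self, buf);
    }
}

impl Encode for String {
    fn encode(&self, tag: u32, buf: &mut Vec<u8>) {
        encode_key(tag, 2, buf); // wire type 2: length-delimited
        encode_varint(self.len() as u64, buf);
        buf.extend_from_slice(self.as_bytes());
    }
}

impl<T: Encode> EncodeRepeated for Vec<T> {
    fn encode_repeated(&self, tag: u32, buf: &mut Vec<u8>) {
        // Unpacked: one key/value pair per element.
        for value in self {
            value.encode(tag, buf);
        }
    }
}
```

For example, encoding 150u64 as field 1 yields the bytes 08 96 01.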

The remaining changes are mostly implementations of the above traits, tests, and documentation.

Notes

This PR makes additive changes to prost_types::protobuf.rs, adding the following to all *Options types:
- derive ::prost::Extendable.
- a pub extension_set field. Warning: this will break for anyone constructing these types directly, e.g. with FileOptions{...}.
- an impl that defines a const EXTENDABLE_TYPE_ID.
This PR has almost zero runtime impact on anyone not using extensions.
- The only effect is that each *Options type takes an additional size_of::<Option<Box<_>>>() bytes (one pointer).
I ran into a build issue that others seem to have encountered as well (#556); that is the reason for the change to prost-build/Cargo.toml.
I am happy to provide maintenance for this area of code in the future.
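Both notes can be illustrated with a simplified stand-in for a generated *Options type. The struct, its fields, and the stand-in ExtensionSet below are assumptions for illustration only. The first point is the struct-literal breakage. Prost-generated messages implement Default, so struct-update syntax with ..Default::default() keeps literals compiling when a field is added. The second point is the size cost. Rust guarantees the null-pointer optimization for Option<Box<T>>, so the new field costs exactly one pointer.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Stand-in for the PR's type-erased set (hypothetical contents).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ExtensionSet {
    values: HashMap<u32, Vec<u8>>,
}

/// Simplified stand-in for a generated *Options type after this change.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct FileOptions {
    pub java_package: Option<String>,
    pub deprecated: Option<bool>,
    /// New field: None unless extensions are present.
    pub extension_set: Option<Box<ExtensionSet>>,
}

pub fn deprecated_file_options() -> FileOptions {
    // A literal naming only the old fields no longer compiles:
    //   FileOptions { java_package: None, deprecated: Some(true) }
    //   error[E0063]: missing field `extension_set`
    // Filling the rest from Default keeps working when fields are added.
    FileOptions { deprecated: Some(true), ..Default::default() }
}

/// Extra bytes per *Options value: Option<Box<T>> uses the null niche,
/// so it is the same size as a bare pointer.
pub const EXTRA_BYTES: usize = size_of::<Option<Box<ExtensionSet>>>();
```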

A nice side effect of this change is that prost could define its own custom options for use in compile_protos, e.g. the ability to tag specific fields to be generated using smallvec.

Related issues

Resolves #256 Resolves #100

Feb 05 '22 23:02 nswarm

Thanks @neoeinstein for some great usability changes!

Feb 20 '22 16:02 nswarm

Would be interested to see how this could be used to, for instance, add an attribute (such as a proc-macro) to a message or a field to modify the code generation behavior.

Inside the prost library, we could define prost-level custom options that are explicitly loaded and consulted during code generation to add the relevant Rust attribute. That could even be a user-defined macro; for instance, we could inject the name directly into the attribute list:

option (prost_proc_attr) = "::mycrate::MyAttr";

This would be akin to the protoc plugin insertion points in Google's generators, although it wouldn't scale very well for code blocks.
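A rough sketch of that codegen step follows. If a message carries the (prost_proc_attr) option, its string value is emitted verbatim as an outer attribute. The function name, its inputs, and the simplified output are hypothetical, not prost-build's actual generator. In particular, field-level #[prost(...)] attributes are omitted.

```rust
/// Hypothetical codegen helper. `proc_attr` is the value of a
/// `(prost_proc_attr)` option read from the message's MessageOptions.
pub fn emit_message(name: &str, proc_attr: Option<&str>, fields: &[(&str, &str)]) -> String {
    let mut out = String::from("#[derive(Clone, PartialEq, ::prost::Message)]\n");
    if let Some(path) = proc_attr {
        // Inject the user-supplied path directly into the attribute list.
        out.push_str(&format!("#[{}]\n", path));
    }
    out.push_str(&format!("pub struct {} {{\n", name));
    for (field, ty) in fields {
        // Field-level #[prost(...)] attributes omitted for brevity.
        out.push_str(&format!("    pub {}: {},\n", field, ty));
    }
    out.push_str("}\n");
    out
}
```

Because the value is pasted in unchecked, a real implementation would probably want to parse it as a path (e.g. with syn) before emitting it.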

Is that the kind of thing you mean?

Feb 21 '22 16:02 nswarm

That's part of it. Inside prost, I'm thinking about things like indicating a message or field should be a foreign type, providing specified names, etc. Below is a series of potential things off the top of my head. Not all of these would need to be implemented; some will be more useful than others.

// This one would be more complex due to impacts to type resolution
// in generated code. It's also a bit less useful due to the way we use `include!`.
option (prost.package).path = "::my_crate::placement";

message Example {
    option (prost.type).type_name = "ExampleRust";
    option (prost.type).attrs = "#[::aliri_braid::braid]";

    uint64 id = 1 [(prost.field).field_name = "identifier"];
}

message External {
    option (prost.type).external = "::my_external::Type";

    sint32 value = 1;
}

message MessageWithUuid {
    string uuid = 1 [(prost.field).type = "::uuid::Uuid"];
}

message OwnedAndBorrowed {
    option (prost.type).gen_borrowed = true;

    bytes data = 1 [
        (prost.field).type = "::bytes::Bytes",
        (prost.field).borrowed_type = "[u8]"
    ];
    string my_typed_string = 2 [
        (prost.field).type = "::my_crate::MyStronglyTypedString",
        (prost.field).borrowed_type = "::my_crate::MyStronglyTypedStringRef"
    ];
}

A borrowed mode would make it possible to reduce unnecessary copies, as you could deserialize an owned buffer into values referring to that data or construct a message for serialization without needing to take ownership of all of the parts.
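To illustrate the borrowed idea, the sketch below decodes the two length-delimited fields of OwnedAndBorrowed into slices that point into the input buffer, without copying. The names OwnedAndBorrowedRef and decode_borrowed are hypothetical, and plain &str stands in for MyStronglyTypedStringRef. Only wire type 2 is handled.

```rust
/// Hypothetical borrowed view of OwnedAndBorrowed: fields point into the input.
#[derive(Debug, PartialEq)]
pub struct OwnedAndBorrowedRef<'a> {
    pub data: &'a [u8],
    pub my_typed_string: &'a str,
}

fn decode_varint(buf: &mut &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // over 10 bytes: malformed varint
}

/// Decodes fields 1 (bytes) and 2 (string) as borrows of `buf`.
pub fn decode_borrowed(mut buf: &[u8]) -> Option<OwnedAndBorrowedRef<'_>> {
    let mut data: &[u8] = &[];
    let mut my_typed_string = "";
    while !buf.is_empty() {
        let key = decode_varint(&mut buf)?;
        let (tag, wire_type) = (key >> 3, key & 0x7);
        if wire_type != 2 {
            return None; // this sketch only understands length-delimited fields
        }
        let len = decode_varint(&mut buf)? as usize;
        if len > buf.len() {
            return None; // truncated input
        }
        let (value, rest) = buf.split_at(len);
        buf = rest;
        match tag {
            1 => data = value,
            2 => my_typed_string = std::str::from_utf8(value).ok()?,
            _ => {} // skip other length-delimited fields
        }
    }
    Some(OwnedAndBorrowedRef { data, my_typed_string })
}
```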

Feb 21 '22 16:02 neoeinstein

I posted up a really quick example for the field-renaming option I mentioned above. We should probably move this discussion to a separate issue or similar, but it all builds on top of this PR right now. Renaming types and packages likely requires a two-pass process, so that we can collect all of the aliases first and then use the correct aliases and internally-defined externs during the code generation pass.

https://github.com/neoeinstein/prost/compare/extensions-usability...neoeinstein:prost-ext
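The two-pass process might look roughly like this. Pass 1 records every proto-to-Rust name mapping. Pass 2 resolves field types through that table, so a message can refer to a renamed type that is declared later in the file. The MessageDesc shape and the tuple-struct output are hypothetical simplifications, not prost-build's data model.

```rust
use std::collections::HashMap;

/// Hypothetical input: a fully-qualified proto name, an optional
/// (prost.type).type_name rename, and the proto types of its fields.
pub struct MessageDesc {
    pub proto_name: &'static str,
    pub rename: Option<&'static str>,
    pub field_types: Vec<&'static str>,
}

/// Default Rust name when no rename is given: the last path segment.
fn default_rust_name(proto_name: &str) -> &str {
    proto_name.rsplit('.').next().unwrap_or(proto_name)
}

/// Pass 1: collect every alias before generating anything.
pub fn collect_aliases(messages: &[MessageDesc]) -> HashMap<&'static str, String> {
    messages
        .iter()
        .map(|m| {
            let name = m.rename.unwrap_or_else(|| default_rust_name(m.proto_name));
            (m.proto_name, name.to_string())
        })
        .collect()
}

/// Pass 2: emit code, resolving message references through the alias table.
pub fn generate(messages: &[MessageDesc], aliases: &HashMap<&'static str, String>) -> String {
    let mut out = String::new();
    for m in messages {
        let fields: Vec<String> = m
            .field_types
            .iter()
            .map(|t| aliases.get(t).cloned().unwrap_or_else(|| t.to_string()))
            .collect();
        out.push_str(&format!("struct {}({});\n", aliases[m.proto_name], fields.join(", ")));
    }
    out
}
```

With a single pass, Holder below would have been emitted referring to Example before the rename to ExampleRust was known.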

Feb 21 '22 20:02 neoeinstein

Hi! Is this change abandoned now? It is somewhat similar to my feature request here: https://github.com/tokio-rs/prost/issues/658

I'd be particularly eager to see a way to inject support for PGV, preferably directly into validator tags.

May 30 '22 13:05 vriesk

This change is not abandoned, and indeed is planned on the way to Prost 1.0. However, we may not land this in the exact form provided in the PR for general public use. There is an underlying need to support unrecognized tags that this should be built upon, and that would eliminate the need to do special work up-front with a registry in order to get to these extensions.

Nonetheless, several people, myself included, are using this PR branch as a git dependency and as a bridge for doing supplemental codegen. I have some plans to create a generator similar to PGV in my line of protoc-gen-prost crates.

The key thing is that we want an API where you can just use decode rather than decode_with_registry.

It may be that we could land this for an intermediate release and then have a further breaking change that removes the need for a registry, but if we start publishing a release that includes support for it, we'd need to make sure the functionality isn't broken even across breaking API changes.

May 30 '22 14:05 neoeinstein

Thanks for the update. Yeah, I was even thinking about forking prost-build privately for now; however, the problem I see is that it is non-trivial to integrate it with tonic for full service generation (though I have not tried hard enough, TBF). Or is there a way?

May 30 '22 15:05 vriesk

There is an underlying need to support unrecognized tags that this should be built upon, and that would eliminate the need to do special work up-front with a registry in order to get to these extensions. (snip) The key thing is that, we want to have an API where you can just use decode rather than decode_with_registry.

I'm not sure that is possible without a registry or some form of additional information up front.

Extensions are fundamentally data that cannot be parsed by the default parser, so the user has to provide additional information, i.e. the registry, which tells it how to parse that data. Similarly, unrecognized tags / uninterpreted options are just serialized data the parser was not given a schema for.

In either case you still need a way to parse it that's not available by default. If you don't provide a registry or schema up front, you get a list of "sorry couldn't parse this" data that you have to manually parse yourself.
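The "sorry, couldn't parse this" outcome can be sketched as a decoder that keeps every field outside the known schema as raw bytes for the caller to interpret later. UnknownField and collect_unknown are hypothetical. So is the choice to keep the length prefix in raw for wire type 2. Groups (wire types 3 and 4) are rejected.

```rust
/// Hypothetical record for a field the parser had no schema for.
#[derive(Debug, PartialEq)]
pub struct UnknownField {
    pub number: u32,
    pub wire_type: u8,
    /// Encoded value bytes (including the length prefix for wire type 2).
    pub raw: Vec<u8>,
}

fn read_varint(buf: &mut &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None // over 10 bytes: malformed varint
}

/// Returns every field whose number is not in `known`, undecoded.
pub fn collect_unknown(mut buf: &[u8], known: &[u32]) -> Option<Vec<UnknownField>> {
    let mut unknown = Vec::new();
    while !buf.is_empty() {
        let key = read_varint(&mut buf)?;
        let number = (key >> 3) as u32;
        let wire_type = (key & 0x7) as u8;
        let start = buf;
        // Work out how many bytes this field's value occupies.
        let len = match wire_type {
            0 => {
                read_varint(&mut buf)?;
                start.len() - buf.len()
            }
            1 => 8,
            2 => {
                let n = read_varint(&mut buf)? as usize;
                (start.len() - buf.len()).checked_add(n)?
            }
            5 => 4,
            _ => return None, // groups (3, 4) and invalid wire types
        };
        if len > start.len() {
            return None; // truncated input
        }
        let (value, rest) = start.split_at(len);
        buf = rest;
        if !known.contains(&number) {
            unknown.push(UnknownField { number, wire_type, raw: value.to_vec() });
        }
    }
    Some(unknown)
}
```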

In the case of extensions, the registry step could be eliminated by having a static, global registry that automatically contains all extensions in generated code at compile time and that the default parser would know to use. This is how Google's C++ implementation does it. That could be added on top of this PR, and even feature-gated for anyone who wants to specify the registry manually. Other implementations, like Java and C#, use the extension registry pattern as well.
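Here is a minimal sketch of such a global registry, under stated assumptions. Rust has no life-before-main static initializers like the ones C++ can use. This version therefore assumes codegen emits a single table of every extension it saw (ALL_EXTENSIONS, hypothetical), which std::sync::OnceLock indexes lazily on first lookup. Crates such as inventory or linkme could collect entries across crates instead; that is not shown.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical description of one extension field, as codegen might emit it.
#[derive(Debug)]
pub struct ExtensionInfo {
    pub extendee: &'static str, // fully-qualified message being extended
    pub number: u32,
    pub name: &'static str,
}

/// Hypothetical table emitted by codegen, listing every extension it saw.
pub static ALL_EXTENSIONS: &[ExtensionInfo] = &[
    ExtensionInfo { extendee: "google.protobuf.FileOptions", number: 50000, name: "my_file_option" },
    ExtensionInfo { extendee: "google.protobuf.FieldOptions", number: 50001, name: "my_field_option" },
];

/// extendee -> field number -> extension.
type Registry = HashMap<&'static str, HashMap<u32, &'static ExtensionInfo>>;

/// Built once, on first use, so decode would need no registry argument.
fn global_registry() -> &'static Registry {
    static REGISTRY: OnceLock<Registry> = OnceLock::new();
    REGISTRY.get_or_init(|| {
        let mut map: Registry = HashMap::new();
        for ext in ALL_EXTENSIONS {
            map.entry(ext.extendee).or_default().insert(ext.number, ext);
        }
        map
    })
}

/// What a registry-free decode could call when it meets an unknown field number.
pub fn lookup(extendee: &str, number: u32) -> Option<&'static ExtensionInfo> {
    global_registry().get(extendee)?.get(&number).copied()
}
```

Keying by extendee first matters because extension numbers are only unique per extended message, not globally.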

Unrecognized tags are a different beast because it means you have a different or out of date schema from what the data was serialized with. I think it's a subtle but important difference from extensions.

For that reason I think support for unrecognized tags is a separate code path from extensions, but fwiw I have not thought deeply on that.

May 30 '22 16:05 nswarm

@nswarm I would be interested in landing some of your work, but because I am not the original author of this crate, we're gonna need to go through a proper proposal process w/ feedback etc. in a GitHub issue. Then I think we can start to land smaller PRs that are easier to review. Sorry for the major delay on reviewing; I've had a huge backlog to get through. Thanks for your patience.

Jun 20 '22 20:06 LucioFranco

That's fine. Let me know how you'd like me to proceed.

I'll need to spend some time figuring out how to break it up as well.

Jun 27 '22 17:06 nswarm

@nswarm let's start in an issue, and you can tag me in it so I get an email and we can go from there. My GitHub notifications are in much better shape now, so I should have some time to talk through the design etc. Thanks!

Jun 27 '22 18:06 LucioFranco

Thanks for the comments @andrewhickman! Note that I probably won't do much work on this for a bit, per status in: https://github.com/tokio-rs/prost/issues/674

Aug 01 '22 15:08 nswarm

@nswarm just wanted to let you know I use this PR in a private project with a few code additions. We're replacing a legacy app using old proto2 definitions and it would not be possible to do a drop-in replacement in any reasonable way without this PR.

Sep 28 '22 16:09 cgorski

I know this PR is kind of in limbo right now, and while I'm not volunteering to push it through to the end, I did go and make a fork of it and merge master into it. From reading the comments, it seems like a number of people are using this PR branch, so having an updated one might be helpful.

https://github.com/Houndie/prost/tree/extensions

(I also fixed a small bug where Result wasn't properly scoped in macro generation)

Mar 02 '23 18:03 Houndie

In the same vein, I rebased @Houndie's branch on current master at https://github.com/ozars/prost/tree/extensions, fixing a small issue caused by failing doctests in the README file.

Sep 03 '23 07:09 ozars

I added a few commits to @ozars' fork above & merged in master to pick up 0.12.1.

https://github.com/jhugard/prost/tree/extensions

Added extension_registry to ServiceGenerator Config & builder, so that extensions can be used in the service_generator.
Commented-out generation of the file-level register_extensions helper, because it creates multiple conflicting instances when extensions are defined in multiple proto files using the same namespace. The message-level extensions must be registered individually instead.

Oct 02 '23 20:10 jhugard

As an aside, I would be more in favor of a reflection-based approach applied at runtime. It feels very clunky to have to bootstrap extensions by copying a generated file into my project just so that the extension definitions are available for use with prost_build in build.rs.

Oct 02 '23 20:10 jhugard