prost icon indicating copy to clipboard operation
prost copied to clipboard

Make it easy to write forward-compatible code: struct builders?

Open wchargin opened this issue 4 years ago • 9 comments

In protobuf, adding a field to a message is considered a backward compatible change. It’s compatible at the wire format level and also at the source level for many sets of bindings, including the standard bindings for Python, Go, C++, and Java.

Prost makes it easy to directly initialize structs of message type:

// message Person { string name = 1; string email = 2; }
let person = proto::Person {
    name: "A. U. Thor".into(),
    email: "[email protected]".into(),
};

But then if a new type is added to the message, the Rust code fails to compile, because it’s no longer initializing all the fields.

One way to achieve compatibility is to spread in ..Default::default():

let person = proto::Person {
    name: "A. U. Thor".into(),
    email: "[email protected]".into(),
    ..Default::default() // more fields may be added later
};

This works, but (a) you have to remember to do it every time, and (b) it can get a bit spammy when there are multiple layers of nesting:

            ..Default::default()
        },
        ..Default::default()
    },
    ..Default::default()
}

It would be nice to have a way to guarantee better compatibility. One approach would be some kind of optional builder API:

let person = proto::Person::builder()
    .name("A. U. Thor".into())
    .email("[email protected]".into())
    .build();

I don’t know what exactly the right ergonomics are here, especially around nested fields. (I half-wonder if the builder should take a closure, but I think that that might get a bit hairy.)

A nice side-effect of such a builder pattern would be that the methods could take impl Into<FieldType>, so you could pass &str literals rather than having to write .to_string()/.into() everywhere.

An option to set #[non_exhaustive] on structs doesn’t sound quite right to me, since (a) that doesn’t do anything inside the same crate and (b) then you can’t use functional record updates (..default()).

It’s fine with me if the struct fields remain public; we can have a style recommendation that says “always use builders when building messages of a type that you don’t control”, or something. Though being able to control that via a codegen option would be nice, too.

This is an important consideration for using Prost in shared or large codebases, especially monorepos. It’s common to have a shared protobuf definition with autogenerated bindings that many teams depend on. Enabling the proto owners to make forward-compatible changes without breaking other teams’ builds would be a requirement to use Prost.

wchargin avatar Dec 02 '20 03:12 wchargin

I don't agree that adding a field to a message being a backwards compatible wire change is sufficient motivation to also guarantee that it's a source-compatible change. There are plenty of other guaranteed wire-compatible changes that typically are not source compatible in any (typed) language bindings, e.g. widening an integer, moving a field into a oneof, renaming a field, removing an optional field, etc.

I do think #[non_exhaustive] is the way forward for users who want to expose a prost-generated type in a public crate API that needs stability guarantees. I don't personally think this is a wise thing to do, but a lot of people seem to want to do it 🤷 . For workspace-style private codebase I find that the compiler pointing out where code needs to get updated is extremely helpful.

(b) then you can’t use functional record updates (..default())

This appears to work just fine: https://gist.github.com/a64fe9b1dd55d89ed8687381395f02d4

danburkert avatar Dec 02 '20 19:12 danburkert

This appears to work just fine

Maybe that's an artifact of the type being in the same crate? I haven't used #[non_exaustive] in anger since it's a relatively new feature, so I'm not exactly sure what its limitations are.

danburkert avatar Dec 02 '20 19:12 danburkert

Hi! Thanks for your response.

I don't agree that adding a field to a message being a backwards compatible wire change is sufficient motivation to also guarantee that it's a source-compatible change. There are plenty of other guaranteed wire-compatible changes that typically are not source compatible in any (typed) language bindings, e.g. widening an integer, moving a field into a oneof, renaming a field, removing an optional field, etc.

Agreed: not all wire-compatible changes are source-compatible. But adding a field specifically is meant to be both wire-compatible and source-compatible. The guidelines (also linked above) say that fields “may be added to existing APIs in the same version”, and also say a single version “must” be source- as well as wire- and semantic-compatible, which implies the conclusion.

Correspondingly, even though Go, Java, and C++ are all typed, their language bindings treat this as a source-compatible change. Java and C++ only expose getters and setters (and/or builders). Go has structs like Prost does, but omitted fields in Go struct literals are implicitly set to zero values.

For workspace-style private codebase I find that the compiler pointing out where code needs to get updated is extremely helpful.

Absolutely; I’m a huge fan of this workflow as well (“change one type, fix all the errors, it works first try”). To quote from Abseil’s “Use Exhaustive switch Statements Responsibly” (h/t @nfelt), for this to work, you need one of three things:

  1. The owner of the enum guarantees no new enumerators will be added,
  2. The owner of the enum is willing and able to fix our code when new enumerators are added (e.g. the enum definition is part of the same project),
  3. The owner of the enum will not be blocked by breaking our build (e.g. their code is in a separate source control repository), and we’re willing to be forced to update our switch statements when updating to the latest version of the enum-owner’s code.

The “change one type, fix all the errors, it works first try” flow is basically picking option (2), but for published protos this doesn’t work. For protos that aren’t set in stone, this leaves (3), which may be impossible (monorepo) or undesirable (if you want to take the proto semantics).

This appears to work just fine

Maybe that's an artifact of the type being in the same crate?

Yeah, that’s right:

$ cat sub.rs
#[non_exhaustive]
#[derive(Debug, Default)]
pub struct S {
    pub a: i32,
    pub b: i32,
}
$ cat main.rs
extern crate sub;
fn main() {
    let s = sub::S {
        a: 1,
        b: 2,
        ..Default::default()
    };
    dbg!(s);
}
$ rustc sub.rs --crate-type=lib
$ rustc main.rs -L.
error[E0639]: cannot create non-exhaustive struct using struct expression
 --> main.rs:3:13
  |
3 |       let s = sub::S {
  |  _____________^
4 | |         a: 1,
5 | |         b: 2,
6 | |         ..Default::default()
7 | |     };
  | |_____^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0639`.

By the way, I played with sketching out an API for this, and was pleased to see that if you put the builder-related methods in traits, then the whole thing works fine even if the proto happens to have a field called build or something else pathological. (Maybe obvious to you, but it’s not too often that I see the tangible benefits of trait methods being namespaced as they are.)

wchargin avatar Dec 02 '20 22:12 wchargin

The guidelines (also linked above) say that fields “may be added to existing APIs in the same version”, and also says a single version “must” be source- as well as wire- and semantic-compatible.

That link appears to be to Google Cloud Client guidelines, that's not really relevant to the design of a general purpose protobuf library. I haven't seen any protobuf docs or guidelines that say anything about source compatibility of generated code in any context, and in practice there is no such guarantee as I pointed out.

WRT #[non_exhaustive] that's pretty unfortunate, it makes it a little less useful of a tool. It's still possible to use #[non_exhaustive] without a builder API though, doing something like:

let mut msg = MyMessage::default();
msg.foo = "bar".to_owned();
...

danburkert avatar Dec 02 '20 22:12 danburkert

Also note that you could in theory attach a general purpose builder derive macro to prost generated types, e.g. https://crates.io/crates/derive_builder. I haven't tried this so maybe there's some sticking points, but I believe all the hooks are available in prost-build to make it possible.

danburkert avatar Dec 02 '20 22:12 danburkert

Combination of #[non_exhaustive] and builder sounds like a best option to me since then it's possible to match on the struct and access the fields directly without creating compatibility hazard.

I kinda like the idea of general-purpose builder although I'm not sure if there's a reason why prost-specific could be better. One thing that comes to my mind is avoiding naming conflicts (if someone funny names protobuf messages Foo and FooBuilder).

My use case is a library for quite large gRPC protocol that I don't have the time to write cleaner bindings for. (I wish I had.)

Kixunil avatar Jul 06 '21 18:07 Kixunil

https://lib.rs/crates/proto-builder-trait

This seems to do the job? Maybe it could be upstreamed?

damelLP avatar Dec 14 '22 06:12 damelLP

derive_builder does away with a lot of the boilerplate and is very flexible, but I think a code generator using specific knowledge of the protobuf type system would be able to provide a better API out of the box.

Consider repeated fields:

message HelloRequest {
    repeated string flags = 2;
}

The builder for HelloRequest could provide this setter method:

impl HelloRequestBuilder {
    pub fn flags<T>(mut self, flags: impl IntoIterator<Item = T>) -> Self
    where T: Into<String>,
    {
        self.inner.flags = flags.into_iter().map(Into::into).collect();
        self
    }
}

Stylistically, I don't like the builder type becoming a part of the API that you have to use by name. It's better to add an associated fn to the message type, and have the builder type tucked away in a submodule:

impl HelloRequest {
    pub fn builder() -> hello_request::HelloRequestBuilder {
        todo!()
    }
}

mzabaluev avatar Apr 13 '23 23:04 mzabaluev

As realized in #901, a protobuf-specific generator can provide much more ergonomic API than a general purpose derive.

mzabaluev avatar Sep 14 '23 09:09 mzabaluev