rust-protobuf icon indicating copy to clipboard operation
rust-protobuf copied to clipboard

Enforce setting `message` fields when instantiating `struct`s

Open Ten0 opened this issue 6 years ago • 8 comments
trafficstars

Hello!

We are considering using gRPC in our monorepo with microservices. This means at the .protos are subject to change, but it's not an issue as long as these changes are properly taken into account in every project that uses that .proto.

My issue is :

  • The generated code creates structs that come with a ::new() constructor that sets everything to their std::default::Default values.
  • This implies that the compiler does not enforce that all the fields are set when creating the structure, like it does by default.
  • Specifically, if I update my .proto by adding a new field, this will not cause compile errors anywhere, and just leave the field with its std::default::Default value on places where I forget to update it, and probably cause a bug.

I'm trying to avoid this problem. I tried disabling the accessors generation by setting the corresponding options when generating the code in build.rs :

    protoc_grpcio::compile_grpc_protos(
        &targets,
        &[proto_root],
        &proto_root,
		Some(protobuf_codegen::Customize {
			expose_fields: Some(true),
			generate_accessors: Some(false),
			..std::default::Default::default()
		})
    )

but this had no effect whatsoever.

The documentation also doesn't say that it removes the ::new() method, so I should assume that it wouldn't even if it worked.

Also, even if it had, it would be very bothering and unclean to have to set these fields everytime we instantiate a struct :

    pub unknown_fields: ::protobuf::UnknownFields,
    pub cached_size: ::protobuf::CachedSize,

What is the right way to solve that problem ?

Ten0 avatar Mar 21 '19 16:03 Ten0

I think this is by design of protocol buffers: all fields are optional.

I have two suggestions on how to deal with this issue:

  • never invoke new directly, create a function for each struct and invoke that function
  • use rust-protobuf reflection to validate in a runtime that all fields are set

If you have better ideas, let me know.

stepancheg avatar Mar 24 '19 17:03 stepancheg

TL;DR - Jump to As a conclusion below.

The fact that all the fields are optional in protocol buffers is indeed by design of the protocol buffers : it enables compatibility between different versions of the same protocol-buffers-connected softwares. They are also designed in such a way that every field is encoded in the same way on its default value as when it is not present, and I would think that is for performance reasons.

If however we are using the increasingly popular mono-repositories which guarantee that all services use the same version of the .proto code, and if the code also enforces setting all the fields, then it creates by itself end-to-end compile-time checks that guarantee that every field is properly set on purpose, eliminating both of these disadvantages of protocol buffers : you don't have to do sanity checks all over the place to make sure that a field is set, and you can safely use a bool set to false, a number set to 0, etc, without the fear of it being a forgotten field.
And of course, compile-time checks are better than runtime checks because they guarantee that deployed code works just from the fact that it compiles.

The Rust team has designed the language in such a way that there is no null, and that when you instantiate a struct you have to set all of its fields, and there are no stub values anywhere. I beleive that it's an amazing decision for writing code that always works, and the fact Rust has this stability mindset is one of the main reasons why we are using it.

There are certainely quite a few use-cases with traditional multi-repo projects that use different versions of the .proto files where you want that behaviour of having lots of optional fields which you set one by one. However, if we are chosing Rust (and I would think that is the case of many other people), it is not for having tons of default-valued-possibly-not-set-i-dont-know fields everywhere in our code (like go or any other language that supports null), it is for enforcing the same sane programming standards as Rust does enforce.

For these reasons, we would indeed very likely have as company policies to :

  • Never use ::new() on the generated structures
  • Never use the accessors for writing fields (we would probably disable them altogether if the option worked)

However, this implies :

  • Telling people to not use ::new() nor the accessors and making sure they do, whereas if they were simply not generated we wouldn't have to.
  • Having the following kind of code :
use protos::env_service::{EnvReply, EnvReply_Result, EnvRequest};
use protos::env_service_grpc;
...
		let (result, content) = match self.try_get_env(req.get_identifier()) {
			None => (EnvReply_Result::VAR_NOT_FOUND, String::new()),
			Some(env_var_value) => (EnvReply_Result::SUCCESS, env_var_value),
		};
		let resp = EnvReply {
			result,
			content,
			unknown_fields: Default::default(),
			cached_size: Default::default(),
		};

Which has these disadvantages :

  • We lose the Rust-like instantiation of structures where you have to set the fields using their names by being forced to use tuples, which is a shame because having that was I beleive another great decision of the Rust team for readability and not swapping fields by mistake.
  • We have to write the following boilerplate everywhere we instantiate a struct : unknown_fields: Default::default(), cached_size: Default::default(), and even more so if we chose to always use the structs to avoid the first disadvantage.

As a conclusion, I beleive that for a use case like ours, and probably that of many other people in the years to come, we need a code generation option that :

  • Does not generate ::new() nor the accessor fields.
  • Either of :
    • Does not generate the unknown_fields and cached_size fields
    • Creates another struct which does not need them and implements Froms/Intos for converting between the two (though I would have a preference for the first one)

-- PS: I still don't understand why these options have no effect :

expose_fields: Some(true),
generate_accessors: Some(false),

Is this a bug ?

Ten0 avatar Mar 24 '19 22:03 Ten0

PS: I still don't understand why these options have no effect : expose_fields: Some(true), generate_accessors: Some(false), Is this a bug ?

Hard to say, too many unknowns (what's syntax, what's field type). Can you provide a full example where it doesn't work?

stepancheg avatar Mar 24 '19 23:03 stepancheg

unknown fields

rust-protobuf can have an option to skip generation of unknown_fields field.

This way generated code will lose the ability to preserve unknown fields during decoding/encoding, but if you are 100% sure you don't need it, it's possible to have it.

cached_size

In the old days of rust-protobuf, messages had no cached_size field. Cached sizes were stored on the stack.

That we less efficient than storing cached size in cached_size field, so I introduced cached_size.

Old behavior can be revived as option, but only in rust-protobuf version 3, because it would be backward-incompatible change.

stepancheg avatar Mar 24 '19 23:03 stepancheg

rust-protobuf can have an option to skip generation of unknown_fields field.

Great! What are the next steps ? :) (Note that we are somewhat busy at the moment and may not be able to implement this right away. Do you have a contribution guide though ?)

That we less efficient than storing cached size in cached_size field, so I introduced cached_size.

I see. Would it be possible to store it somewhere else ? compute_size() still has to be called after any user manipulation since it depends on the strings size. Would packing the message struct in a tuple or a MessagePlusCache{message, cached_size} work ? If I'm not mistaken, memory-wise it would behave in the same way as now.

And otherwise how about the second option ?

Creates another struct which does not [have the unknown_fields and cached_size fields] and implements Froms/Intos for converting between the two

With this second option we should also think about a nice naming because we probably don't want to be using EnvRequest_someBoilerPlateThatTellsThatItOnlyContainsMessageDefinitionFields because said boilerplate would make it non-nice to use. Overall that doesn't seem as clean.

PS: I still don't understand why these options have no effect : expose_fields: Some(true), generate_accessors: Some(false), Is this a bug ?

Hard to say, too many unknowns (what's syntax, what's field type). Can you provide a full example where it doesn't work?

Here is the .proto that matches the previous code :

syntax = "proto3";

package env_service;

// The greeting service definition.
service Env {
  // Get an environment variable
  rpc GetEnv(EnvRequest) returns (EnvReply) {}
}

// The request message containing the environment variable's name
message EnvRequest { string identifier = 1; }

// The response message containing whether the command was successful, and a possible content
message EnvReply {
  enum Result {
    FAILURE_OTHER = 0;
    SUCCESS = 1;
    VAR_NOT_FOUND = 2;
  }
  Result result = 1;
  string content = 2;
}

This generates the following Rust code no matter what Some values are given to expose_fields and generate_accessors :

#[derive(PartialEq,Clone,Default)]
pub struct EnvReply {
    // message fields
    pub result: EnvReply_Result,
    pub content: ::std::string::String,
    // special fields
    pub unknown_fields: ::protobuf::UnknownFields,
    pub cached_size: ::protobuf::CachedSize,
}

impl EnvReply {
    pub fn new() -> EnvReply {
        ::std::default::Default::default()
    }

    // .env_service.EnvReply.Result result = 1;

    pub fn clear_result(&mut self) {
        self.result = EnvReply_Result::FAILURE_OTHER;
    }

    // Param is passed by value, moved
    pub fn set_result(&mut self, v: EnvReply_Result) {
        self.result = v;
    }

    pub fn get_result(&self) -> EnvReply_Result {
        self.result
    }

Any idea on what might be going wrong ? (build.rs code is in my first message on the issue.)

Ten0 avatar Mar 25 '19 00:03 Ten0

Any idea on what might be going wrong?

Thank you for the example.

It turns out it was never implemented in a stable branch. Looking how hard would it be to port it from master...

stepancheg avatar Mar 26 '19 01:03 stepancheg

Merged customize_accessors implementation into 2.5 branch. Not released yet.

stepancheg avatar Mar 26 '19 01:03 stepancheg

I see. Would it be possible to store it somewhere else ? compute_size() still has to be called after any user manipulation since it depends on the strings size. Would packing the message struct in a tuple or a MessagePlusCache{message, cached_size} work ? If I'm not mistaken, memory-wise it would behave in the same way as now.

FYI: this is the diff which replaced external storage of cached sizes to cached_sizes field: here. Hope it helps.

And otherwise how about the second option ?

Project gets frequent requests to perform code generation customization.

So I think the best way to deal with it is to provide callbacks to customize generated code (because I think it is impossible to design a set of callbacks enough for everyone and maintable).

Basically, Customize struct should have a field of type like this:

trait Customizer {
    fn customize(&self, code: GeneratedCode) -> GeneratedCode;
}

struct Customize {
    ...
    customizer: Box<Customizer>,
}

This is the task: https://github.com/stepancheg/rust-protobuf/issues/338

It need to be designed carefully though to make it backwards compatible.

Second option can be implemented using this callback.

stepancheg avatar Mar 26 '19 02:03 stepancheg