prost icon indicating copy to clipboard operation
prost copied to clipboard

prost_build: type_attribute path incorrectly matches submessages (no way to specify exact match)

Open bsurmanski opened this issue 3 years ago • 6 comments

If there is a nested message, adding a type attribute to an outter message will add the type attribute to the inner message as well. Example, given the following message:

syntax = "proto3";
package mydemo;

message Outter {
	enum Inner {
		A = 0;
		B = 1;
	}
	repeated Inner inners = 1;
	float field = 2;
}

and the following build.rs:

use std::io::Result;

fn main() -> Result<()> {
    let mut config = prost_build::Config::new();
    config.type_attribute(".mydemo.Outter", "#[serde(default)]"); 
    config.compile_protos(&["proto/demo.proto"], &["proto/"])
}

Expected: I would expect only mydemo.Outter to have a #[serde(default)] annotation.

Actual: This will also apply an annotation to mydemo.Outter.Inner. Here is the generated rust:

#[serde(default)]
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct Outter {
    #[prost(enumeration="outter::Inner", repeated, tag="1")]
    pub inners: ::prost::alloc::vec::Vec<i32>,
    #[prost(float, tag="2")]
    pub field: f32,
}
/// Nested message and enum types in `Outter`.
pub mod outter {
    #[serde(default)]
    #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord, ::prost::Enumeration)]
    #[repr(i32)]
    pub enum Inner {
        A = 0,
        B = 1,
    }
}

It can be seen that Inner incorrectly has a #[serde(default)] annotation. This is an issue in my case, because a #[serde(default)] cannot apply to an enum (and without the annotation on Inner, Outter can be default since inners is a Vec).

Work around tried:

  • add a specific empty type annotation for Inner: config.type_attribute(".mydemo.Outter.Inner", ""); this did not work, regardless of type_attribute ordering.

Potential solutions:

  • add an additional "end of match" delimiter (example: "$"), so you can specify something like .mydemo.Outter$
  • use a more standard matching engine such as glob. (or regex, though that tends to be less user-friendly)
  • add a visitor pattern, where the user provides a closure that that allows introspection on each type/field before producing an annotation.
  • add first class serde support that is aware of typing issues, so you can say "enable serialization for all valid types under path"

IMO the visitor solution is probably the most robust and powerful. For example, the visitor closure can be provided a prost_types::DescriptorProto or FieldDescriptorProto respectively, and produce a string. My secondary preference would be glob. The current matching function isn't particularly well documented, and having a standard matching function would make it easier to grok.

I'm using prost_build version 0.9. Please let me know if there is a workaround here.

bsurmanski avatar Jan 27 '22 15:01 bsurmanski

I am also running into https://github.com/tokio-rs/prost/issues/504, which seems to be an unrelated but similar issue with path matching.

bsurmanski avatar Jan 27 '22 17:01 bsurmanski

Looks like the code generator (https://github.com/tokio-rs/prost/blob/d45a37cab69c2a0cda36a97df8823fcaf43dc1fe/prost-build/src/code_generator.rs#L185) has access to both the DescriptorProto or FieldDescriptorProto as well as the Config at the appropriate place. So adding a visitor likely wouldn't require much (or any) refactoring.

I think you would have to do the following:

  • add a member type_visitors: PathMap<&'a mut FnMut(&DescriptorProto) -> Option<String>> and field_visitors: PathMap<&'a mut FnMut(&FieldDescriptorProto ) -> Option<String>> to Config
  • and add a lifetime parameter to Config to support the new members
  • in CodeGenerator::append_type_attributes and CodeGenerator::append_field_attributes, add another loop to call each matching respective visitor and append any Some(String) returned.
  • Add a new method to Config, such as pub fn type_visitor<'a, P>(&mut self, path: P, visitor: &'a mut FnMut(FieldDescriptorProto ) -> Option<String>) -> &mut Self and similar for field_visitor.
  • Add tests. I think PathMap already covers the difficult parts, maybe just test multiple overlapping-path visitors, and visitors that return None.

bsurmanski avatar Jan 28 '22 15:01 bsurmanski

+1 for the visitor based approach, I think this ultimately gives the most flexibility for the consumer, and it is consistent with the existing ServiceGenerator visitor API.

andrewhickman avatar Mar 13 '22 01:03 andrewhickman

Hmm, I was making a quick mock up of the proposed solution and ran into a couple of ambiguities:

  1. enums use a EnumDescriptorProto, but call append_type_attributes
  2. Oneofs use a OneOfDescriptorProto but call append_type_attributes
  3. enums use a EnumValueDescriptorProto but call append_field_attributes
  4. Oneofs use a list of FieldDescriptorProtos

Seems like oneofs and enums are a little weird, but field and type attributes still apply to them. I guess the visitors can be provided a rust enum of the conflicting descriptor protos? But that seems a little ugly...

bsurmanski avatar Mar 14 '22 02:03 bsurmanski

It seems like the ServiceGenerator uses a AST Service struct, even though a ServiceDescriptorProto exists.

It seems the AST struct also provides the proto name, package, and comments but is otherwise the same info. It looks like the AST struct is [created from the ServiceDescriptorProto](https://github.com/tokio-rs/prost/blob/922a9a163f79e06c2b7a89c0bf417aa50e9081b8/prost-build/src/code_generator.rs#L703) and passed into any ServiceGenerator, and that's it's sole purpose.

Likely a similar structure should be mimic'd. With a AST Type and AST Field struct being created and passed to whatever visitor.

But, for ServiceGenerator, they use a trait pattern, passed to the Config. I suppose this makes sense because a Config is supposed to only have one ServiceGenerator. IMO, it would make sense to provide a closure instead of a trait object as a type/field visitor, as I imagine it would make sense to have multiple and using a closure would be much less verbose.

I suppose, like the ServiceGenerator, a buf: &String could be provided as an output parameter (to prevent allocation). Maybe adding a visitor to a config would look like:

     config.type_visitor("path.to.package", |ty, buf| {
         if(ty.name.endswith("GRPC") {
             buf.push_str("#[serde(rename_all = \"camelCase\")]");
         }
    });

(much lighter than a trait IMO)

bsurmanski avatar Mar 14 '22 21:03 bsurmanski

In hopes that future searches will find this bug; it applies to message_attribute() as well; a mutual dependency/nested message will be matched by message_attribute() and have the supplied attribute written to its definition.

Additionally, this issue conspires with the fact that successive calls to compile() will overwrite files emitted by prior calls. Meaning:

  1. the order in which a single threaded build.rs calls compile() may cause a mutual dependency to be compiled an inscrutable number of times, with each successive call overwriting the prior's attributes making refactors very surprising.

  2. Build scripts that concurrently compile separate files with mutual dependencies are in a data race.

Although fixing this will be a breaking change, it may be more discoverable for users if the current compile() method was deprecated with an explanatory message and a compile_v2() exposed in a minor semver bump.

Jeremiah-Griffin avatar Feb 19 '24 07:02 Jeremiah-Griffin