protoreflect icon indicating copy to clipboard operation
protoreflect copied to clipboard

protoparse: ParseFiles and protoc output different RawFields for some custom options

Open WillAbides opened this issue 4 years ago • 4 comments

The input below causes ParseFiles and protoc to generate FileDescriptorProtos with different RawFields for the custom option.

This issue seems specific to MyOption containing a map with a nullable value type and to my_map containing at least one element with no value defined.

syntax = "proto3";
package foo;
import "google/protobuf/descriptor.proto";

message MyOption {
  message Foo {}
  map<string, Foo> my_map = 1;
}

extend google.protobuf.FileOptions {
  MyOption my_option = 9999;
}

option (my_option) = {
  my_map: [
    {key: "a"}
  ]
};

The diff:

          		"options": protocmp.Message{
          			"9999": protoreflect.RawFields{
          				0xfa,
          				0xf0,
          				0x04,
        - 				0x07,
        - 				0x0a,
          				0x05,
          				0x0a,
        + 				0x03,
        + 				0x0a,
          				0x01,
          				0x61,
        - 				0x12,
        - 				0x00,
          			},
          			"@type": s"google.protobuf.FileOptions",
          		},

WillAbides avatar Nov 01 '21 16:11 WillAbides

I'm going to assume the - lines are protoparse and the + lines are protoc? In the future, it would helpful to include information like that. Also, a diff of the output of protoc --decode or protoc --decode_raw is much easier-to-read 😛

In this case, it looks like protoparse is serializing an empty message for the value. I think that is still a semantically equivalent value (e.g. a map with a nil/absent value and a map with an empty value). It is unclear what is actually expected. I'll need to play around with how most runtimes actually marshal maps that look that way and then see how this can be handled (possibly special-cased) in protoparse.

jhump avatar Nov 01 '21 16:11 jhump

FWIW, it is not always expected that protoparse provide a byte-for-byte match with protoc in the descriptors. There are several other edge cases where things like serialization order of fields can differ. Some of these are simply differences between the Go protobuf runtime (which protoparse uses) and the C++ protobuf runtime (which protoc uses).

Also, unlike protoc, protoparse does not try to interpret all options into unknown bytes. It instead interprets them into an actual options message instance. So the result of serializing the options message can lead to small differences in the byte output (particularly for options in source that de-structure a message across multiple option statements).

jhump avatar Nov 01 '21 16:11 jhump

Sorry I left out which is which. This is a diff with protoc on the left and protoparse on the right, so - is protoc and + is protoparse.

This is the last of the issues I found from fuzzing over the weekend. Later today I will submit a PR with the fuzz test and tests for these issues. I just need to make it run from make first.

WillAbides avatar Nov 01 '21 16:11 WillAbides

Interesting and super-weird that protoc adds an empty value to the map entry. I had assumed that side would have been protoparse (like an artifact of how the Go protobuf runtime marshals map values). Surprising...

jhump avatar Nov 01 '21 17:11 jhump

@WillAbides, I don't intend to fix this issue in this repo. This repo does not attempt to generate descriptors that will match the output of protoc byte-for-byte. You can now use the new protocompile repo, which does providate that capability via the CanonicalProto method on linker.Result instances.

That repo will soon replace the actual implementation of protoparse in this repo (#354). But this particular capability (matching byte-for-byte in unrecognized fields) won't be exposed here, only via the underlying protocompile APIs.

jhump avatar Oct 29 '22 02:10 jhump

Closing as won't fix.

Instead, use https://github.com/bufbuild/protocompile. When you get back a file descriptor from the protocompile.Compiler, if it will type assert to linker.Result, then you can call that type's CanonicalProto method. The resulting file descriptor proto should serialize to the exact same bytes as protoc.

jhump avatar Jan 11 '23 17:01 jhump