reflect/protodesc: automatically create dynamic extension types for NewFiles
This is likely not possible or not desirable, but:
protojson.UnmarshalOptions uses the given Resolver to potentially resolve encoded extensions https://github.com/protocolbuffers/protobuf-go/blob/master/encoding/protojson/decode.go#L165
However, protojson.MarshalOptions only uses the given Resolver to resolve google.protobuf.Any types.
I don't think the JSON spec says you should use extensions, and for example Java does not: https://github.com/protocolbuffers/protobuf/blob/master/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L92
However, being able to marshal extensions would be incredibly helpful. The only place this matters in proto3 is with custom options, which are currently dropped in JSON marshaling, even with a Resolver attached that includes all messages/extensions for the file the message is contained within and all dependencies.
Both jsonpb and protojson support extensions. For example:
https://github.com/protocolbuffers/protobuf-go/blob/8525b20428fdd941303fbebd0e1b1e43dd5f286a/encoding/protojson/encode_test.go#L912-L942
The reason this doesn't appear to work is likely identical to what's going on with #1120
Want to check this out more, I likely am missing something here but figured you have more context so will ask heh:
- This is only handled in Golang, right? The other languages don't appear to support it / it doesn't seem to be part of the JSON mapping - see the JsonFormat.java link above, and this comment: https://github.com/protocolbuffers/protobuf-go/blob/master/encoding/protojson/encode.go#L383
- If so, is it fair to say that if an extension isn't registered globally, and you don't set a resolver on protojson.MarshalOptions, any extensions will effectively be ignored?
- The documentation says it only handles google.protobuf.Any - just an FYI: https://godoc.org/google.golang.org/protobuf/encoding/protojson#MarshalOptions
The use case I have may be specific. I'm doing the following:
- Read or build a self-contained (i.e. all imports present) FileDescriptorSet dynamically that contains custom options, and values for those options.
- Create a *protoregistry.Types per the logic in https://github.com/golang/protobuf/issues/1065 - pinned-to-commit (so won't be dead :-) ) link with bad code that does this: https://github.com/bufbuild/buf/blob/0767b6189c52b377d7f4657130cc75e08cd53f2a/internal/pkg/protoencoding/resolver.go
- Create a protojson.MarshalOptions with this *protoregistry.Types as the Resolver.
- Marshal the original FileDescriptorSet using this protojson.MarshalOptions.
This doesn't result in any custom options being printed.
I can produce a test that reproduces this if you want, but tldr:
syntax = "proto3";
import "google/protobuf/descriptor.proto";
extend google.protobuf.FieldOptions {
string foo = 50000;
}
message Bar {
string baz = 1 [(foo) = "bat"];
}
Results in:
{
"file": [
{
"name": "foo.proto",
"dependency": [
"google/protobuf/descriptor.proto"
],
"messageType": [
{
"name": "Bar",
"field": [
{
"name": "baz",
"number": 1,
"label": "LABEL_OPTIONAL",
"type": "TYPE_STRING",
"jsonName": "baz",
"options": {}
}
]
}
],
"extension": [
{
"name": "foo",
"number": 50000,
"label": "LABEL_OPTIONAL",
"type": "TYPE_STRING",
"extendee": ".google.protobuf.FieldOptions",
"jsonName": "foo"
}
],
"syntax": "proto3"
}
]
}
Note the empty options: {}
This is only handled in Golang, right?
Correct, extension handling is one of the "features" unique to Go that the other languages don't provide. However, the old implementation provided it, so we forward-ported the feature to the new implementation even though it goes against our stated policy of avoiding features that other language implementations don't provide.
If so, is it fair to say that if an extension isn't registered globally, and you don't set a resolver on json.MarshalOptions, that any extensions will effectively be ignored?
No, currently the resolver is only used for google.protobuf.Any, which are stored within a message in an unstructured form. Thus, you absolutely need a resolver to make sense of the information contained within it. However, extensions are stored within a message as a field value with structured information (i.e., the protobuf type information is present). Therefore, a resolver is not necessary.
Given that protojson doesn't automatically parse unknown fields as structured values using the resolver, the issue lies elsewhere (probably the proto.Unmarshal call used to obtain the data in the first place). The extension resolver should be given to the proto.Unmarshal call, rather than the protojson.Unmarshal call.
Documentation says it only handles google.protobuf.Any, but just an FYI
And the documentation is correct, the resolver is currently only used for Any. The interface provides functionality to resolve extensions since they're needed to resolve extensions within the message contained by an Any message.
OK thanks for the info.
There's a bit of a chicken or egg problem here though right? In the dynamic case, where you're taking a FileDescriptorSet from an external source and unmarshalling it, you obviously can't create an extension resolver to do proto.Unmarshal for this FileDescriptorSet as you'd need the FileDescriptorSet to create the extension resolver using *protoregistry.Types in the first place per the https://github.com/golang/protobuf/issues/1065 logic, unless I am missing something (entirely possible).
@jhump also FYI, there may be some other issue here with protoparse, as there are two tests I do:
- Unmarshal from a FileDescriptorSet created with protoc (but this has the above chicken-or-egg problem with proto.Unmarshal).
- Create a FileDescriptorSet using protoparse.
There's a bit of a chicken or egg problem here though right? In the dynamic case, where you're taking a FileDescriptorSet from an external source and unmarshalling it, you obviously can't create an extension resolver to do proto.Unmarshal for this FileDescriptorSet as you'd need the FileDescriptorSet to create the extension resolver using *protoregistry.Types in the first place per the #1065 logic, unless I am missing something (entirely possible).
Seems like it; interesting scenario. The best option seems to be the introduction of a proto.ResolveExtensions helper function that allows you to explicitly signal that you want all unknown fields to be resolved and parsed as structured fields. As much as possible, I'd like to avoid seeing lazy extension handling be moved back into functions like proto.Equal or protojson.Marshal, as it leads to the proliferation of complicated behavior that is difficult (if not impossible) to get right when done automatically.
Yea makes sense re: lazy handling. This is definitely a scenario I'm interested in seeing handled, let me know if there's anything I can do to help (although assuming you know the material here like the back of your hand :-) ).
Are you using protodesc.NewFile or protodesc.NewFiles any way along the way? This chicken-and-egg problem is solely unique to dealing with a FileDescriptorSet and I'm wondering if the protodesc package should specially unmarshal extensions using dynamically created extension types.
The source I have is a FileDescriptorSet, sometimes off the wire, sometimes created with a third-party lib (jhump/protoreflect). For JSON marshalling though, I end up calling protodesc.NewFiles so I can call RangeFiles, create the protoregistry.Types, and add it as a resolver.
I could put together an example of what I'm doing in code if it'd help, i.e. extracted from the complexity of bufbuild/buf (which is only the public half anyways).
Unmarshal from a FileDescriptorSet created with protoc (but this has the above chicken-or-egg problem with proto.Unmarshal).
Java has supported descriptors and dynamic messages and extension registries for ages, but the same thing happens there when unmarshaling a FileDescriptorSet. The solution, sadly, is a double-parse:
- Parse pass 1 unmarshals into file descriptor protos.
- Intermediate step: create a resolver/extension registry of all extensions found in the just-unmarshaled file descriptor protos.
- Parse pass 2, but this time using the extension registry just created. After this pass, all custom options should be correctly parsed/known.
I guess there is a bit of an issue with protoparse in the fact that all extensions really are already known after it has done its parse, so the re-parse should not be necessary. But since it is still using APIv1, it has no way to convert its representation into the generated descriptor protos using a resolver (since the resolver is not an APIv1 concept). If it was using APIv2, this would be trivial. I think this may just have to wait until protoreflect v2 for a real fix. And in the time-being, re-parsing is a work-around.
Although I'm now wondering how this worked in buf with APIv1. I think the same issue would have arisen: these custom options would be unknown and thus could not be correctly marshaled via jsonpb.
Maybe I could create an APIv2 compatibility package -- with its own go.mod file (so the main protoreflect go.mod can still refer to the pre-v1.4 runtime, so users aren't forced to upgrade). It could provide some helper functions, like converting from desc to the APIv2 format. It could handle re-parsing, to make sure that all resulting descriptors have all fields/extensions known.
Oh, to be clear - it didn't work :-). Solving this/looking into it at the level of detail in this issue was low on the priority list, as bufbuild/buf's current public functionality doesn't require custom options (the linter and breaking change detector don't use them), so you only run into an issue if you're trying to use a JSON FileDescriptorSet for something else.
I've held back on certain features too because I haven't had a solution to this - for example, bufbuild/buf hasn't allowed converting binary FileDescriptorSets to/from JSON, as the round trip wouldn't work.
I didn't even consider the double-parse though, although parsing is about 90% of the time that buf takes, so that'd be a major performance hit.
To be clear, when I say "re-parse", I don't mean parsing source (which I suspect is 90% of the time in buf). I mean parsing the binary format. But, the more I think about, the more of a pain this would be to do efficiently. The most efficient mechanism would be to only try to unmarshal the unknown fields bytes. But that means you'd have to crawl the whole object graph to find unknown fields in all nested messages.
Ah yes - confused myself on reading your response - yes, that's actually a trivial amount of time in comparison, but...I don't think that solves the protoreflect issue, right? Although, so as not to bombard @dsnet's inbox, we should probably move this discussion over to protoreflect heh.
Going to re-purpose this issue as potentially having reflect/protodesc auto-create dynamic extensions. The original issue of having encoding/protojson emit extensions already exists.
I think it's probably fair to close this issue now that #1216 has been resolved. While it's not automatic to create extension types as part of NewFiles, it is now a single follow-up call to dynamicpb.NewTypes.