protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

protoc-gen-go: generate MarshalJSON/UnmarshalJSON methods for messages

Open bcmills opened this issue 8 years ago • 27 comments

Per https://github.com/golang/protobuf/issues/183:

There's several other parts of the proto3 JSON spec that make it impossible to use the standard encoding/json package, like WKTs.

I don't want to have generated protos grow any extra dependencies, so depending on the jsonpb package isn't appropriate.

I think json usage is pervasive enough that we should revisit this decision. Message types whose JSON encoding diverges from the standard encoding/json package should provide MarshalJSON and UnmarshalJSON methods. Then users could use the standard package without the need to explicitly interact with jsonpb.

Yes, it's possible that would add a few more dependencies for programs that don't strictly need them. To the extent that that's a problem, it's a problem that should be fixed in the compiler and linker: the marginal increase in build and link times due to the extra dependencies seems like it would be miniscule compared to the complexity of interacting with two separate and incompatible packages for JSON encoding (e.g. https://github.com/golang/protobuf/issues/248).

bcmills avatar Nov 17 '16 04:11 bcmills

Ok so what would the generated MarshalJSON method do. Would it call jsonpb (which calls encoding/json again) or would it be actual generated code that marshals json faster than with reflect and gives packages like ffjson a run for their money?

awalterschulze avatar Nov 17 '16 06:11 awalterschulze

@awalterschulze: This proposal is about the API, not the implementation.

Initially it would probably just call jsonpb. If we find it's worth the extra overhead to generate code (or use some other optimization technique to avoid reflection), it would be easy enough to change the implementation while preserving the same API.

bcmills avatar Nov 17 '16 17:11 bcmills

Ok so encoding/json is going to call the generated MarshalJSON method which is going to call jsonpb which sometimes also calls encoding/json. Some psuedo circular dependency. We hope it doesn't become a loop. Sounds like any easy bug to make.

Maybe we can list the differences between encoding/json en jsonpb? Even if its just for the record.

I know one reason was the fact that non string keys could not be encoded by encoding/json. But the go1.7 release notes now state the following:

In earlier versions of Go, this package only supported encoding and decoding maps using keys with string types. Go 1.7 adds support for maps using keys with integer types: the encoding uses a quoted decimal representation as the JSON key. Go 1.7 also adds support for encoding maps using non-string keys that implement the MarshalText (see encoding.TextMarshaler) method, as well as support for decoding maps using non-string keys that implement the UnmarshalText (see encoding.TextUnmarshaler) method. These methods are ignored for keys with string types in order to preserve the encoding and decoding used in earlier versions of Go.

Which means this problem might not be a problem anymore?

What are the other differences between encoding/json and jsonpb?

awalterschulze avatar Nov 18 '16 08:11 awalterschulze

Maybe we can list the differences between encoding/json en jsonpb? Even if its just for the record.

Differences I'm aware of:

  • Field names. (Easy enough to fix with a proto compiler change, but it's a breaking change for current users of the json package with generated message types.)
  • The default behavior for json is to include default ("empty") values; the default behavior for jsonpb is to omit them.
  • jsonpb encodes enums by name rather than as integers.
  • jsonpb recognizes protobuf Well-Known Types and parses them accordingly.
  • Maybe some differences involving numbers encoded as strings vs. numbers encoded as JS number literals?
  • Maybe handling of NaNs and Infinity?

bcmills avatar Nov 18 '16 18:11 bcmills

Some of these things are solvable:

  • Field names: maybe the new json_name feature can be used to have something consistent here?
  • "empty": simply generate structures with omitempty?
  • enums: We can generate MarshalJSON and UnmarshalJSON methods for enums, like was done in the past?
  • WKTs: We can write MarshalJSON and UnmarshalJSON for those types.
  • numbers vs strings literals: This might be a problem?
  • NaNs, Infinity, I also don't know about this.

So what if we start by moving the jsonpb implementation to be closer to the encoding/json implementation. Maybe we can start by generating omitempty for proto3 and then simply copying the behaviour of encoding/json for this case? Then we could start handling well known types in a less specific way in the jsonpb package and rather use MarshalJSON and UnmarshalJSON implementations. Then maybe we can get to the point where the proto json marshaler is simply the encoding/json Marshaler with a few custom options? Then our generated types don't need to generate MarshalJSON and UnmarshalJSON methods?

awalterschulze avatar Nov 20 '16 12:11 awalterschulze

Then maybe we can get to the point where the proto json marshaler is simply the encoding/json Marshaler with a few custom options?

I agree that that's a nice goal in principle, and I've retitled the proposal accordingly. I'm not sure whether it's feasible.

bcmills avatar Nov 22 '16 19:11 bcmills

Yes this title is something I can get behind.

Maybe we can start by generating omitempty for proto3 and then simply copying the behaviour of encoding/json for this case?

awalterschulze avatar Nov 22 '16 19:11 awalterschulze

Yes please, this would solve so many problems that are common: people deserializing with jsonpb and serializing with json

mwitkow avatar Dec 07 '16 09:12 mwitkow

Ok, so to sum up this discussion Step 1 would be:

  • general MarshalJSON and UnmarshalJSON for message structs that simply delegate to jsonpb
  • remove the json tag on the struct to not confuse people
  • add a flag to the protoc-gen-go compiler that allows you fallback to the previous behaviour (no MarshalJSON UnmarshalJSON generation json tag is there)
  • document it in a the release notes protoc-gen-go

I like this approach, because it gets us 90% there (you can use encoding/json like normal humans), there is a fallback for prior users of json, and the jsonpb deserializes the "whole tree" (without sub message caling) so we de-risk any edge cases.

Step 2 would be to consider code-generating the MarshalJSON/UnmarshalJSON functions that work in a tree-like fashion. I.e. if a message has a nested message, the relevant MarshalJSON/UnmarshalJSON of the nested message is called. This may be tricky with:

  • numbers vs strings literals: This might be a problem?
  • NaNs, Infinity, I also don't know about this. But I'm sure we can have helper functions in jsonpb library as a stepping stone that handles the field values inside the codegenned methods.

mwitkow avatar Feb 27 '17 09:02 mwitkow

Note that MarshalJSON and UnmarshalJSON would have to be opinionated on which options (fields in jsonpb.Marshaler and jsonpb.Unmarshaler) are used. It may be fine to have this as a default, but I don't think we can remove the jsonpb package.

zombiezen avatar May 19 '17 20:05 zombiezen

@zombiezen True, but I think the protobuf JSON spec and the Go encoding/json package provide pretty clear defaults.

The Marshaler options:

  • EnumsAsInts is not allowed by the spec: it requires "[t]he name of the enum value as specified in proto".
  • EmitDefaults is explicitly allowed and must default to false.
  • Indent is allowed (the spec is silent about whitespace), so it should default to the same formatting produced by encoding/json.
  • OrigName is not allowed by the spec: it requires that "names are mapped to lowerCamelCase".

The Unmarshaler options:

  • AllowUnknownFields is not addressed in the spec. We should probably get some clarification on that, and/or check the published conformance-test suite.

bcmills avatar May 19 '17 21:05 bcmills

SGTM. Just wanted to make sure we thought about it.

zombiezen avatar May 19 '17 21:05 zombiezen

I just wanted to add a note: the json StructTag value of "-" isn't checked. I found out about this in a new project, where I added json:"-" for a password field (just to be sure) - it didn't work. So it would seem, there is currently no way to ignore a value with jsonpb.

morphar avatar Aug 23 '17 03:08 morphar

@morphar Curious, how did you set the JSON struct tag value of "-" on a generated type field? If this is for a dynamic message type, you can implement JSONPBMarshaler/JSONPBUnmarshaler. See https://github.com/golang/protobuf/blob/master/jsonpb/jsonpb_test.go#L767.

cybrcodr avatar Aug 24 '17 05:08 cybrcodr

any progress on this? need help?

mkmik avatar Sep 06 '17 16:09 mkmik

In case anyone is interested, I implemented protoc-gen-go-jsonpb to generate MarshalJSON/UnmarshalJSON functions for messages.

tnarg avatar Nov 21 '17 18:11 tnarg

Would love MarshalJSON/UnmarshalJSON methods to be generated for Message types. Would make it a lot easier to deal with map/array types as well, such as the following:

message A {
  string name = 1;
  map<string, B> b_field = 2;
}

message B {
  float64 C = 1;
}

In this case, say we have a backend service that needs to serialise/deserialise only A.b_field:

func (h *Handler) storeBFields(a *mypackage.A) error {
  b := a.GetBField()
  
  jsonpb.Marshal(b)  // <--- Wrong: map[string]*mypackage.B does not implement proto.Message

}

By having MarshalJSON/UnmarshalJSON methods, it would allow users to use only stdlib encoding/json's Marshal/Unmarshal functions to unmarshal protobuf to it's exact JSON mapping, without using hacky workarounds.

marcusljx avatar Mar 22 '19 07:03 marcusljx

Perhaps related, at least as far as decoding enums is concerned: https://github.com/golang/protobuf/issues/866

mvdan avatar Jun 07 '19 17:06 mvdan

Whoops I missed this dup when I reported my issue (#926).

My stab is similar to @tnarg above, I didn't see this thread and googling didn't bring anything up. We've now started using this and so far has been working great: https://github.com/mitchellh/protoc-gen-go-json

It'd be great to have this sort of functionality built-in.

mitchellh avatar Aug 12 '19 18:08 mitchellh

Is this waiting on the refactor mentioned in #266 for v2 of this pkg?

Anything members of the community can help out with to move this particular set of issues through? Very much would like to see this land in the core pkg.

mfridman avatar Jan 08 '20 02:01 mfridman

Huh, this isn’t such a bad idea. Since we can override JSON marshaling/unmarshalling per message to call into jsonpb instead, we can avoid incidents of people confusing which one they are supposed to be using. And meanwhile, standard encoding/json does not have to support any of the unique issues of protobuf JSON encoding…

puellanivis avatar May 08 '20 11:05 puellanivis

Thanks @mitchellh, that package works perfectly! Would love to see this built in as well.

hashamali avatar May 14 '20 06:05 hashamali

Is there a status on this? I recently came across this issue when trying to wrap a protobuf generated struct inside another struct that contained other non protobuf generate structs. Now I can't simply use json.Unmarshal/Marshal because the protobuf related struct requires special handling related to the enum type fields. I'd be super happy to have these protobuf structs implement the MarshalJSON and UnmarshalJSON functions with some sane defaults.

Please let me know if this is ready to be worked on.

jbgrunewald avatar Dec 05 '20 01:12 jbgrunewald

@jbgrunewald , use https://github.com/mitchellh/protoc-gen-go-json

tamalsaha avatar Dec 05 '20 04:12 tamalsaha

Thanks, this looks very promising.

James Grunewald

On Dec 4, 2020, at 8:25 PM, Tamal Saha [email protected] wrote:

 @jbgrunewald , use https://github.com/mitchellh/protoc-gen-go-json

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

jbgrunewald avatar Dec 05 '20 04:12 jbgrunewald