protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

Allow protojson to decode legacy maps into messages with shortcut map notation

Open advdv opened this issue 2 years ago • 4 comments

Is your feature request related to a problem? Please describe. We're dealing with a service architecture that describes maps in different ways in different service (for legacy reasons). Some old services emit messages with the "legacy" structure for encoding maps inside of protobuf messages. Other services use modern protobuf definitions with the "shortcut" notation for maps (see: https://developers.google.com/protocol-buffers/docs/proto3#maps).

As mentioned in the documentation the map<T,T> notation is sugar for what is in effect a list of entry messages as it is encoded on the wire. This is good, and works for us so we don't have to care about which service uses what notation. But this breaks when sent it in the canonical json format.

To explain this further, let's consider the following proto file (in reality, for our use case this would be in two different files but the explanation stays the same)

syntax = "proto3";
package examples.v1;

// ShortcutMap message use the map shortcut for our map field
message ShortcutMap {
    map<string, string> map_field = 1;
}

// LegacyMap uses the legacy setup to describe a map
message LegacyMap {
    // We repeat entries to make a map
    message MapFieldEntry {
        string key = 1;
        string value = 2;
    }
    repeated MapFieldEntry map_field = 1;
}

Now let's consider the scenarios that one service is sending the legacy LegacyMap message to a service that is only aware of the new ShortcutMap definition. Both in JSON and in the proto wire format:

package main

import (
	"fmt"

	examplesv1 "github.com/crewlinker/some-internal-protobuf-go-dir/v1"
	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
)

func main() {

	// send messages from our legacy code
	send1 := &examplesv1.LegacyMap{MapField: []*examplesv1.LegacyMap_MapFieldEntry{{Key: "foo", Value: "bar"}}}
	json1, _ := protojson.Marshal(send1)
	fmt.Printf("legacy json: %s\n", json1)
	proto1, _ := proto.Marshal(send1)
	fmt.Printf("legacy wire: %x\n", proto1)

	// receive messages in our new code
	var rcv1, rcv2 examplesv1.ShortcutMap
	proto.Unmarshal(proto1, &rcv1)
	fmt.Println("receive 1:", rcv1.MapField) // prints the map as expected
	protojson.Unmarshal(json1, &rcv2) // proto: syntax error (line 1:13): unexpected token [
	fmt.Println("receive 2:", rcv2.MapField) // prints an empty map (because it couldn't parse)
} 

The inconsistency is that the legacy notation for maps can be decoded transparently into a message with the shortcut notation for the protowire format. But not when the json format is used.

Describe the solution you'd like I would like it to be possible that the protojson unmarshal logic allows for decoding maps from the legacy format into a message with the shortcut notation. For my use case it would be ok if this is an option in the https://pkg.go.dev/google.golang.org/protobuf/encoding/protojson#UnmarshalOptions struct but I do believe it can also be implemented transparently (without breaking any backwards compatibility)

Describe alternatives you've considered Some alternatives:

  • Modify the json message before it hits protojson.Unmarshal, but this requires parsing the whole message once, in a lossy format, then encoding it again into the json format that protojson accepts for our map. This is very ugly, and has big performance implications.
  • Implement a custom unmarshal method for protojson to just handle the map type (like https://pkg.go.dev/encoding/json#Unmarshaler). But this is impossible for protojson as far as I'm aware
  • Other options seems to require us mandating that the whole code base follows either one or the other way to define maps. This requires a big development undertaking
  • Disallow protojson encoding for messages that have a map, this defeats the purpose of protojson being a reliable message format.

Additional context To hopefully give this improvement a bit more weight I just want to emphasise:

  • That the official docs https://developers.google.com/protocol-buffers/docs/proto#maps mentions a lot that the map<T,T> notation is sugar for what is actually happening on the wire. The json representation doesn't follow that same rigor
  • It is pretty common for protojson to parse certain types in multiple ways (https://developers.google.com/protocol-buffers/docs/proto3#json), e.g floats can be send as 1.1 or "1.1". It seems OK if for maps it says that it would accept both {"foo": "bar"} as well as [{"key":"foo","value":"bar"}]
  • I would be able to contribute this improvement, if accepted

advdv avatar Jan 05 '23 19:01 advdv

This seems to be a request relevant to the protobuf standard itself. This module has been bitten before by going beyond the standards as they are defined, and thus is shy to implement something unilaterally that is not already standardized.

Namely: even if we make this change, will C++/Java/python support the same features? Having the Go implementation alone able to parse [{"key":"foo","value":"bar"}] into a map<string, string> means that the Go implementation will be unexpectedly different from other implementations.

puellanivis avatar Jan 06 '23 21:01 puellanivis

Right, that makes sense. But it do think it is fair to raise this with the standard itself (or at least the canonical json encoding)! What would be the right place to create an issue for that?

advdv avatar Jan 07 '23 08:01 advdv

As @puellanivis mentioned, the Go implementation will only adopt this if the wider protobuf project deems this as standard behavior.

The issue tracker for the wider protobuf project is https://github.com/protocolbuffers/protobuf/issues.

dsnet avatar Jan 07 '23 09:01 dsnet

@dsnet I understand. I've opened the following issue over here: https://github.com/protocolbuffers/protobuf/issues/11505

advdv avatar Jan 09 '23 16:01 advdv