prost icon indicating copy to clipboard operation
prost copied to clipboard

Map<_,type> appears to fail for zero types

Open UserAB1236872 opened this issue 6 years ago • 13 comments

I'll be honest, I'm not 100% certain if this is a prost issue of an issue with the Javascript library, but since I'm using the one included in Google's own tree:

https://github.com/google/protobuf/tree/master/js

I'm inclined to try here first. The issue is with the following code:

message Foo {
    map<string, int> bar = 1;
}

In Rust

let mut x = Foo::default();
x.bar.insert("val", 0);

Now encode and send it somewhere. Prost appears to only write the key, and not the subsequent value of 0. At least in Javascript, this causes a panic due to receiving a key with a missing value, leading me to have to use string types and conversions back and forth from numbers in order to bypass this.

This applies to all zero types, floats/doubles do this on 0.0, bools do this on false.

It's possible the specification allows this and the Javascript is in error, but again, I'm trying here first since I think this is more likely a prost bug.

UserAB1236872 avatar May 02 '18 09:05 UserAB1236872

The language guide says that map fields are treated as repeated key/value message types, so I believe prost is at least not wrong to be making this optimization. Have you tried reproducing this against one of the more mature implementations like C++ or Java?

danburkert avatar May 02 '18 15:05 danburkert

The map encoding code goes way out of it's way to do this optimization, so at some point in the past I decided this was an important optimization, although I can't remember why now. Perhaps it's checked in the conformance tests.

danburkert avatar May 02 '18 15:05 danburkert

With equivalent C++ code, C++ protobuf generates 0a070a0376616c1000 which does include the default value, so I do not think this is required for conformance (I assume C++ protobuf is conformant).

C++ protobuf also accepts 0a070a0368656a as input (without the encoded zero). I think prost is not wrong here.

per-gron avatar Jul 05 '18 14:07 per-gron

I'll close then

(Also, I don't believe C++ can be wrong. It's the reference implementation if I'm not mistaken?)

UserAB1236872 avatar Jul 07 '18 00:07 UserAB1236872

FWIW I'm still not convinced prost should be doing this optimization. It makes the code a bit uglier, and if it causes compat issues then I think we should consider removing it.

danburkert avatar Jul 08 '18 22:07 danburkert

A guess as to why you could have thought of this optimization (or whatever you'd like to call it) as important: Because of Prost's way to represent messages as plain Rust structs, if this optimization wouldn't be there, a proto message with many int fields would take noticeable amount of space when serialized even when the object is just Default::default(). This is not the case for Java or C++ default constructed messages.

Not saying that it has to be kept, but that seems like a somewhat useful feature to me.

per-gron avatar Jul 13 '18 12:07 per-gron

I looked into what it would look like to change the behavior so that map keys are always encoded. It ends up making some of the most complicated code in prost simpler: 07539742746b0d4b24e1153dbff0170b4a712478. I'm somewhat inclined to merge it given the simplifications and compat issues. I'd really like to see this corner case added to the upstream conformance test suite, since merging this means prost will no longer be able to test that it can correctly decode such maps (unless some examples are hardcoded).

I'm curious if y'all have thoughts.

danburkert avatar Aug 02 '18 06:08 danburkert

A guess as to why you could have thought of this optimization (or whatever you'd like to call it) as important: Because of Prost's way to represent messages as plain Rust structs, if this optimization wouldn't be there, a proto message with many int fields would take noticeable amount of space when serialized even when the object is just Default::default(). This is not the case for Java or C++ default constructed messages.

That's definitely the case for proto3 default visibility fields, and prost does the exact same optimization, but the code that does that is distinct from map encoding/decoding, so the optimization can be backed out for maps without affecting proto3 messages. My gut is that this optimization really doesn't matter for maps much, since it will be relatively rare to have default values. It's definitely important for proto3 default visibility fields, though, since those will be the default much more often.

danburkert avatar Aug 02 '18 06:08 danburkert

Fwiw, this breaks with the the following configuration: [server: tonic/prost] -> [envoy] -> [dart/flutter using grpc-web]. Dart experiences a Code 15 Unrecoverable data loss or corruption.

A go server implementing the same grpc service responds with the encoded 0 (confirmed with wireshark) for my map<int32, OtherMessage> field and is decoded properly. Using index +/- 1 on both sides is a good enough workaround for now.

brentnk avatar Apr 25 '21 20:04 brentnk

Prost appears to only write the key, and not the subsequent value of 0.

This is the right thing to do according to proto spec: https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#presence-in-tag-value-stream-wire-format-serialization

When serializing, fields with no presence are not serialized if they contain their default value.

Encoding of map fields are specified in https://developers.google.com/protocol-buffers/docs/proto#maps

In proto syntax, if you have a map field like

map<int32, MyMapValue> my_map_field = 123;

It's encoded as if it was

repeated MyMapFieldEntry my_map_field = 123;

where

message MyMapFieldEntry {
    optional int32 key = 1;
    optional MyMapValue value = 2;
}

In the original example, map value type is int (which is not a valid proto type, I'm guessing the OP meant one of the valid integer types like int32 or int64), and because the default value for integer types is 0, prost is doing the right thing by omitting the value. If the receiving end fails to read the message then it's a bug in the protobuf implementation running on the receiving end.

For Dart protobuf implementation, we fix the decoding for map fields to handle this case in https://github.com/google/protobuf.dart/pull/745.

Note that since both key and value fields of map entries are optional, it's also possible to omit the key field. For example

message Foo {
    map<int32, int32> bar = 1;
}
let mut x = Foo::default();
x.bar.insert(0, 0);

A spec compliant implementation should encode x as

[
    (1 << 3) | 2, // tag = 1, wire type = length delimited
    0, // length = 0, i.e. empty entry message
]

osa1 avatar Aug 26 '22 13:08 osa1

@osa1 thanks for the explanation, so I guess we can close this now since this isn't really a bug with prost?

LucioFranco avatar Aug 26 '22 14:08 LucioFranco

Thinking more about this, I'm not as confident now that this is not a bug.

In the spec I quoted in my comment above:

When serializing, fields with no presence are not serialized if they contain their default value.

Here I missed the "fields with no presence" part. optional enables presence. So in this encoding shown in https://developers.google.com/protocol-buffers/docs/proto#maps:

message MapFieldEntry {
  optional key_type key = 1;
  optional value_type value = 2;
}

repeated MapFieldEntry map_field = N;

Not encoding a default value (in key or value fields) is actually different than encoding it, because key and value fields are optional, so they have presence checks. (i.e. in addition to reading their values, you can check whether they exist in a message)

Now, the question is what happens on the receiving end when you get a map entry with missing key, value, or both key and value fields, and you represent proto map fields as actual an actual map (rather than a list of entries) in your language. Protobuf spec does not mention this. However, the behavior should be consistent with the "desugaring" shown above (where map is represented as repeated MapFieldEntry), and for that we need to use the default values of key and value fields when they are missing, because that's what we would do if the map field was actually repeated MapFieldEntry.

Unfortunately, protobuf spec does not specify what the default values should be for optional Message fields. I did a survey of this a while ago: https://github.com/google/protobuf.dart/issues/309#issuecomment-1173756840 you can see that different implementations return different values.

What this means is, if you have a map<int32, MyMessage> field, and omit a MyMessage value because it's the same as default message, on the receiving end, some implementations will read the map entry as 123 => null, others read it as 123 => MyMessage() (i.e. value is empty/default message).

To avoid this problem prost may want to serialize the value fields always. I'm not sure if not doing it is a bug or not.

osa1 avatar Aug 26 '22 14:08 osa1

I asked this to protobuf team (in context of https://github.com/google/protobuf.dart/issues/309).

It's fine if an implementation omits key or value of a map entry when the value is the default for the field type.

So prost's behavior seems fine, and if an implementation cannot handle it then it's a bug in that implementation. (this is what we fix in the PR linked above)

That said, official protobuf implementation will never omit key or value fields, even when they have the default value for the type: https://github.com/protocolbuffers/protobuf/blob/8d3a7327606b115dda871be54c8b8cb66d41ff90/src/google/protobuf/wire_format.cc#L1184-L1203

I'm guessing other protobuf implementation by the protobuf team (e.g. the JS one linked in the bug report) also do the same, and that's why they cannot handle when key or value fields are missing. When sending they always serialize those fields so the case where one or both of the fields are missing is not considered and tested.

osa1 avatar Sep 01 '22 08:09 osa1

Ok thanks for the update, I will close this issue since the implementation seems correct.

LucioFranco avatar Nov 02 '22 19:11 LucioFranco