
Protobuf messages containing Any aren't serialized deterministically, even when deterministic serialization is requested

lizan opened this issue 6 years ago • 4 comments

What version of protobuf and what language are you using?
Version: master (currently at 7492b5681231c79f0265793fa57dc780ae2481d6)
Language: C++

What operating system (Linux, Windows, ...) and version?
Linux

What runtime / compiler are you using (e.g., python version or gcc version)?
gcc-7 / clang-7

What did you do?
Serialized a message deterministically and calculated a hash from the serialized binary. Source: https://github.com/envoyproxy/envoy/blob/master/source/common/protobuf/utility.h#L162

What did you expect to see?
The same proto generates the same serialized binary and hash.

What did you see instead?
The same proto generates different serialized binaries and hashes.

Anything else we should know about your project / environment?
This is a follow-up to #5668. Even when we use CodedOutputStream with SetSerializationDeterministic(true), the same protobuf message (identical according to its JSON debug dump) doesn't produce the same binary serialization. I suspect that the Any fields in the message hold differently encoded values even though they represent the same content. Deterministic serialization should normalize the values inside Any too.

lizan, Feb 14 '19 22:02
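The suspicion in the report can be reproduced without the protobuf library. The sketch below is a hand-rolled, simplified model of the wire format, not the protobuf API. Protobuf parsers accept fields in any order, so a message `{f1: 1, f2: 2}` has more than one valid encoding, and an Any keeps whichever encoding it was packed with (plausibly, for example, when the inner message has a map field and was packed without deterministic serialization). `EncodeInner`, the field layout, and the type URL `type.googleapis.com/example.Inner` are hypothetical. Because `SerializeAny` copies the payload verbatim, the two outer encodings differ even though both payloads decode to the same message, so a hash over those bytes can differ too.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends v as a base-128 varint, the protobuf wire encoding for integers.
void PutVarint(std::string& out, uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
}

// Appends a varint field (wire type 0).
void PutVarintField(std::string& out, uint64_t field, uint64_t value) {
  assert(field >= 1 && field <= 536870911);  // valid protobuf field numbers
  PutVarint(out, field << 3);
  PutVarint(out, value);
}

// Appends a length-delimited field (wire type 2).
void PutBytesField(std::string& out, uint64_t field, const std::string& bytes) {
  PutVarint(out, (field << 3) | 2);
  PutVarint(out, bytes.size());
  out += bytes;
}

// Models google.protobuf.Any: type_url is field 1, value is field 2.
// The payload is copied verbatim; it is never parsed, so a serializer in
// "deterministic" mode has no chance to reorder what is inside it.
std::string SerializeAny(const std::string& type_url, const std::string& value) {
  std::string out;
  PutBytesField(out, 1, type_url);
  PutBytesField(out, 2, value);
  return out;
}

// Hypothetical inner message {f1: 1, f2: 2}, written with its fields in
// either order. Both byte strings parse to the same message.
std::string EncodeInner(bool reversed) {
  std::string out;
  if (reversed) {
    PutVarintField(out, 2, 2);
    PutVarintField(out, 1, 1);
  } else {
    PutVarintField(out, 1, 1);
    PutVarintField(out, 2, 2);
  }
  return out;
}
```

`SerializeAny(url, EncodeInner(false))` and `SerializeAny(url, EncodeInner(true))` produce byte strings of the same length that differ only inside the payload.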

@liujisi, any ideas? The fix is critical, and your help is greatly appreciated.

wattli, Feb 15 '19 19:02

@acozzette, any thoughts?

lizan, Feb 19 '19 21:02

@lizan This may be a hard problem to solve. The way Any was designed, parsing and serialization don't treat an Any field specially; its payload is just an opaque blob. If we want to serialize it deterministically, we would probably need to parse and reserialize the Any payload during serialization. To do that, we need to figure out what message type the Any contains. We could do that by looking up the type name in the generated descriptor pool, but that solution is not ideal since it won't work with lite protos (i.e. protos built without reflection support).
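A minimal sketch of the parse-and-reserialize idea, again on a simplified wire-format model rather than the real library. A `std::map` keyed by full type name stands in for the generated descriptor pool, and the type name is taken from the last path segment of the type URL. A registered canonicalizer re-encodes the payload in field-number order; a type missing from the registry falls back to copying the bytes unchanged, which is roughly the position a lite runtime without descriptors is in. `CanonicalizeVarintMessage`, `CanonicalizeAnyValue`, and the type `example.Inner` are hypothetical, and the canonicalizer only handles messages made of singular varint fields.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>

void PutVarint(std::string& out, uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
}

// Reads one varint at `pos`; returns false on truncated or overlong input.
bool GetVarint(const std::string& in, size_t& pos, uint64_t& out) {
  out = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (pos >= in.size()) return false;
    const auto b = static_cast<uint8_t>(in[pos++]);
    out |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) return true;
  }
  return false;
}

// Maps a payload to canonical bytes, or nullopt if it cannot be handled.
using Canonicalizer = std::function<std::optional<std::string>(const std::string&)>;

// Hypothetical canonicalizer for messages made only of singular varint
// fields: parse into a map keyed by field number, then re-emit in order.
std::optional<std::string> CanonicalizeVarintMessage(const std::string& bytes) {
  std::map<uint64_t, uint64_t> fields;
  size_t pos = 0;
  while (pos < bytes.size()) {
    uint64_t tag = 0, value = 0;
    if (!GetVarint(bytes, pos, tag) || (tag & 7) != 0) return std::nullopt;
    if (!GetVarint(bytes, pos, value)) return std::nullopt;
    fields[tag >> 3] = value;  // last occurrence wins for singular fields
  }
  std::string out;
  for (const auto& [field, value] : fields) {
    PutVarint(out, field << 3);
    PutVarint(out, value);
  }
  return out;
}

// The full type name is the last path segment of an Any type URL.
std::string TypeNameFromUrl(const std::string& type_url) {
  const size_t slash = type_url.rfind('/');
  return slash == std::string::npos ? type_url : type_url.substr(slash + 1);
}

// Stand-in for a descriptor-pool lookup: known types are re-encoded
// canonically, unknown ones (e.g. no descriptor available) pass through.
std::string CanonicalizeAnyValue(const std::string& type_url, const std::string& value,
                                 const std::map<std::string, Canonicalizer>& registry) {
  const auto it = registry.find(TypeNameFromUrl(type_url));
  if (it == registry.end()) return value;
  const std::optional<std::string> canonical = it->second(value);
  return canonical ? *canonical : value;
}
```

With `registry["example.Inner"] = CanonicalizeVarintMessage;`, both encodings of `{f1: 1, f2: 2}` canonicalize to the same bytes, while a payload whose type is not registered is returned untouched. A real implementation would also need to recurse into nested messages and nested Anys and to order map entries.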

If you need a quick short-term fix, I think the best solution would be to have your hash function reflectively examine the proto and normalize all Any fields before doing the deterministic serialization of the full message. That may end up being slow, though. It might also be worthwhile to just avoid Any fields in messages that you want to be able to hash. Do you have a lot of Any fields or just a few?

acozzette, Feb 20 '19 00:02
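The short-term workaround acozzette suggests (normalize every Any field, then serialize deterministically and hash) can be sketched on the same kind of simplified model. Everything here is hypothetical: `Outer` stands for a message with a uint64 `id` (field 1) and a repeated Any (field 2), `Canonicalize` only understands payloads made of singular varint fields, and FNV-1a stands in for whatever hash the caller actually uses. In the real C++ library the field walk would go through `Reflection` (for example `ListFields`) and the payload type would be resolved via `DescriptorPool::generated_pool()`; those calls are not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of google.protobuf.Any (type_url = 1, value = 2).
struct AnyField {
  std::string type_url;
  std::string value;  // opaque serialized payload
};

// Hypothetical outer message: uint64 id = 1; repeated Any anys = 2;
struct Outer {
  uint64_t id = 0;
  std::vector<AnyField> anys;
};

void PutVarint(std::string& out, uint64_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
}

void PutBytesField(std::string& out, uint64_t field, const std::string& bytes) {
  PutVarint(out, (field << 3) | 2);  // wire type 2: length-delimited
  PutVarint(out, bytes.size());
  out += bytes;
}

bool GetVarint(const std::string& in, size_t& pos, uint64_t& out) {
  out = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (pos >= in.size()) return false;
    const auto b = static_cast<uint8_t>(in[pos++]);
    out |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) return true;
  }
  return false;
}

// Re-encodes a payload of singular varint fields in field-number order.
// Returns nullopt for anything else so the caller keeps the original bytes.
std::optional<std::string> Canonicalize(const std::string& bytes) {
  std::map<uint64_t, uint64_t> fields;
  size_t pos = 0;
  while (pos < bytes.size()) {
    uint64_t tag = 0, value = 0;
    if (!GetVarint(bytes, pos, tag) || (tag & 7) != 0) return std::nullopt;
    if (!GetVarint(bytes, pos, value)) return std::nullopt;
    fields[tag >> 3] = value;  // last occurrence wins for singular fields
  }
  std::string out;
  for (const auto& [field, value] : fields) {
    PutVarint(out, field << 3);
    PutVarint(out, value);
  }
  return out;
}

// Writes fields in number order; optionally normalizes each Any payload first.
std::string Serialize(const Outer& msg, bool normalize_anys) {
  std::string out;
  PutVarint(out, 1 << 3);  // field 1, wire type 0 (this model always writes id)
  PutVarint(out, msg.id);
  for (const AnyField& any : msg.anys) {
    std::string value = any.value;
    if (normalize_anys) {
      if (std::optional<std::string> canonical = Canonicalize(any.value)) value = *canonical;
    }
    std::string any_bytes;
    PutBytesField(any_bytes, 1, any.type_url);
    PutBytesField(any_bytes, 2, value);
    PutBytesField(out, 2, any_bytes);
  }
  return out;
}

// 64-bit FNV-1a over the serialized bytes.
uint64_t Fnv1a64(const std::string& data) {
  uint64_t h = 14695981039346656037ULL;
  for (unsigned char c : data) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

uint64_t HashMessage(const Outer& msg) { return Fnv1a64(Serialize(msg, true)); }
```

Two `Outer` values whose Any payloads encode the same inner message in different field orders serialize differently with `normalize_anys = false` and identically with `normalize_anys = true`, so `HashMessage` agrees on them. The extra parse and re-encode of every Any payload is the slowdown the comment warns about.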

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot], May 05 '24 10:05

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.

This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

github-actions[bot], May 19 '24 10:05