
Protobuf unnecessarily serializes default values when passed directly

alexprengere opened this issue

What language does this apply to?

This happens in both proto2 and proto3, using the latest revision of protoc and the Python protobuf library.

Describe the problem you are trying to solve.

Let's use this simple example test.proto:

syntax = "proto2";

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional bool alive = 3;
}

After compilation:

protoc --python_out=. test.proto

Then in python:

>>> from test_pb2 import Person
>>> Person().name
''
>>> Person().id
0
>>> Person().alive
False
>>> str(Person())
''
>>> Person().SerializeToString()
b''

Everything is great! Now let's pass the default values explicitly:

>>> str(Person(name='', id=0, alive=False))
'name: ""\nid: 0\nalive: false\n'
>>> Person(name='', id=0, alive=False).SerializeToString()
b'\n\x00\x10\x00\x18\x00'
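The six bytes above are three (tag, value) pairs, one per assigned field. A minimal sketch of the tag arithmetic, assuming the standard wire-format rule tag = (field_number << 3) | wire_type; `wire_tag` is a hypothetical helper, not part of the protobuf Python API:

```python
# Wire types from the protobuf encoding spec.
VARINT = 0
LEN = 2  # length-delimited: strings, bytes, sub-messages


def wire_tag(field_number: int, wire_type: int) -> int:
    """Return the tag value; it fits in one byte for field numbers 1..15."""
    return (field_number << 3) | wire_type


# name = ""     -> tag 0x0A ('\n'), then length 0
# id = 0        -> tag 0x10, then varint 0
# alive = False -> tag 0x18, then varint 0
encoded = bytes([
    wire_tag(1, LEN), 0,
    wire_tag(2, VARINT), 0,
    wire_tag(3, VARINT), 0,
])
print(encoded)  # b'\n\x00\x10\x00\x18\x00'
```

In other words, each field that was assigned is written out, even when its payload is empty or zero.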

Oh no! Why the different behavior here? We are just feeding the default values back to the object, so the expected behavior would be the same output as Person(). The way to get that today is to pass None everywhere; None is treated as if the argument had not been passed, so the field stays unset and reads back as its default value:

>>> Person(name=None).name
''
>>> str(Person(name=None))
''

This goes beyond serialization; it also affects equality:

>>> Person() == Person()
True
>>> Person() == Person(name=None, id=None, alive=None)
True
>>> Person() == Person(name='', id=0, alive=False)
False
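A toy model of why equality differs, assuming as a simplification that message equality compares the fields that are *present* (roughly what `ListFields()` reports) rather than the values you read back. `ToyPerson` is hypothetical and only mimics explicit-presence bookkeeping:

```python
class ToyPerson:
    """Mimics explicit presence: a field is present once it is assigned."""

    DEFAULTS = {"name": "", "id": 0, "alive": False}

    def __init__(self, **kwargs):
        self._present = {}
        for key, value in kwargs.items():
            if key not in self.DEFAULTS:
                raise TypeError(f"unknown field {key!r}")
            # Passing None behaves like not passing the field at all.
            if value is not None:
                self._present[key] = value

    def __getattr__(self, key):
        # Only called for names not found normally: unset fields read as defaults.
        if key in self.DEFAULTS:
            return self._present.get(key, self.DEFAULTS[key])
        raise AttributeError(key)

    def list_fields(self):
        # Present fields only, in field-number order.
        return [(k, self._present[k]) for k in self.DEFAULTS if k in self._present]

    def __eq__(self, other):
        return self.list_fields() == other.list_fields()
```

Both `ToyPerson()` and `ToyPerson(name='', id=0, alive=False)` read back the same values, but only the second has present fields, so they compare unequal.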

Describe the solution you'd like

What would be great is this:

>>> Person(name='', id=0, alive=False).SerializeToString()
b''
>>> Person() == Person(name='', id=0, alive=False)
True
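Until proto2 offers such a switch, one possible workaround is to drop keyword arguments that equal the field's default before constructing the message. A minimal sketch: `strip_defaults` is a hypothetical helper, while `DESCRIPTOR.fields_by_name` and `FieldDescriptor.default_value` are existing descriptor attributes:

```python
def strip_defaults(message_cls, **kwargs):
    """Build message_cls, skipping kwargs that equal the field's default.

    Skipped fields stay unset, so under explicit presence the result
    should compare equal to message_cls() when every argument is a default.
    """
    fields = message_cls.DESCRIPTOR.fields_by_name
    kept = {
        name: value
        for name, value in kwargs.items()
        # Unknown names are kept so the constructor still rejects them.
        if name not in fields or value != fields[name].default_value
    }
    return message_cls(**kept)
```

With the generated `Person` from above, `strip_defaults(Person, name='', id=0, alive=False)` should compare equal to `Person()` and serialize to `b''`. Note that this changes meaning for fields whose presence matters to you: an explicitly assigned default becomes indistinguishable from "never set".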

alexprengere, Feb 20 '24

@anandolee, can you take a look?

honglooker, Feb 20 '24

Replying to myself after a little docs reading session.

It looks like this is on purpose, and the semantics I expected above are those of the "no presence" discipline. Relevant extract:

Presence disciplines define the semantics for translating between the API representation and the serialized representation. The no presence discipline relies upon the field value itself to make decisions at (de)serialization time, while the explicit presence discipline relies upon the explicit tracking state instead.
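To make the quoted distinction concrete, here is a toy serializer for the `Person` example that takes the discipline as a parameter. It is a simplified sketch (single-byte tags, so field numbers up to 15; non-negative integers only), not how the protobuf runtime is implemented; `serialize_person` and its `explicit_presence` flag are hypothetical:

```python
FIELDS = [  # (name, field number, wire type, default)
    ("name", 1, 2, ""),
    ("id", 2, 0, 0),
    ("alive", 3, 0, False),
]


def encode_varint(n: int) -> bytes:
    """Base-128 varint for non-negative integers."""
    if n < 0:
        raise ValueError("toy encoder only handles non-negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def serialize_person(present: dict, explicit_presence: bool) -> bytes:
    """`present` maps field name -> value for the fields the caller assigned."""
    out = bytearray()
    for name, number, wire_type, default in FIELDS:
        if name not in present:
            continue  # never assigned: omitted under both disciplines
        value = present[name]
        # No presence: the value itself decides, so defaults are dropped.
        # Explicit presence: being assigned is enough to be serialized.
        if not explicit_presence and value == default:
            continue
        out.append((number << 3) | wire_type)
        if wire_type == 2:
            data = value.encode("utf-8")
            out += encode_varint(len(data)) + data
        else:
            out += encode_varint(int(value))
    return bytes(out)
```

With `explicit_presence=True`, the all-defaults input reproduces the bytes from the proto2 session above; with `False`, it yields `b''`, matching the proto3 session later in this thread.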

AFAICT "no presence" is only available in proto3 (for fields declared without the optional keyword), so I modified my previous example to use proto3 and removed the optional keyword:

syntax = "proto3";

message Person {
  string name = 1;
  int32 id = 2;
  bool alive = 3;
}

Everything works "as expected":

>>> from test_pb2 import Person
>>> str(Person(name='', id=0, alive=False))
''
>>> Person(name='', id=0, alive=False).SerializeToString()
b''
>>> Person() == Person(name='', id=0, alive=False)
True

Unless I missed something else, this can be closed. It would be nice to have some kind of flag to force the "no presence" semantics for proto2 though.

alexprengere, Feb 26 '24

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot], May 26 '24

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.

This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

github-actions[bot], Jun 10 '24