zig-protobuf icon indicating copy to clipboard operation
zig-protobuf copied to clipboard

fix(json): don't emit defaut fields

Open librolibro opened this issue 1 year ago • 11 comments

Closes #64

For now there are failing tests, proper implementation is in progress)

P.S. If I looked carefully into generated structs - all fields (expect ArrayLists which should be empty ones after MessageMixins.init) have default values (even proto2 defaults are supported), so it shouldn't be a hard task

librolibro avatar Aug 06 '24 15:08 librolibro

  • [x] We should not emit empty repeated fields
  • [x] We should not emit empty string
  • [x] We should not emit empty byte arrays (bytes type)
  • [x] Check if negative zero is ALWAYS emitting (see here)
  • [x] .OneOf's with default values is ALWAYS emitting
// Wrong
{
  "oneof_field": {}
}

// Even more wrong
{}

// Right
{
  "oneof_field": {
    "active_oneof_field": 0
  }
}

Looks like everything is working as expected but I'd like to add a bit more tests to be sure

P.S. I completely forgot about proto2 defaults... Need to reimplement need_to_emit_numeric function

librolibro avatar Aug 07 '24 15:08 librolibro

I read the proto2 specification several times and I still don't understand that's default field values for:

  • Optional field with default value: what's the difference between this one and optional field without default value (in case of deserialization)? If I understand correctly, any value != null should be emitted both in protobuf bytes/JSON string serialization (e.g. zeros, empty strings/bytes etc.). I understand that default value will be the initial value of field after pb_init() - is there any meaning for it other than that? When value is absent in serialized bytes/JSON string - does it mean than value==null and that field was not set? Or it should it be equal to default value - in this case, wth is the purpose of this if I can't signal that value is absent because when it's not on the wire it will not be equal to null but to default?
  • Required field with default value: there is different logic I suppose, e.g. if numeric field with default=5 exists than if will not be serialized if value=5 (but 0 should be serialized), and when deserializing if value is absent it will be equal to 5 (because pb_init set it and there were no changes)

Am I understanding that right?

librolibro avatar Aug 08 '24 13:08 librolibro

I'll look into details when I can, but don't bother too much with proto2 options. This library mostly focuses on proto3.

Thanks again for everything.

Arwalk avatar Aug 08 '24 19:08 Arwalk

I wrote this simple Python code to check how they're handling these optional proto2 fields (took tests/protos_for_test/jspb.proto definition):

import sys

from google.protobuf.json_format import MessageToJson
from google.protobuf.json_format import Parse as JsonToMessage
from google.protobuf.message import Message

from jspb_pb2 import DefaultValues

if sys.stdout.isatty():
    ansi_magenta = "\x1b[1;35m"
    ansi_reset = "\x1b[0m"
else:
    ansi_magenta = ansi_reset = ""


def print_header(s: str) -> None:
    print("-" * len(s))
    print(ansi_magenta, s, ansi_reset, sep="")


def print_msg(msg: Message, indent: int = 0) -> None:
    msg.DESCRIPTOR.fields
    for field_descriptor in msg.DESCRIPTOR.fields:
        field_name = field_descriptor.name
        print(field_name, ":", sep="", end=" ")
        if isinstance(field_name, Message):
            print_msg(field_name, indent=indent + 2)
        else:
            print(getattr(msg, field_name))
    print()


def print_msg_jsons(msg: Message) -> None:
    for arg in (True, False):
        print("always_print_fields_with_no_presence = ", arg, ":", sep="")
        print(MessageToJson(msg, always_print_fields_with_no_presence=arg, indent=2))
    print()


def print_all(msg: Message) -> None:
    print_msg(msg)
    print_msg_jsons(msg)
    pb_msg = msg.SerializeToString()
    print("Pb serialized message length:", len(pb_msg), end="")
    if pb_msg:
        print(", message itself:", *map("{0:02X}".format, pb_msg))
    else:
        print()


def main_encode() -> None:
    msg = DefaultValues()
    print_header("No arguments were passed to the constructor")
    print_all(msg)

    print_header("Setting *int_field* = default (11)")
    msg.int_field = 11
    print_all(msg)

    print_header("Setting *int_field* = 10")
    msg.int_field = 10
    print_all(msg)


def main_decode() -> None:
    json_string1 = "{}"
    msg = JsonToMessage(json_string1, DefaultValues())
    print_header(f"Parse from {json_string1}")
    print_all(msg)

    json_string2 = '{"intField": 11}'
    msg = JsonToMessage(json_string2, DefaultValues())
    print_header(f"Parse from {json_string2}")
    print_all(msg)


    json_string3 = '{"intField": 10}'
    msg = JsonToMessage(json_string3, DefaultValues())
    print_header(f"Parse from {json_string3}")
    print_all(msg)


if __name__ == "__main__":
    main_encode()
    main_decode()

There is the output:

-------------------------------------------
No arguments were passed to the constructor
string_field: default<>'"abc
bool_field: True
int_field: 11
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{}
always_print_fields_with_no_presence = False:
{}

Pb serialized message length: 0
----------------------------------
Setting *int_field* = default (11)
string_field: default<>'"abc
bool_field: True
int_field: 11
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{
  "intField": "11"
}
always_print_fields_with_no_presence = False:
{
  "intField": "11"
}

Pb serialized message length: 2, message itself: 18 0B
------------------------
Setting *int_field* = 10
string_field: default<>'"abc
bool_field: True
int_field: 10
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{
  "intField": "10"
}
always_print_fields_with_no_presence = False:
{
  "intField": "10"
}

Pb serialized message length: 2, message itself: 18 0A
-------------
Parse from {}
string_field: default<>'"abc
bool_field: True
int_field: 11
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{}
always_print_fields_with_no_presence = False:
{}

Pb serialized message length: 0
---------------------------
Parse from {"intField": 11}
string_field: default<>'"abc
bool_field: True
int_field: 11
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{
  "intField": "11"
}
always_print_fields_with_no_presence = False:
{
  "intField": "11"
}

Pb serialized message length: 2, message itself: 18 0B
---------------------------
Parse from {"intField": 10}
string_field: default<>'"abc
bool_field: True
int_field: 10
enum_field: 13
empty_field:
bytes_field: b'moo'

always_print_fields_with_no_presence = True:
{
  "intField": "10"
}
always_print_fields_with_no_presence = False:
{
  "intField": "10"
}

Pb serialized message length: 2, message itself: 18 0A

It confuses me even more, I just don't understand the logic behind it: if the optional field is not explicitly set in constructor then it's equal to default value but it will not be serialized, but if I explicitly set this field to the default value it will be serialized? Why?

Also, before I wrote this simple test I expected proto2 optional fields to have a type "Something | None" (if you're not familiar with Python, it's kind of a .Union but just for typing) but there was just "Something". Maybe we also don't need to make these fields as .Optional's since even builtin protoc parsers like Python's doesn't make it that way? Another benefit could be easier field access without extra .? thing.

While you're saying that this project is mainly aiming at proto3 support, proto2 support shouldn't be that hard (in case of JSON (de)serialization at least) but these misunderstanding are irritating and prevent this from happening.

librolibro avatar Aug 11 '24 04:08 librolibro

Hey there, i understand your frustration. And i think i found the source of it!

It's actually not your fault that you can't find a way to make this work with proto2. Even google doesn't!

image

The language guide for proto2 and proto3 have respectively both a section for json options, here for proto 2 and there for proto 3. And as you can see in the screenshot, google itself is explicitely not compliant for proto 2!

So don't hurt yourself too much, do the same thing as python's implementation. This is the official implementation after all, if anybody complains, we'll point them to this.

Remember that this library is at the start a personal project of mine that happened to become the first viable (to my knowledge) implementation of protobuf for zig.

As a contributor i expect you to mostly do things for fun, or personal interest in making it work for your use cases.

Take care friend, don't hesitake to take a break.

I haven't looked into all the details of what you did yet, i'll check it out if I find some time for it.

Arwalk avatar Aug 11 '24 10:08 Arwalk

hey there @librolibro , where are you on this? Do you have some plans with it?

I have no desire to force you to finish it if you do not want to. If you want to leave it as is, for any reason whatsoever, just tell me.

Thank you for your work already.

Arwalk avatar Sep 15 '24 18:09 Arwalk

Yeah, many things happened and I almost forgot about this project :) I'll take a look on this week, need to remember what did I do and why did I stuck.

I also remember that I wanted to implement emittion options (camel or snake case, emit default value or not etc.) - I still remember the idea I had for that so I'll also take a look on this.

I just switched my focus to more interesting and much less useful project - and I almost forgot why did I want JSON serialization in the first place, thanks for reminding me :)

librolibro avatar Sep 16 '24 01:09 librolibro

It's OK. Take care.

Arwalk avatar Sep 16 '24 05:09 Arwalk

I'd like to ask you (something I mentioned in previous messages) before I'll read some code: do we really need .Union's in generated code? If optional/required fields are only about presence when emitting (to make pb messages or JSON strings more compact) then there's no need in this. I could skip something important but for now I think they're useless :) For the end-users it's also better to not have these optionals because you'll not need to type extra .? over and over again.

Back again to Python's implementation which I'm familiar with - I used protoc to generate Python code from all proto files I found in this project (with optionals and without, both proto2 and proto3) and these were no optionals in generated code (I didn't find Optional[Something] or Something | None in generated *.pyi files) - yes, constructors may accept None for most fields but after initialization these values will have the target type (and default values).

P.S. For people already using this project it will be a breaking change for sure.

P.P.S. Yeah, looks like I was wrong ... For proto2-required fields if it's null then field wasn't initialized and should fail on serializing. And for proto3-optional fields if it's null then field wasn't initialized but shouldn't fail on serializing (default will be used). By this logic proto3-optional may be null and it will be serialized by uisng default value, but in code you probably want to get default value when accessing this uninitialized field and not useless null. Looking at the C++ generated code they're not accessing fields directly but using methods like set_name and get_name - and if I understand correctly get_name will return the default value if the field wasn't initialized.

librolibro avatar Sep 17 '24 15:09 librolibro

I'd like to ask you (something I mentioned in previous messages) before I'll read some code: do we really need .Union's in generated code?

Unions here are not used for optionals but for oneOf fields.

If optional/required fields are only about presence when emitting (to make pb messages or JSON strings more compact) then there's no need in this. I could skip something important but for now I think they're useless :) For the end-users it's also better to not have these optionals because you'll not need to type extra .? over and over again.

The subject is a bit more complicated, and the concept of "optional value" in zig do not corresponds to the concept of "optional field" in proto3.

Here is the excerpt from the spec for proto 3:

image

What can be understood here is that:

  • default field with no labels are closer to the concept of "optional value" in zig: when parsing, either this field is present, or it is not.
  • fields with optional are either set to a specific value, or to their default value.

In other programming languages, this would be the difference between a nullable parameter, and a parameter with a default value. Example in python:

def message(field_with_no_label : Optional[Int], field_with_optional_label = 1 : Int):
    pass

The truth is that we miss the "you can check to see if the value was explicitly set" in our implementation so far, but this will have to be through an optional at some point, so i think we'll have to keep them.

P.P.S. blahblah

This repository's aim is to support proto3, do not bother with proto2. I'd invite you to not even look at the spec to avoid confusing yourself.

Thanks a lot, at no point I meant to be harsh in my comments here but it's sometimes hard to express things in a friendly manner through text only. I'm thankful for your interest and work in many ways.

Arwalk avatar Sep 18 '24 17:09 Arwalk

For now I made that if struct field has the default value (no matter if it's proto2 or proto3 default created by code generation) than I will compare with it - if default is not present (or null) then I respect proto3 specs. If field's value is null then it will not be serialized by any means (that mean than value wasn't initialized and shouldn't be in JSON).

Earlier I rarely used protobuf, mostly Python's implementation, and even after working at this project I think I'm still missing some key points - so sorry for bothering you about that technical details (it's definitely less confusing for me now :)).

After all proto2 is somehow supported but without any guarantees.

Unions here are not used for optionals but for oneOf fields.

I meant .Optional's here. Sometimes both the language barrier and my inattention to some details prevent me from properly checking whether I wrote what I wanted to say or not :)

librolibro avatar Sep 19 '24 08:09 librolibro

i'm closing this as i kinda lost the track of this subject. The code has evolved a bit since. Thank you @librolibro , no ill will at all on you being unable to finish this.

Arwalk avatar Oct 01 '25 17:10 Arwalk