protobuf-java-format
protobuf-java-format copied to clipboard
Unknown fields not being repopulated upon de-serialization from json
If I have a protobuf message with unknown fields:
leaf1: "Hello"
leaf2: 23
leaf3: 41
4: "world"
where leaf1, leaf2, and leaf3 are my known fields, and 4 is an unknown field. I then convert to json like so:
return new JsonFormat().printToString(protobuf);
I get the following:
{"leaf1": "Hello","leaf2": 23,"leaf3": [41], "4": ["world"]}`
However, if I then deserialize the json back like so:
public static <T extends Message> T convertToProtobuf(String json, T defaultInstance)
throws ParseException {
Builder builder = defaultInstance.newBuilderForType();
JsonFormat format = new JsonFormat();
format.merge(json, ExtensionRegistry.getEmptyRegistry(), builder);
return (T)builder.build();
}
then my result is the following:
leaf1: "Hello"
leaf2: 23
leaf3: 41
The problem is field 4 is not present in the reconstituted proto as an unknown field. Am I doing something wrong, or is this not supported? Thanks!
@PeteyPabPro - it seems that your observation is correct - neither JsonFormat nor JsonJacksonFormat seem to slurp the unknown fields into the Builder/Message as they could/should.
I wrote a simple unit tests (disabled in the pull request) that shows this off. In the future, that's something you can/should do too - write a simple test along with minimal protobuf so that a fix can be written against the test as a specification.
I don't have time to fix this - would you be able to give your :+1: to PR #24, if not also enable & write a fix for it?
I'll take a look..
Hmm… I was just looking at JsonFormat vs. JsonJacksonFormat and found that they are different :frowning: This might not be a problem if you always used the same one on both ends, but I don't think that's fair to assume, and furthermore, JsonJackson has other issues, specifically with Groups, where it prints a JSON object, but expects a binary delimited ByteString on input/merge.
I'm not sure there's a spec for JSON, but the two JSON-related formatters should agree with one another.
For your complex example, here is the output from JsonFormat:
{
"10": [
"\b\u0006\u0012\u0005\rff\uffa6\uffbf",
"\b\uff92\uffff\uffff\uffff\uffff\uffff\uffff\uffff\uffff\u0001\u0012\u0005\r\u0000\u0000\u0000\u0000"
],
"2": [
"world"
],
"3": [
"I",
"am"
],
"4": [
"\b\u001e\u0012\u0005\r\uff9a\uff99\uff99?"
],
"5": [
51232271120233
],
"6": [
6
],
"7": [
1075000115
],
"8": [
4614253070214989087
],
"9": [
{
"1": [
"hi"
],
"2": [
1084647014
],
"3": [
23
],
"4": [
44232993922327
]
}
],
"knownfield": "hello"
}
And here is the output from JsonJacksonFormat:
{
"10": [
"CAYSBQ1mZqa/",
"CJL//////////wESBQ0AAAAA"
],
"2": [
"d29ybGQ="
],
"3": [
"SQ==",
"YW0="
],
"4": [
"CB4SBQ2amZk/"
],
"5": [
51232271120233
],
"6": [
6
],
"7": [
1075000115
],
"8": [
4614253070214989087
],
"9": [
{
"1": [
"aGk="
],
"2": [
1084647014
],
"3": [
23
],
"4": [
44232993922327
]
}
],
"knownfield": "hello"
}
Hmm.. So do you think
- My fix to JSONFormat
- Fixing JSONJacksonFormat to be consistent with itself and JSONFormat
- Fixing JSONJacksonFormat unknown field parsing
should all be part of my PR? Or is it ok to just stick with (1) for this PR, and then do the Jackson stuff in another one?
Good questions - I acknowledge that my observation didn't give opinion on next steps, so let me think a moment.
I think we need to understand & document if possible what the format should be for JSON - It's unfortunate - I don't see wiki enabled for this, but we can add the gh-pages branch for documenting things…