jackson-dataformats-binary
Parsing a protobuf message doesn't respect the message schema
Hi,
It seems to me that (at least in a Scala setting) parsing protobuf messages doesn't faithfully respect the message schema. Consider the following protobuf mapper, together with two data objects Greeting and GreetingFrom, and explicit protobuf schema definitions for both:
val mapper: ProtobufMapper = new ProtobufMapper with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
case class Greeting(author: String, message: String)
case class GreetingFrom(author: String)
val greetingSchema = ProtobufSchemaLoader.std.parse("message Greeting { required string author = 1; required string message = 2; }")
val greetingFromSchema = ProtobufSchemaLoader.std.parse("message GreetingFrom { required string author = 1; }")
Note how GreetingFrom is basically a shorter variant of Greeting. Note also how we explicitly require all fields to be present in the protobuf schema definitions.
Further suppose we convert a GreetingFrom message to protobuf as follows:
val greetingFrom = GreetingFrom(author = "Max")
val greetingFromProtobuf = mapper
.writerFor(classOf[GreetingFrom])
.`with`(greetingFromSchema)
.writeValueAsBytes(greetingFrom)
Then parsing of the greetingFromProtobuf message succeeds, even though we only want to accept messages conforming to the different (and in fact longer) schema for Greeting:
val parsedMessage = mapper
.readerFor(classOf[Map[String,Any]])
.`with`(greetingSchema)
.readValue(greetingFromProtobuf)
In my opinion, parsing the greetingFromProtobuf message according to the different (!) schema in greetingSchema should fail loudly: greetingFromProtobuf is missing the message field, which is required by greetingSchema, and thus it violates the protobuf schema definition for message Greeting. Hence it should NOT be parsed successfully.
However, the above parsing succeeds without any errors. Why is this? Is this a bug in the jackson-dataformat-protobuf library? (I'm using version 2.9.4.) Or am I using the parser incorrectly?
If people with more jackson (and protobuf inside jackson) knowledge could shed some light on this, I'd highly appreciate it! :-)
Perhaps a misunderstanding here is that the protoc schema is strictly used for decoding (and encoding), but it has literally no effect on what happens above the token stream: databind is free to allow (or require) that stream to map, or not, into POJOs.
So: the encoded payload absolutely must conform to the schema and will be massaged as necessary per the schema -- similar to Avro handling. But databinding has no knowledge of possible constraints.
This is a feature more than a flaw, allowing more flexible handling and conversions (for example, from protobuf messages to Maps, and vice versa), although it is also something that follows from the Jackson processing model and would be difficult to change.
Hmm, I'm probably misunderstanding something, but in the above scenario, where the protobuf message doesn't contain all the fields required by the schema, shouldn't parsing fail already at the token-stream level? The schema enforces constraints (namely, it requires the field "message" to be present in this case), which the protobuf message fails to satisfy. So with this behavior I don't see how the protobuf schema is strictly enforced.
Or do you mean that the enforcement of required fields can only be done later, in the data-binding step? If so, how could one parse a protobuf message into a Map[String, Any] while enforcing the presence of the protobuf message's required fields in the map? I guess what I would like to know is whether there is an idiomatic way in Jackson to do that, or whether one would need to enforce the schema's required fields manually when deserializing into a Map[String, Any].
Thanks!
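In the meantime, a manual workaround is possible after binding into a Map: check the required fields yourself. This is a minimal sketch, not a Jackson API -- the helper name and the hard-coded required-field set are my own, and the set of required names is written out by hand rather than extracted from the ProtobufSchema:

```scala
// Hypothetical helper: after Jackson has decoded the payload into a Map,
// verify that every field the schema marks as `required` is present.
def validateRequired(
    decoded: Map[String, Any],
    required: Set[String]
): Either[Set[String], Map[String, Any]] = {
  val missing = required -- decoded.keySet
  if (missing.isEmpty) Right(decoded)
  else Left(missing) // names of the missing required fields
}

// The Greeting schema requires both "author" and "message", so a
// decoded GreetingFrom payload fails validation:
validateRequired(Map("author" -> "Max"), Set("author", "message"))
// => Left(Set("message"))

// A complete Greeting payload passes:
validateRequired(Map("author" -> "Max", "message" -> "Hi"), Set("author", "message"))
// => Right(Map("author" -> "Max", "message" -> "Hi"))
```
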
@bollmann Ah. I agree, required-ness is something that should be checked as per the schema. I missed that part. A simple unit test would be good; it can initially be added under failing, and we can see what is happening.
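A sketch of such a failing test might look roughly like the following. It reuses the ProtobufMapper / ProtobufSchemaLoader API shown above; the object name is my own, and the exact exception type to expect once required-ness is enforced (here the broad JsonProcessingException) is a guess:

```scala
import com.fasterxml.jackson.core.JsonProcessingException
import com.fasterxml.jackson.dataformat.protobuf.ProtobufMapper
import com.fasterxml.jackson.dataformat.protobuf.schema.ProtobufSchemaLoader

// Encode with the shorter GreetingFrom schema, then decode with the
// Greeting schema, which marks `message` as required. Expectation
// (currently not met, hence "failing"): decoding should throw.
object RequiredFieldTest {
  def main(args: Array[String]): Unit = {
    val mapper = new ProtobufMapper
    val greetingSchema = ProtobufSchemaLoader.std.parse(
      "message Greeting { required string author = 1; required string message = 2; }")
    val greetingFromSchema = ProtobufSchemaLoader.std.parse(
      "message GreetingFrom { required string author = 1; }")

    // Use a java.util.Map so no Scala module is needed for serialization.
    val bytes = mapper
      .writer(greetingFromSchema)
      .writeValueAsBytes(java.util.Collections.singletonMap("author", "Max"))

    try {
      val decoded: java.util.Map[String, Object] = mapper
        .readerFor(classOf[java.util.Map[String, Object]])
        .`with`(greetingSchema)
        .readValue(bytes)
      throw new AssertionError(
        s"expected decoding to fail (required field `message` missing), got: $decoded")
    } catch {
      case _: JsonProcessingException => () // expected once required-ness is enforced
    }
  }
}
```
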