
Parsing a protobuf message doesn't respect the message schema

Open bollmann opened this issue 6 years ago • 3 comments

Hi,

It seems to me that (at least in a Scala setting) parsing protobuf messages doesn't faithfully respect the message schema. Consider the following protobuf parser, together with two data objects Greeting and GreetingFrom, and explicit protobuf schema definitions for both:

val mapper: ProtobufMapper = new ProtobufMapper with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)

case class Greeting(author: String, message: String)
case class GreetingFrom(author: String)

val greetingSchema = ProtobufSchemaLoader.std.parse("message Greeting { required string author = 1; required string message = 2; }")
val greetingFromSchema = ProtobufSchemaLoader.std.parse("message GreetingFrom { required string author = 1; }")

Note how GreetingFrom is basically a shorter variant of Greeting. Note also how we explicitly require all fields to be present in the protobuf schema definitions.

Further suppose, we convert a GreetingFrom message to protobuf as follows:

val greetingFrom = GreetingFrom(author = "Max")
val greetingFromProtobuf = mapper
  .writerFor(classOf[GreetingFrom])
  .`with`(greetingFromSchema)
  .writeValueAsBytes(greetingFrom)
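As a point of reference, the encoded GreetingFrom payload is tiny under the standard proto2 wire format: a single length-delimited field with tag 1. The encoder below is a hypothetical, stdlib-only sketch (it is not Jackson code) showing roughly what bytes `writeValueAsBytes` should produce here:

```scala
// Hypothetical stdlib-only sketch of proto2 wire encoding for one string
// field; shown only to illustrate the shape of the encoded payload.
def encodeStringField(fieldNumber: Int, value: String): Array[Byte] = {
  // key = (field number << 3) | wire type; wire type 2 = length-delimited
  val key = (fieldNumber << 3) | 2
  val utf8 = value.getBytes("UTF-8")
  // assumes the value is short enough for a single-byte varint length
  Array(key.toByte, utf8.length.toByte) ++ utf8
}

// GreetingFrom(author = "Max") encodes as: 0x0A 0x03 'M' 'a' 'x'
val bytes = encodeStringField(1, "Max")
```

Note that nothing in those five bytes records that the schema also declares a required field 2; requiredness lives only in the schema, not in the payload.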

Then parsing of the greetingFromProtobuf message succeeds even if we only want to accept messages for the different (and in fact, longer) schema for Greeting:

val parsedMessage = mapper
  .readerFor(classOf[Map[String,Any]])
  .`with`(greetingSchema)
  .readValue(greetingFromProtobuf)

In my opinion, parsing the greetingFromProtobuf message according to the different (!) schema in greetingSchema should fail loudly: greetingFromProtobuf is missing the message field, which is required by greetingSchema, and thus it violates the protobuf schema definition for message Greeting. Hence it should NOT be successfully parsed.

However, the above parsing succeeds without any errors. Why is this? Is this a bug in the jackson-dataformat-protobuf library? (I'm using version 2.9.4.) Or am I using the parser incorrectly?

If people with more jackson (and protobuf inside jackson) knowledge could shed some light on this, I'd highly appreciate it! :-)

bollmann avatar Mar 19 '18 09:03 bollmann

Perhaps the misunderstanding here is that the protobuf schema is strictly used for decoding (and encoding), but it has no effect on what happens above the token stream: databind is free to allow/require that stream to map (or not) into POJOs.

So: the encoded payload absolutely must conform to the schema and will be massaged as necessary per the schema -- similar to Avro handling. But databinding has no knowledge of possible constraints. This is a feature more than a flaw, since it allows more flexible handling and conversions (for example, from protobuf messages to Maps, and vice versa), although it is also a consequence of the Jackson model and would be difficult to change.
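To illustrate why a missing required field slips through: a walk over the wire format only ever sees the fields that are physically present in the payload, so an absent `message` field simply produces no token at all. The decoder below is a hypothetical stdlib-only sketch (not the library's actual parser), restricted to single-byte keys/lengths and string fields:

```scala
// Hypothetical sketch: decode which fields are physically present in a
// proto2 payload. A missing field yields no entry -- there is nothing for
// databind to trip over unless requiredness is checked against the schema.
def fieldsPresent(bytes: Array[Byte]): Map[Int, String] = {
  var i = 0
  var out = Map.empty[Int, String]
  while (i < bytes.length) {
    val key = bytes(i) & 0xFF
    val fieldNumber = key >> 3          // assumes single-byte keys
    val wireType = key & 0x7
    require(wireType == 2, "sketch handles only length-delimited fields")
    val len = bytes(i + 1) & 0xFF       // assumes single-byte lengths
    out += fieldNumber -> new String(bytes.slice(i + 2, i + 2 + len), "UTF-8")
    i += 2 + len
  }
  out
}
```

Running this over the five-byte GreetingFrom payload yields only `Map(1 -> "Max")`: field 2 never appears in the stream, so no token-level error is ever triggered for it.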

cowtowncoder avatar Mar 19 '18 17:03 cowtowncoder

Hmm, I'm probably misunderstanding something, but in the above scenario, where the protobuf message doesn't contain all the fields required by the schema, shouldn't parsing fail already at the token stream level? The schema enforces constraints (here, it requires the field "message" to be present) which the protobuf message fails to satisfy. So with this behavior I don't see how the protobuf schema is strictly enforced.

Or do you mean that the enforcement of required fields can only happen later, in the data binding step? If so, how could one parse a protobuf message into a Map[String, Any] while enforcing the presence of the message's required fields? I guess what I would like to know is whether there is an idiomatic way to do that in Jackson, or whether one needs to enforce the schema's required fields manually when deserializing into a value of Map[String, Any].
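In the meantime, a manual workaround along these lines seems possible: after deserializing into a Map, check it against the field names the schema marks as required. This is a hypothetical sketch (the helper name and the hard-coded field set are mine, not part of Jackson):

```scala
// Hypothetical workaround sketch: until requiredness is checked by the
// library, validate the decoded Map against the required field names.
def requireFields(parsed: Map[String, Any],
                  required: Set[String]): Map[String, Any] = {
  val missing = required -- parsed.keySet
  if (missing.nonEmpty)
    throw new IllegalArgumentException(
      s"Missing required fields: ${missing.mkString(", ")}")
  parsed
}

// With greetingSchema's fields, the GreetingFrom payload would now fail:
// requireFields(Map("author" -> "Max"), Set("author", "message"))
//   => IllegalArgumentException
```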

Thanks!

bollmann avatar Mar 20 '18 10:03 bollmann

@bollmann Ah. I agree, requiredness is something that should be checked as per schema; I missed that part. A simple unit test would be good -- it can initially be added under `failing` so we can see what is happening.

cowtowncoder avatar Mar 20 '18 14:03 cowtowncoder