confluent-schema-registry icon indicating copy to clipboard operation
confluent-schema-registry copied to clipboard

Implement support for Protocol Buffer references (imports) -- continued

Open erwinw opened this issue 2 years ago • 2 comments

This PR is a continuation of #135 (which can now be closed) [^1]

What:

Add support for Protocol Buffer imports and schema references.

I believe this is a feature change without any breaking changes.

Why:

Imports are very useful when writing schema files using Protocol Buffers. And in order to register a schema that uses imports, we need to use Schema Register's references feature to let it know where to find those imported types. Otherwise, it fails to register the schema.

Previous work:

I believe this relates to #82 (and obviously builds on #135).

There are two other related PRs open in #92 and #121. As far as I can tell, the first extends getSchema to retrieve references from Schema Registry. The second extends register() by asking the user to provide the reference schema ids in userOpts. This PR attempts to extend register() to resolve references by looking them up (either directly by using the import name, or by mapping the imported name to a subject and optionally providing a specific version), and creating a Protocol Buffer schema that includes these.

How:

When registering a new schema, the register method now supports an optional user-provided importSubjects helper that maps import references to subjects. If not provided, the import reference is used directly as the subject (which is inline with the Java behaviour).

The new referencedSchemas SchemaHelper then fetches definitions of the imported schemas. This PR implements this helper for Protocol Buffers, but it could also be implemented for at least JSON schemas.

   referencedSchemas(schema: string): Promise<string[]>

The register method is then extended to use these helpers to get the schema references needed and verifies those are all registered recursively and then included as the references of the newly registered schema. Schema registry references are a list of [(Subject, Version)] .

docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#referenced-schemas

To get define and encode to work, I extended the ProtoSchema constructor so it can take references into account when building the schema instance. It fetches the Schema instances for the references and adds them to the Root context for the new Schema instance avoiding duplicates, so it has all the types it needs.

   constructor(schema: ConfluentSchema, opts?: ProtoOptions, references?: Schema[])

I then extended schemaFromConfluentSchema to pass through the referenced schema.

Testing

I have added tests to validate behaviour in a few cases:

  1. Register and encode protocol buffer v3 schema

  2. Register and encode protocol buffer v3 schema with import. The imported proto has an import itself, to check the recursive registration.

  3. Register and encode protocol buffer v3 schema with nested imports, to check for multiple recursive registrations.

TODO before merging

  1. Need to check whether we need to explicitly provide "common" imports and whether we can "cheat" and do so via the protobufjs library: https://github.com/protobufjs/protobuf.js/blob/master/tests/data/common.proto

Future work

In order to use this new functionality, we should implement support for Message Indexes. Since Protocol Buffers can contain multiple type definitions in a single schema file, each encoded Kafka message includes a Message Index that tells the consumer which Protocol Buffer Message type to use for parsing the message. This feature would change the binary wire format.

docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#wire-format

[^1]: The original PR, #135 was written by @brinchj while working at @pleo-io. He has since left, and I'm resuming his work now, but for lack of write access to the original PR have recreated it in a @pleo-oss fork (to avoid the same problem from reoccurring while allowing the @pleo-io colleagues to also contribute to this PR).

erwinw avatar Apr 22 '22 10:04 erwinw

Hi @maitsarv, regarding your comments on the original PR:

  1. It does not work with the { [SchemaType.PROTOBUF]: { messageName: 'CustomMessage' }} option. Tries to look for the messageName from every proto definition, should only be looking from the root.

I don't think this has anything to do with this PR, but is a pre-existing issue -- could that be correct?

  1. When defintion inherits some type through multiple different files, then it will get an error.

I've resolved this by deduplicating the definitions.

erwinw avatar Apr 22 '22 12:04 erwinw

Hi @Nevon , sorry for the direct ping. Could I get you (to get someone) to have a look at this, please?

erwinw avatar May 06 '22 16:05 erwinw