confluent-schema-registry
confluent-schema-registry copied to clipboard
Implement support for Protocol Buffer references (imports) -- continued
This PR is a continuation of #135 (which can now be closed) [^1]
What:
Add support for Protocol Buffer imports and schema references.
I believe this is a feature change without any breaking changes.
Why:
Imports are very useful when writing schema files using Protocol Buffers. And in order to register a schema that uses imports, we need to use Schema Register's references
feature to let it know where to find those imported types. Otherwise, it fails to register the schema.
Previous work:
I believe this relates to #82 (and obviously builds on #135).
There are two other related PRs open in #92 and #121. As far as I can tell, the first extends getSchema
to retrieve references from Schema Registry. The second extends register()
by asking the user to provide the reference schema ids in userOpts
. This PR attempts to extend register()
to resolve references by looking them up (either directly by using the import name, or by mapping the imported name to a subject and optionally providing a specific version), and creating a Protocol Buffer schema that includes these.
How:
When registering a new schema, the register
method now supports an optional user-provided importSubjects
helper that maps import references to subjects. If not provided, the import reference is used directly as the subject (which is inline with the Java behaviour).
The new referencedSchemas
SchemaHelper then fetches definitions of the imported schemas. This PR implements this helper for Protocol Buffers, but it could also be implemented for at least JSON schemas.
referencedSchemas(schema: string): Promise<string[]>
The register
method is then extended to use these helpers to get the schema references needed and verifies those are all registered recursively and then included as the references
of the newly registered schema. Schema registry references are a list of [(Subject, Version)]
.
docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#referenced-schemas
To get define
and encode
to work, I extended the ProtoSchema constructor so it can take references into account when building the schema instance. It fetches the Schema instances for the references and adds them to the Root context for the new Schema instance avoiding duplicates, so it has all the types it needs.
constructor(schema: ConfluentSchema, opts?: ProtoOptions, references?: Schema[])
I then extended schemaFromConfluentSchema
to pass through the referenced schema.
Testing
I have added tests to validate behaviour in a few cases:
-
Register and encode protocol buffer v3 schema
-
Register and encode protocol buffer v3 schema with import. The imported proto has an import itself, to check the recursive registration.
-
Register and encode protocol buffer v3 schema with nested imports, to check for multiple recursive registrations.
TODO before merging
- Need to check whether we need to explicitly provide "common" imports and whether we can "cheat" and do so via the
protobufjs
library: https://github.com/protobufjs/protobuf.js/blob/master/tests/data/common.proto
Future work
In order to use this new functionality, we should implement support for Message Indexes. Since Protocol Buffers can contain multiple type definitions in a single schema file, each encoded Kafka message includes a Message Index that tells the consumer which Protocol Buffer Message
type to use for parsing the message. This feature would change the binary wire format.
docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#wire-format
[^1]: The original PR, #135 was written by @brinchj while working at @pleo-io. He has since left, and I'm resuming his work now, but for lack of write access to the original PR have recreated it in a @pleo-oss fork (to avoid the same problem from reoccurring while allowing the @pleo-io colleagues to also contribute to this PR).
Hi @maitsarv, regarding your comments on the original PR:
- It does not work with the
{ [SchemaType.PROTOBUF]: { messageName: 'CustomMessage' }}
option. Tries to look for the messageName from every proto definition, should only be looking from the root.
I don't think this has anything to do with this PR, but is a pre-existing issue -- could that be correct?
- When defintion inherits some type through multiple different files, then it will get an error.
I've resolved this by deduplicating the definitions.
Hi @Nevon , sorry for the direct ping. Could I get you (to get someone) to have a look at this, please?