confluent-schema-registry
Adding Support for Schema References.
Namaste Team
In this PR:
- I am adding support for Schema References using a combination of typeHook and logicalType. This seems to work in our internal testing, but I'll need to figure out how to write unit tests for it.
- This issue is being discussed in avsc: https://github.com/mtth/avsc/issues/309, but they won't have a solution unless they are willing to make API calls to the registry.
Issues Touched:
- #37
- #82
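For context on the approach in the first bullet: avsc's typeHook is invoked for every (sub)schema it parses, which gives a place to notice referenced type names and fetch them from the registry before parsing continues. This is not the fork's actual code; the hypothetical helper below only sketches the first half of that idea, collecting the named types a schema references but does not define inline:

```javascript
// Hypothetical helper: walk an Avro schema and collect named types that are
// referenced but not defined inline -- these are the "schema references" that
// would have to be fetched from the registry before avsc can parse the schema.
function collectUndefinedRefs(schema, defined = new Set(), refs = new Set()) {
  const primitives = new Set([
    'null', 'boolean', 'int', 'long', 'float', 'double', 'bytes', 'string',
  ]);
  if (typeof schema === 'string') {
    // A bare string that is not a primitive is a reference to a named type.
    if (!primitives.has(schema) && !defined.has(schema)) refs.add(schema);
  } else if (Array.isArray(schema)) {
    // Unions: check each branch.
    schema.forEach((s) => collectUndefinedRefs(s, defined, refs));
  } else if (schema && typeof schema === 'object') {
    if (schema.name) {
      // Remember names defined inline so we don't report them as references.
      const fullName = schema.namespace
        ? `${schema.namespace}.${schema.name}`
        : schema.name;
      defined.add(fullName);
      defined.add(schema.name);
    }
    (schema.fields || []).forEach((f) =>
      collectUndefinedRefs(f.type, defined, refs)
    );
    if (schema.items) collectUndefinedRefs(schema.items, defined, refs);
    if (schema.values) collectUndefinedRefs(schema.values, defined, refs);
  }
  return [...refs];
}
```

The names collected this way would then be looked up in the registry and registered with avsc before the top-level schema is parsed.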
@whizzzkid, is there an ETA on when it will be possible to use schema references? :)
@alonpi we are using schema references as of today, based on my fork. I am not sure how to get this PR reviewed; they don't have a contribution doc, nor am I able to assign reviewers for this PR. If @Nevon can take a look, we can go from there.
PS: I noticed the v2 release, will resolve the conflicts shortly.
I can try to help, but I have never heard of schema references until now, so I have some catching up to do. One thing I can say for sure is that any contribution that adds functionality like this needs to be tested. We generally favor integration tests, so if you can describe the expected usage, it should be pretty straightforward to write a test that does exactly that.
From my very brief reading of confluentinc/schema-registry#1280, this seems to be a feature that exists for not just Avro, but also JSON Schema and Protobuf. We haven't really decided on a policy of how to keep the different schema types in feature parity. Would you be able to drive this change also for Protobuf or JSON Schema? Without having delved into it, I don't know if doing this is a large or small effort.
@Nevon Schema References were added in v5.5 of Confluent Platform: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#referenced-schemas
So essentially any newer implementation should support this. This was a blocker for our (AppDirect) kafka deployment, and this was the easiest fix. The ask for the support has been since November in #82.
I can look into implementing this change for protobuf and JSON schemas too, and fixing the test cases as you mentioned. Before that I will need to understand what changes v2 brings in.
I'll get back to you in a few days!
@whizzzkid are you still working on this PR? We are trying to use the lib and would love to use schema references as part of schema creation through this package. Thanks for working on adding this feature.
@tejans24 this is ready to merge if @Nevon agrees. I can work on the protobuf/JSON ask in a separate PR.
We published an internal version using my fork and seems to work well for us. So far no complaints received.
@whizzzkid: As @Nevon mentioned earlier, is there way to help add some integration testing for this? I guess it can get slightly complex because of the nature of schema references.
@whizzzkid Thank you a lot for this schema reference Avro contribution! I wonder, is there a description of how to use your commit? How should I register my producer on a nested schema? Is it still
```javascript
const registry = new SchemaRegistry({});
const kafka = new Kafka({});
const producer = kafka.producer();

const subject = "";
const version = 1;
const valueId = await registry.getRegistryId(subject, version);

producer.send({
  topic: '',
  messages: [
    {
      value: await registry.encode(valueId, payload),
    },
  ],
});
```
? Thank you! I also noticed that the code you modified is not the current code in the master branch, so I cannot simply copy-paste your changed lines into my node_modules folder. Can you tell me which version of @kafkajs/confluent-schema-registry your fork branch is based on?
@whizzzkid Hi! I couldn't figure out how to use your fork.
I mean, if I run npm install git+https://github.com/whizzzkid/confluent-schema-registry.git#feature/schema-references, your module seems to have a dependency compatibility issue. The command gives the error below:
```
npm ERR! code 1
npm ERR! git dep preparation failed
npm ERR! command /home/.nvm/versions/node/v15.1.0/bin/node /home/.nvm/versions/node/v15.1.0/lib/node_modules/npm/bin/npm-cli.js install --only=dev --prod --ignore-prepublish --no-progress --no-save --cache=/home/.npm/_cacache --prefer-offline=false --prefer-online=false --offline=false --before=
npm ERR! npm WARN invalid config before="" set in command line options
npm ERR! npm WARN invalid config Must be one of: null, valid Date string
npm ERR! npm ERR! code ERESOLVE
npm ERR! npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR! npm ERR!
npm ERR! npm ERR! While resolving: @kafkajs/[email protected]
npm ERR! npm ERR! Found: [email protected]
npm ERR! npm ERR! node_modules/jest
npm ERR! npm ERR!   dev jest@"^25.2.7" from the root project
npm ERR! npm ERR!
npm ERR! npm ERR! Could not resolve dependency:
npm ERR! npm ERR! peer jest@">=24 <25" from [email protected]
npm ERR! npm ERR! node_modules/ts-jest
npm ERR! npm ERR!   dev ts-jest@"^24.0.2" from the root project
npm ERR! npm ERR!
npm ERR! npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
npm ERR! npm ERR!
```
Even if I change the ts-jest version to 25.2.1, when I then try const { SchemaRegistry } = require('@kafkajs/confluent-schema-registry'); the following error is thrown:
```
node:internal/modules/cjs/loader:335
      throw err;
      ^

Error: Cannot find module '/home/dishaw/serial_number/confluent_kafka/node_modules/@kafkajs/confluent-schema-registry/dist/index.js'. Please verify that the package.json has a valid "main" entry
```
According to your reply above, your fork seems to work for you. Can you describe more about how you installed your fork? Thank you very much.
@300LiterPropofol that will not work because you need to build the module first. This project uses Azure to build the module. Internally we added a manual build step in our infrastructure to publish it to our own internal npm instance, which works. You can build the fork locally and then use npm i <local_path> to run it.
Thank you very much for your response. @whizzzkid , sorry that I am not very familiar with nodejs. Could you explain more about how to build the packages locally?
@300LiterPropofol
- Clone the fork.
- Install deps: yarn install
- Build: yarn build
- Use it in your project like: npm i -S ../path/to/fork
@whizzzkid Thank you very much! I will give a try.
@whizzzkid Sorry to disturb you again. I tried to build the module locally and installed it from the local path. I hit this error:

```
Error: undefined type name: selfdefinednamespace.selfdefinedname
    at Function.Type.forSchema (/home/node_customize/confluent-schema-registry/node_modules/avsc/lib/types.js:177:11)
```
I wonder how deep your schema reference feature can go. I have a schema structure like:
- schema A (refers to schema B)
- schema B (refers to schema C)
- schema C (refers to schema D)
- schema D (refers to schema E)
and the undefined type name in the error, selfdefinednamespace.selfdefinedname, is the full name of schema E. Is there any limit on how deep references can be resolved? Thanks!
Update on the above comment: there is no depth limitation in this add-on feature. I finally found out what caused this confusing error, so for anyone who gets stuck on the same issue: I am using Confluent Cloud, so registered schemas can be inspected in the Confluent Cloud web UI. If you are using references, you will see three columns named "Reference name", "subject", and "version" under the "Schema References" block. The "Reference name" column MUST be filled with the full name, i.e. namespace.name, of the referenced schema. If you fill the column with something else, it will still pass schema validation and probably has no effect on other clients (for example, I was using the Python client, where this field has no impact), but for whizzzkid's add-on that field has to be the full name. Now I have it working! Thank you a lot again for your amazing job!!
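To illustrate the naming rule above: when registering the referencing schema through the Schema Registry REST API (POST /subjects/<subject>/versions), the "name" of each entry in the references array must match the full type name used inside the referencing schema. A sketch of the request body, with hypothetical subject and type names:

```json
{
  "schemaType": "AVRO",
  "schema": "{\"type\":\"record\",\"name\":\"A\",\"namespace\":\"selfdefinednamespace\",\"fields\":[{\"name\":\"b\",\"type\":\"selfdefinednamespace.B\"}]}",
  "references": [
    {
      "name": "selfdefinednamespace.B",
      "subject": "B-value",
      "version": 1
    }
  ]
}
```

Here "name" is the full namespace.name of the referenced type, exactly as it appears in the schema string, which is what the fork resolves against.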
@300LiterPropofol There shouldn't be a limit on the depth of the schema; glad the naming worked out for you. We could fix the naming, but I am not sure that would be a good idea. I am looking into fixing the merge conflicts first.
Thanks for validating :)
Is there any update on this? We are really keen on using this library, but all our Avro schemas contain nested schemas and this is not currently supported.
Just a heads up: I've started a discussion around references in the related issue here: https://github.com/kafkajs/confluent-schema-registry/issues/82
I think we are a bunch who are interested in this and I think we could team up and produce the changes needed together in uniform 👍
I guess this is dead by now, isn't it? :thinking:
@WhereIsMyRum this was working as-is in production without any failures. Then I switched jobs, had some personal commitments, and got covid; I hope it's still working there. I'll see if I can find some time to get this through in the next few weeks, or I will close this PR.