kafka
kafka copied to clipboard
KAFKA-7883 add schema.namespace support to SetSchemaMetadata SMT in Kafka Connect
When you use Kafka Connect and Avro Converter with Schema Registry you may need to transform your source schema namespace. Currently SetSchemaMetadata provide a way to transform the namespace by specifying in the schema.name attribute both actual namespace being concatenated with the actual schema name:
{
"transforms" : "TransformSchema",
"transforms.TransformSchema.type" : "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.TransformSchema.schema.name" : "my.new.namespace.NewSchemaName"
}
But this way you are forced to specify the schema name even when you want to preserve the original schema name. In some cases you just may need to replace possibly an empty namespace of your source with the actual namespace fitting "your.java.package" and leave schema name as it is.
Moreover as mentioned in KAFKA-7883, when use JDBC connector with multiple tables in the source database what you may want is to preserve the table names but specify some non-empty namespace for them.
To support this we have to provide a way to append a schema namespace to the schema name (doesn't matter being updated or not).
In this PR a new schema.namespace attribute is added to SetSchemaMetadata letting developer to decide either to transform namespace only or transform both schema name and schema namespace by using either schema.namespace together with schema.name or just single schema.name attribute. An example:
{
"transforms" : "TransformSchema",
"transforms.TransformSchema.type" : "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.TransformSchema.schema.namespace" : "my.new.namespace",
"transforms.TransformSchema.schema.name" : "NewSchemaName",
"transforms.TransformSchema.schema.version" : 42
}
@rhauch could you please review my PR?
I really want this! This is exactly what our team needs to resolve issues with Avro namespaces.
@kkonstantine could you please find a time to review my request? I believe it can be a useful feature and it should fill in the gaps in the current SetSchemaMetadata SMT API. Thank you!
For context, we have had to go from 2 connectors to a.... huge number of connectors as a workaround, which has been a crazy amount of configuration for us, based on how we currently use kafka connect with yaml configuration (think of strimzi or the kafka confluent k8s operator). All each overriding the entire Schema name and being explicit while also introducing table whitelisting.... This pull request is a fantastic solution that would allow us to go back to 2 connectors.
I've marked this as needing a KIP, though I'm not yet sure whether that is the case. These SMTs are technically part of the API, since they appear in STM configurations, which are considered part of the public API. I'll keep looking into this to figure out if this is really the case, but it's probably safer to assume a KIP is needed.
Would I be correct with the following example too?
{
"transforms" : "TransformSchema",
"transforms.TransformSchema.type" : "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
"transforms.TransformSchema.schema.namespace" : "added.namespace"
}
So that the original Schema name is untouched?
I've marked this as needing a KIP, though I'm not yet sure whether that is the case. These SMTs are technically part of the API, since they appear in STM configurations, which are considered part of the public API. I'll keep looking into this to figure out if this is really the case, but it's probably safer to assume a KIP is needed.
@rhauch thank you for your review, I've really missed the fact I've broken current behavior on empty schema name! I've fixed this and added more corner cases unit-tests on the schema update. Also I've made an effort to prettify our tests to make them looking more uniform and readable.
@rhauch @mnegodaev this is looking pretty good now. Would others need to have reviewed this before being merged in? I'm not too clued in on the open source contributions here but it looks like what @mnegodaev has done is excellent work that may solve a lot of issues for people using kafka with avro. :)
@mjsax I can see you are recently doing some things, is there any way we can get @mnegodaev PR merged?
I am not familiar with Connect code -- @rhauch @kkonstantine @mimaison should be able to help.
@pjmagee Thanks for the contribution!
As @rhauch mentioned above SMTs are part of the API so in order to add new configurations we need a KIP.
The process is documented in https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals Considering it's a pretty small changes it should be a relatively straight forward KIP.
Let me know if you have any questions.
@mnegodaev It looks like this is something you would add to this PR a link to a KIP.
@mnegodaev It looks like this is something you would add to this PR a link to a KIP.
@pjmagee, I'll try to do may best. Thanks!
@rhauch @mimaison @kkonstantine @pjmagee FYI I've started a [DISCUSS] thread on the Apache mailing list for the new KIP-855.
@rhauch @mimaison @kkonstantine @pjmagee If you guys don't mind I'm going to wait till Wednesday and then call a [VOTE] to have the proposal adopted. Thanks!
@rhauch @mimaison @kkonstantine @pjmagee I've started a vote for the KIP-855: https://lists.apache.org/thread/b0cdofxhl7p84k14wvhtrhd0r1z8qzpk Please vote for it if you're agree.
@mimaison Hey, It looks like @mnegodaev has followed all the right procedures and we have the KIP. As someone with more experience on this repo, what are our next steps?
Hi @hachikuji Sorry to pull you in on this PR - I notice you have been active on some merged PR's quite recently. We have had some input from others regarding the process around creating a KIP, which is now completed for this feature, along with an explanation as to why @mnegodaev has added this feature, along with myself having a usecase for this feature. What would be the next steps in being able to get this merged? It seems like a elegant solution and change to the SMT to support namespaces. Would it be possible for you to maybe include others who may be able to get this merged? I am rather new to this repo so I am unaware of how best to proceed.
@pjmagee I commented in the discuss thread last week, see https://lists.apache.org/thread/92y0qqlwb75xqkxnzwbw7jywfdw8gs8k. Can you reply to my question there?
@pjmagee I commented in the discuss thread last week, see https://lists.apache.org/thread/92y0qqlwb75xqkxnzwbw7jywfdw8gs8k. Can you reply to my question there?
I've fixed this typo into KIP, thanks for finding this.
whats the status of this?
To be able to merge this, the associated KIP must be voted. In the discussion thread for the KIP, a few questions have not been answered yet by the author. Until these are answered, it's unlikely there will be any votes, so this will stay stuck.
Hi guys, I'll try to answer the questions in the near time, thanks.
ср, 15 февр. 2023 г. в 15:51, Mickael Maison @.***>:
To be able to merge this, the associated KIP https://cwiki.apache.org/confluence/display/KAFKA/KIP-855%3A+Add+schema.namespace+parameter+to+SetSchemaMetadata+SMT+in+Kafka+Connect must be voted. In the discussion thread https://lists.apache.org/thread/3hkd9lljobf9rl56ogjpcbo4ldoxcz5n for the KIP, a few questions have not been answered yet by the author. Until these are answered, it's unlikely there will be any votes, so this will stay stuck.
— Reply to this email directly, view it on GitHub https://github.com/apache/kafka/pull/11442#issuecomment-1431141433, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE2RTBY2JST3W745OFVSLJ3WXSYLTANCNFSM5G3C2JEA . You are receiving this because you were mentioned.Message ID: @.***>