kafka-connect-bigquery
Support topic with multiple schemas
I am trying to use this connector with a topic that has multiple schemas. The problem is that the subject name for the schema is topic-record, whereas this code hard-codes it to topic-value. Are you going to support this kind of scenario?
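To make the mismatch concrete, here is a minimal sketch of the two subject naming conventions involved. It only mirrors the naming patterns of Schema Registry's topic-based and topic-record-based subject strategies as I understand them; the class, method names, topic, and record name below are hypothetical and are not taken from the connector's code.

```java
public class SubjectNameSketch {

    // Topic-based naming: one subject per topic, e.g. "orders-value".
    public static String topicNameSubject(String topic, boolean isKey) {
        return topic + (isKey ? "-key" : "-value");
    }

    // Topic-record naming: one subject per (topic, record type) pair,
    // e.g. "orders-com.example.OrderCreated".
    public static String topicRecordSubject(String topic, String recordFullName) {
        return topic + "-" + recordFullName;
    }

    public static void main(String[] args) {
        // Hypothetical topic and record names, for illustration only.
        System.out.println(topicNameSubject("orders", false));
        System.out.println(topicRecordSubject("orders", "com.example.OrderCreated"));
    }
}
```

A lookup that always asks for the topic-value subject would find nothing for a topic whose schemas are registered under per-record subjects, which is the situation described above.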
@mtagle If I made the changes to add support for the above, would you accept them?
I'm also ready to put some time & effort into preparing a pull request. It would be great to know that you @mtagle and @criccomini are not going to reject the whole idea.
I have to admit I'm a little unclear about what you mean by "a topic with multiple schemas". I'm particularly unclear on how a topic with multiple schemas would successfully end up in a table in BQ with (presumably) a single fixed schema. Could you elaborate on this? Maybe some details about your use-case would be helpful as well.
In our use case we are using an event sourcing approach where a single topic contains many different entities. We would like to store those entities in dedicated BigQuery tables (one table for each event type) for further data ingestion / BI.
This approach is already supported by Kafka and Schema Registry (see https://www.confluent.io/blog/put-several-event-types-kafka-topic/).
We would add two new configuration options (names can be changed, this is just a draft):
- recordNamesToTables - A list of comma-delimited strings representing the mapping between record names and table names (just like topicsToTables but for records). Default: empty list.
- multipleSchemaTopics - A boolean feature switch. Default: false.
Table names would contain the mapped topic and record name (separated by _).
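As a rough illustration of the naming rule just described, the following sketch derives a table name from a mapped topic and a record name. It is only a draft under stated assumptions: the `name=table` entry format for recordNamesToTables, the exact-match lookup, the fallback to the raw record name when no mapping exists, and all class and method names are hypothetical and not part of the connector.

```java
import java.util.HashMap;
import java.util.Map;

public class RecordTableNameSketch {

    // Parses "a=b,c=d" into a map. The "name=table" entry format is an
    // assumption; the proposal only says "comma-delimited strings".
    public static Map<String, String> parseMapping(String csv) {
        Map<String, String> mapping = new HashMap<>();
        if (csv == null || csv.trim().isEmpty()) {
            return mapping; // proposed default: empty list
        }
        for (String entry : csv.split(",")) {
            String[] parts = entry.trim().split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected name=table, got: " + entry);
            }
            mapping.put(parts[0].trim(), parts[1].trim());
        }
        return mapping;
    }

    // Builds "<mappedTopic>_<mappedRecord>" when the feature switch is on;
    // otherwise returns the mapped topic unchanged (one table per topic).
    public static String tableName(String mappedTopic,
                                   String recordName,
                                   Map<String, String> recordNamesToTables,
                                   boolean multipleSchemaTopics) {
        if (!multipleSchemaTopics) {
            return mappedTopic;
        }
        // Assumption: an unmapped record falls back to its own name.
        String mappedRecord = recordNamesToTables.getOrDefault(recordName, recordName);
        return mappedTopic + "_" + mappedRecord;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = parseMapping("OrderCreated=order_created");
        System.out.println(tableName("orders", "OrderCreated", mapping, true));
    }
}
```

Keeping multipleSchemaTopics as an explicit switch means existing single-schema setups would keep their current table names unless the option is turned on.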
What do you think about the whole idea?
Thank you for the clarification, this makes a lot more sense to me now!
As the schema-registry already supports this, I'm in support of adding this feature to KCBQ. We'd be happy to review and merge your changes if you wanted to implement this.
Great! Expect a PR soon :)