kafka-connect-bigquery icon indicating copy to clipboard operation
kafka-connect-bigquery copied to clipboard

Add support for topics with multiple schemas

Open mkubala opened this issue 5 years ago • 17 comments

This contribution makes kafka-connect-bigquery compatible with topics that carry different types of messages, especially when Kafka is used for an event sourcing.

Approach of putting several event types in a single topic is already supported by Kafka and Schema Registry (see https://www.confluent.io/blog/put-several-event-types-kafka-topic/).

This PR has been derived from #219 and rebased on top of the recent changes made by @bingqinzhou

mkubala avatar Jan 09 '20 14:01 mkubala

@mtagle @criccomini @C0urante Any progress on this one?

jaroslawZawila avatar Jan 21 '20 09:01 jaroslawZawila

@mtagle @criccomini @C0urante Is there any chance for this feature to be merged and included in the next release?

I do not want to exert pressure on anyone here, but it's been almost 3 months since I started work on this contribution after receiving a positive feedback in #175 and my team is blocked with their work on ML & reporting.

I'm between a rock and a hard place and if I should start looking for another solution (write my own custom service or connector?) I'd like to know it before the team get really mad.

mkubala avatar Jan 23 '20 15:01 mkubala

hey @mkubala, I've asked @whynick1 to take a look at your PR. I'm eager to get this merged as well!

mtagle avatar Jan 23 '20 17:01 mtagle

Codecov Report

Merging #238 into master will increase coverage by 0.25%. The diff coverage is n/a.

@@             Coverage Diff              @@
##             master     #238      +/-   ##
============================================
+ Coverage     68.69%   68.95%   +0.25%     
- Complexity      267      273       +6     
============================================
  Files            32       32              
  Lines          1460     1498      +38     
  Branches        152      153       +1     
============================================
+ Hits           1003     1033      +30     
- Misses          409      417       +8     
  Partials         48       48              
Impacted Files Coverage Δ Complexity Δ
...a/connect/bigquery/utils/TopicToTableResolver.java 73.33% <0.00%> (-18.34%) 11.00% <0.00%> (-1.00%)

skyzyx avatar Jan 30 '20 11:01 skyzyx

Most of the comments have been addressed. I left some food for thoughts regarding moving the getSingleMatch method back from config to the TopicToTableResolver.

Also I'd be glad if you could take a look on #237 - it's another feature, essential to start using multi schema topics. When they finally get merged, I'll expose a third PR, which brings support for resolving datasets based on the topic and record names.

mkubala avatar Jan 30 '20 13:01 mkubala

@mkubala are you planning to update this with TableRouter?

criccomini avatar Feb 10 '20 17:02 criccomini

Hi @criccomini! Sorry for late response. I cannot tell you if and when I'll find time to update this PR.

I've been working on it as a part of my daily job and since we are behind the original schedule for BigQuery integration I found myself between a rock and a hard place.

My client, who paid for the contribution made so far, does not want to spend more money and see the working integration in our product ASAP. On the other hand you guys want to keep as clean and maintainable codebase as possible. Due to personal reasons I cannot sacrifice my spare time in order to introduce TableRouter to this PR in a reasonable time. Also I have yet another PR to expose (datasets based on both - topic & record names).

I wonder if there is any chance that you could approve & merge this PR "as it is" and TableRouter would be introduced as a separate PR, in the near future?

mkubala avatar Feb 24 '20 15:02 mkubala

No, this PR adds too much complexity to the codebase, and isn't the right approach--the refactor is. I don't think it's much more work than this PR, itself. I am reaching out to @rhauch to see if you all can allocate time for refactor that's outlined in issue #245.

criccomini avatar Feb 24 '20 17:02 criccomini

I understand. @rhauch please let me know if you start working on that.

mkubala avatar Feb 25 '20 06:02 mkubala

@mkubala sorry, I won’t have time to work on this.

rhauch avatar Feb 25 '20 13:02 rhauch

There is light in the end of tunnel! I spoke to my client this morning and they will let me finish the feature as part of my daily job if I give them a kind of hard estimate / deadline.

How much time you will need to review this PR and release a new version of the plugin? Can you guarantee that after this PR got merged (and optionally the other one with support for matching datasets by both topic and record name) a new version of the connector plugin will be released?

mkubala avatar Feb 26 '20 14:02 mkubala

Hello, We are very interested in this feature. Can you tell us when you would consider adding @mkubala's contribution to your release?

mustosm avatar Apr 15 '20 12:04 mustosm

@mustosm Maybe I'll find some spare time in the next couple of weeks to finish this feature. Although I have no idea how much time it will take to re-review this PR and release a new version, so if you cannot wait then build connector from this branch on your own - that's what we did.

mkubala avatar Apr 15 '20 12:04 mkubala

@mkubala thank you for your answer. If the feature is released by wepay it would be better. Are you using this feature in production today?

mustosm avatar Apr 15 '20 12:04 mustosm

It's been used on the staging environment.

mkubala avatar Apr 15 '20 14:04 mkubala

Hi @mkubala 👋 Any update on this please? Are you using this in prod env? Did you make a release somewhere?

OuesFa avatar Nov 17 '21 16:11 OuesFa

👋 I'm sharing a solution for this using SMTs with the current version of the connector. https://github.com/confluentinc/kafka-connect-bigquery/issues/114#issuecomment-973118252

OuesFa avatar Nov 29 '21 13:11 OuesFa