Support dynamic table addition in flink-cdc-base

Open PatrickRen opened this issue 3 years ago • 1 comments

Currently, the flink-cdc-base framework doesn't support discovering and adding tables dynamically. This feature is already implemented in the MySQL CDC connector, so the framework needs to support it before the MySQL CDC connector can be adapted to flink-cdc-base.
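
To make "discovering tables dynamically" concrete, here is a minimal, self-contained sketch of the discovery step only. It is not the flink-cdc-base or MySQL CDC implementation: the class `NewlyAddedTableSketch`, the method `discoverNewlyAddedTables`, and the table names are all hypothetical. It assumes the source remembers which tables it already captures and, on restart, compares that set with the tables currently matching its include pattern.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names come from flink-cdc-base.
public class NewlyAddedTableSketch {

    /**
     * Returns the tables that match the include pattern but are not yet
     * captured, i.e. the tables that would still need a snapshot read.
     */
    public static Set<String> discoverNewlyAddedTables(
            Set<String> alreadyCaptured,
            List<String> tablesInDatabase,
            Pattern includePattern) {
        Set<String> newlyAdded = new LinkedHashSet<>();
        for (String table : tablesInDatabase) {
            boolean matches = includePattern.matcher(table).matches();
            if (matches && !alreadyCaptured.contains(table)) {
                newlyAdded.add(table);
            }
        }
        return newlyAdded;
    }

    public static void main(String[] args) {
        // Assumed state restored from a checkpoint: one table is already captured.
        Set<String> captured = Set.of("app.orders");
        // Assumed result of listing the tables in the database after restart.
        List<String> current = List.of("app.orders", "app.users", "sys.audit");
        Pattern include = Pattern.compile("app\\..*");

        // prints [app.users]
        System.out.println(discoverNewlyAddedTables(captured, current, include));
    }
}
```

A real implementation would also have to schedule snapshot splits for the discovered tables and coordinate with the running stream split, which this sketch leaves out.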

PatrickRen avatar May 06 '22 11:05 PatrickRen

Please assign this task to me if no one else has taken it. I am happy to accept this assignment @PatrickRen

molsionmo avatar Jul 13 '22 07:07 molsionmo

Hey @PatrickRen @leonardBang, do you know if anyone is actively working on this?

I believe this is needed for https://github.com/ververica/flink-cdc-connectors/issues/1163 (we'd like to ship incremental support in Postgres with scan.newly-added-table.enabled feature). I'd love to be assigned, I've already started working on this.

sap1ens avatar Dec 06 '22 00:12 sap1ens

Hello @molsionmo, are you still working on this?

leonardBang avatar Dec 06 '22 07:12 leonardBang

Could you reassign this to me? I created a PR for this: https://github.com/ververica/flink-cdc-connectors/pull/1838

sap1ens avatar Dec 21 '22 20:12 sap1ens

@leonardBang I'm sorry for not replying in time. I had developed part of this earlier but didn't have time to submit a PR. I have now submitted that PR separately and, comparing it with sap1ens's work, found many similar parts. Thank you @sap1ens for your excellent work.

My PR only covers supporting dynamic table addition in flink-cdc-base. If sap1ens's PR is adopted, I will close my PR and take part in the review and testing work.

molsionmo avatar Jan 17 '23 11:01 molsionmo

Can anyone look at my PR again?

sap1ens avatar Jun 16 '23 04:06 sap1ens

Hey @sap1ens, Jiabao is helping to review the PR, but we've recently been busy with the 2.4 code freeze, so the review may have to continue later. The PR is a huge enhancement, and since we're close to the code freeze date I'd like to move it to the next version. WDYT?

leonardBang avatar Jun 16 '23 04:06 leonardBang

Sure, just wanted to remind before the 2.4 release, but it looks like it's too late :) No worries.

sap1ens avatar Jun 16 '23 04:06 sap1ens

I hope this can be merged into version 2.4.

1032851561 avatar Jun 19 '23 09:06 1032851561

Hi, just wanted to remind you about the PR again, thanks!

sap1ens avatar Aug 29 '23 20:08 sap1ens

Could this be considered for 3.1? I can look into rebasing the PR if needed, assuming it'll get the attention.

sap1ens avatar Dec 14 '23 18:12 sap1ens

@sap1ens I added this to the 3.1 roadmap. @molsionmo Do you have time to finish this in the 3.1 version? We can find someone else to finish this task if you are busy with your company's work.

leonardBang avatar Dec 15 '23 03:12 leonardBang

I'll take a look at the PR tomorrow and let you know. Thanks!

sap1ens avatar Dec 15 '23 05:12 sap1ens

@leonardBang I've updated the PR: https://github.com/ververica/flink-cdc-connectors/pull/1838. However, a lot of things have changed since December 2022 🙂. I found several PRs with changes for this feature, including a very large one.

What's your guidance here?

Should we copy the latest implementation of the Scan Newly Added Tables feature? It'll probably take me several days to accommodate new changes + there is more testing needed. But it may make sense to do it if you think that the existing implementation in MySQL is significantly better (the non-blocking reads are great).

On the other hand, if the current PR is good enough I can quickly add support for Postgres after that and it's already well-tested in prod (we've been running it in prod for about a year).

sap1ens avatar Dec 19 '23 00:12 sap1ens

Hey @sap1ens, thanks for the update. I think we should copy the latest implementation, which is better than the earlier one. We can target this feature for the 3.1 release; we have enough time to finish it in the 3.1 development cycle. WDYT?

leonardBang avatar Dec 19 '23 01:12 leonardBang

@sap1ens, it seems that not blocking the process for newly added tables is a better approach. I am also interested in PG CDC and have enough time at the moment, so I would like to collaborate with you: for instance, I can help implement certain functionality or review and give feedback on your pull requests. By the way, my PR "Add SNAPSHOT_ONLY mode for Incremental CDC Source" may affect this one (both stop the stream split, for different purposes), so I will complete it this week so that it doesn't block this PR.

loserwang1024 avatar Dec 19 '23 02:12 loserwang1024

@leonardBang I've attempted to apply new updates from the PRs I identified, but, unfortunately, it's just too much work at the moment for me, I only have a few working days left in the year. I'm also not sure that this list of PRs is complete. Likely it's not and copying changes requires comparing all relevant files one-by-one.

But I do believe it's an important change and waiting longer will increase the difference between the connectors even more. So I'd appreciate any help here, FYI @loserwang1024.

Once the cdc-base is updated, I'm happy to contribute Postgres-specific changes and tests.

sap1ens avatar Dec 20 '23 00:12 sap1ens

@leonardBang, I'd like to do it. @sap1ens, thanks a lot; being able to reference your past work will save me a lot of trouble.

loserwang1024 avatar Dec 20 '23 03:12 loserwang1024

Closing this issue because it was created before version 2.3.0 (2022-11-10). Please try the latest version of Flink CDC to see if the issue has been resolved. If the issue is still valid, kindly report it on Apache Jira under project Flink with component tag Flink CDC. Thank you!

PatrickRen avatar Feb 28 '24 15:02 PatrickRen

Actually, this was implemented in https://github.com/ververica/flink-cdc-connectors/pull/3024

sap1ens avatar Feb 28 '24 17:02 sap1ens