flink-cdc icon indicating copy to clipboard operation
flink-cdc copied to clipboard

Added option for MysqlSource to dispatch watermark event at the begin…

Open qidian99 opened this issue 4 years ago • 3 comments

When we enter binlog phase in a cdc job, the SplitAssigner only assigns a binlog split to one of the SourceReaders, namely the first reader waiting for splits in the queue. Therefore, there are use cases where we want to automatically scale in or out based on whether the sync job has entered binlog phase. A possible scenario would be as follows:

  1. The user start a cdc job to sync two tables in the target database with a parallelism of 16
  2. All snapshot splits has been emitted and a complete checkpoint has succeeded so that all records are consumed in the sink
  3. A savepoint is triggered and the cdc job is restarted with a parallelism of 1.
  4. The user wants to add new tables to the cdc job as in this PR. A savepoint is triggered again and the job is restarted with 100 tables added and a parallelism of 128 for parallel reading splits
  5. All the new snapshot splits has been emitted and a complete checkpoint has succeeded.
  6. A savepoint is triggered again and the job is restarted with a parallelism of 1 to consume binlog records

qidian99 avatar Feb 23 '22 07:02 qidian99

Hi @qidian99, sorry for the delay of this PR. Could you please rebase it to latest master branch since there’s been lots of changes in Flink CDC repo since your original commit? Kindly reminder that anything under com.ververica.cdc.connectors.mysql package has been moved to org.apache.flink.cdc.connectors.mysql.

Thanks for your contribution!

yuxiqian avatar Apr 25 '24 06:04 yuxiqian

This pull request has been automatically marked as stale because it has not had recent activity for 60 days. It will be closed in 30 days if no further activity occurs.

github-actions[bot] avatar Jul 17 '24 00:07 github-actions[bot]