
[Bug] Tables matching excludingTablePattern are incorrectly included in combined-mode CDC

Open: chjnxp opened this issue (0 comments)

Search before asking

  • [X] I searched in the issues and found nothing similar.

Paimon version

master

Compute Engine

flink

Minimal reproduce step

  1. Create a database named cdc_test in MySQL, then create tables with a primary key named pk_1, pk_2, ..., pk_100 and tables without a primary key named non_pk_1, non_pk_2, ..., non_pk_100.
  2. Start a MySQL database CDC job in combined mode, with excludingTablePattern set to 'non_pk_.+'.
  3. While the CDC job is starting, concurrently create a table without a primary key named non_pk_101.
  4. The JobManager log then shows "com.ververica.cdc.connectors.mysql.source.utils.TableDiscoveryUtils [] - including ‘cdc_test.non_pk_101’ for further processing".

What doesn't meet your expectations?

non_pk_101 clearly matches excludingTablePattern and should be excluded. Because non_pk_101 has no primary key, MySqlChunkSplitter fails with: Caused by: org.apache.flink.table.api.ValidationException: Incremental snapshot for tables requires primary key, but table cdc_test.non_pk_101 doesn't have primary key.
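To make the expected behavior concrete, here is a minimal Java sketch (not Paimon's actual code) that applies the two patterns to the table names from the reproduction steps. It assumes the including pattern is left as .*, that a pattern must fully match the table name, and that shouldSync and syncedTables are hypothetical helpers. Because the decision depends only on the name, non_pk_101 should be filtered out whether it exists before the job starts or is created while it is starting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ExpectedTableFilter {

    // Hypothetical predicate (assumption, not Paimon's API): a table is synchronized
    // only if it fully matches the including pattern and does NOT fully match the
    // excluding pattern. Creation time plays no role.
    static boolean shouldSync(String table, Pattern including, Pattern excluding) {
        return including.matcher(table).matches() && !excluding.matcher(table).matches();
    }

    // Hypothetical helper: returns the tables that pass the filter, in input order.
    static List<String> syncedTables(List<String> tables, String includingRegex, String excludingRegex) {
        Pattern including = Pattern.compile(includingRegex);
        Pattern excluding = Pattern.compile(excludingRegex);
        List<String> result = new ArrayList<>();
        for (String table : tables) {
            if (shouldSync(table, including, excluding)) {
                result.add(table);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tables from the reproduction steps, plus non_pk_101 created while the job starts.
        List<String> tables = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            tables.add("pk_" + i);
            tables.add("non_pk_" + i);
        }
        tables.add("non_pk_101");

        List<String> synced = syncedTables(tables, ".*", "non_pk_.+");
        System.out.println(synced.size() + " tables synced; non_pk_101 synced: " + synced.contains("non_pk_101"));
    }
}
```

Run as written, it prints "100 tables synced; non_pk_101 synced: false": only the pk_ tables pass, so no snapshot split would be attempted for a table without a primary key.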

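The combinedModeTableList function builds its exclusion as a negative lookahead over individual table names, ?!(^db\.tbl$)|(^...$), instead of applying excludingTablePattern itself. As a result, a newly created table that matches excludingTablePattern but appears while the CDC job is starting is not in that list and is not excluded.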
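The regex behavior can be reproduced in isolation. The Java sketch below is an illustration under stated assumptions, not Paimon's combinedModeTableList: enumeratedExclusion and patternExclusion are hypothetical helpers, the startup exclusion list is shortened to two tables, and a table is assumed to be captured when its db.table name fully matches the generated regex. The enumerated lookahead still accepts cdc_test.non_pk_101, because the $ anchor keeps (^cdc_test\.non_pk_1$) from matching it, while a lookahead built from excludingTablePattern itself rejects it.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CombinedModeExcludeDemo {

    // Hypothetical helper mirroring the shape described in the issue:
    // (?!(^db\.tbl$)|(^db\.tbl2$))<including>, built from tables known at startup.
    static String enumeratedExclusion(String includingRegex, List<String> excludedAtStartup) {
        if (excludedAtStartup.isEmpty()) {
            return includingRegex; // an empty (?!) would reject every table
        }
        String alternatives = excludedAtStartup.stream()
                .map(t -> "(^" + t.replace(".", "\\.") + "$)")
                .collect(Collectors.joining("|"));
        return "(?!" + alternatives + ")" + includingRegex;
    }

    // Hypothetical alternative: embed the user's excluding regex in the lookahead,
    // so tables created after startup are filtered as well. db is treated as a regex.
    static String patternExclusion(String db, String includingTables, String excludingTables) {
        return "(?!(" + db + ")\\.(" + excludingTables + ")$)(" + db + ")\\.(" + includingTables + ")";
    }

    // Assumption: a table is captured when its "db.table" name fully matches the regex.
    static boolean captured(String regex, String fullName) {
        return Pattern.compile(regex).matcher(fullName).matches();
    }

    public static void main(String[] args) {
        List<String> startupExcluded = List.of("cdc_test.non_pk_1", "cdc_test.non_pk_2");
        String enumerated = enumeratedExclusion("(cdc_test)\\.(.*)", startupExcluded);
        String byPattern = patternExclusion("cdc_test", ".*", "non_pk_.+");

        for (String table : List.of("cdc_test.pk_1", "cdc_test.non_pk_1", "cdc_test.non_pk_101")) {
            System.out.printf("%-22s enumerated=%b pattern=%b%n",
                    table, captured(enumerated, table), captured(byPattern, table));
        }
    }
}
```

Building the lookahead from the user's excluding regex, rather than from a snapshot of table names, is one possible direction for a fix; it assumes the database and table patterns can be embedded safely in a larger regex.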

Anything else?

No response

Are you willing to submit a PR?

  • [X] I'm willing to submit a PR!

chjnxp, Sep 10 '24