flink-cdc
[pipeline-connector][paimon] add paimon pipeline data sink connector.
This closes https://github.com/ververica/flink-cdc-connectors/issues/2856. Some of the code is inspired by FlinkCdcMultiTableSink in the Paimon repo, with a SinkV2 implementation added.
@PatrickRen PTAL.
When a SchemaChangeEvent is emitted, Paimon loads the latest schema from the catalog. At that point the schema may not have been updated yet, so the data being written still follows the pre-DDL schema fields: data for newly added columns cannot be read, and dropping a column causes further problems.
Could the SchemaChangeEvent be sent after the stream is released (flushed)? That way the schema fetched downstream would be guaranteed to be the latest.
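The ordering concern can be shown with a toy model (all names here are illustrative, not actual connector code): a sink that reloads the schema from the catalog on a SchemaChangeEvent only sees a new column if the DDL reached the catalog before the event was emitted.

```java
import java.util.List;
import java.util.ArrayList;

/** Toy model of the event-ordering concern; not connector code. */
public class SchemaOrderingSketch {
    // the catalog's current view of the table schema
    static List<String> catalogSchema = new ArrayList<>(List.of("id", "name"));

    /** What the sink does on a SchemaChangeEvent: reload from the catalog. */
    static List<String> reloadSchema() {
        return new ArrayList<>(catalogSchema);
    }

    public static void main(String[] args) {
        // Event emitted before the DDL reached the catalog: stale schema
        List<String> stale = reloadSchema();
        System.out.println(stale); // [id, name]

        // DDL applied to the catalog first, event emitted afterwards: fresh schema
        catalogSchema.add("age");
        List<String> fresh = reloadSchema();
        System.out.println(fresh); // [id, name, age]
    }
}
```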
Thanks @yanghuaiGit for pointing this out; addressed.
The static field catalog in com.ververica.cdc.connectors.paimon.sink.PaimonMetadataApplier is null after deserialization, so com.ververica.cdc.connectors.paimon.sink.PaimonMetadataApplier#applySchemaChange throws a NullPointerException when it runs.
catalog should be changed to
private transient Catalog catalog;, and applySchemaChange should check whether it is null and build a catalog if so.
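A self-contained sketch of the suggested fix, using a hypothetical FakeCatalog stand-in for Paimon's Catalog (the real applier and catalog types are not reproduced here): a transient field is not serialized, so the deserialized copy rebuilds it lazily instead of hitting an NPE.

```java
import java.io.*;

/** Sketch of the transient-plus-lazy-init pattern; types are stand-ins. */
public class TransientCatalogSketch {

    /** Hypothetical stand-in for Paimon's Catalog. */
    static class FakeCatalog {}

    static class MetadataApplier implements Serializable {
        private transient FakeCatalog catalog; // not serialized; null on the deserialized copy

        void applySchemaChange() {
            if (catalog == null) {
                catalog = new FakeCatalog(); // rebuild from options on first use
            }
            // ... apply the DDL through the catalog ...
        }

        FakeCatalog getCatalog() { return catalog; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(o); }
        return bos.toByteArray();
    }

    static MetadataApplier roundTrip(MetadataApplier a) throws Exception {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(serialize(a)))) {
            return (MetadataApplier) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        MetadataApplier applier = new MetadataApplier();
        applier.applySchemaChange();                  // catalog built on the driver
        MetadataApplier shipped = roundTrip(applier); // simulates shipping to a task
        System.out.println(shipped.getCatalog());     // null: transient field was not shipped
        shipped.applySchemaChange();                  // lazy init avoids the NPE
        System.out.println(shipped.getCatalog() != null); // true
    }
}
```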
Addressed.
Is reading messages from multiple tables written to the same topic supported?
https://github.com/apache/flink-cdc/pull/2938#issuecomment-1970940065
Can CDC messages from multiple tables be written to the same topic?
You can do this by using route in the pipeline definition.
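For reference, a route rule in the pipeline YAML might look like the following (the table names are illustrative; check the Flink CDC pipeline documentation for the exact syntax of your version):

```yaml
route:
  - source-table: mydb.\.*         # regex matching every table in mydb
    sink-table: mydb.merged_table  # all matched tables are routed to one target
```

Routing many source tables to a single sink table is what lets their change events land in the same topic or target table.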
Paimon's latest version is 0.7; we should update the Paimon version from 0.6 to 0.7.
@lvyanquan Could you take a look at this one? I'd prefer to catch up with the latest version as well. Also, could you rebase onto the latest master? Thanks.
Done and rebased to master.
Can constant values be set in the Kafka headers? For example, when data from multiple data centers is written to the same Kafka topic, a region key could be added to the Kafka header.
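The use case can be sketched with plain Java (no connector option is implied to exist; the `constantHeaders` helper and the `region` key are hypothetical): each outgoing record would carry a fixed key-to-bytes header, mirroring Kafka's `String` key / `byte[]` value header layout.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch: constant headers stamped on every outgoing record. */
public class ConstantHeaderSketch {

    /** Builds the constant headers to attach; key and value are configuration. */
    static Map<String, byte[]> constantHeaders(String region) {
        Map<String, byte[]> headers = new HashMap<>();
        // Kafka headers are String -> byte[]; consumers decode the bytes back
        headers.put("region", region.getBytes(StandardCharsets.UTF_8));
        return headers;
    }

    public static void main(String[] args) {
        Map<String, byte[]> h = constantHeaders("us-east-1");
        System.out.println(new String(h.get("region"), StandardCharsets.UTF_8)); // us-east-1
    }
}
```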
Thanks @yuxiqian for those comments; I've addressed them and resubmitted.