[cdc] paimon cdc database sync support computed columns

Open • MOBIN-F opened this issue 1 year ago • 1 comment

Purpose

Many Paimon users still need computed columns when synchronizing multiple tables from an entire database. This PR implements that feature.

A computed column is evaluated for a table only if the column it references exists in that table. For example,

Table A column:
col1 col2 col3 create_sys_tm

Table B column:
col1 col4 col4 create_sys_tm

Define the computed columns:
--computed_column="_substring=substring(col3,2), pt=date_format(create_sys_tm,yyyyMMdd)"
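
A rough sketch of how a definition string like the one above could be split into individual computed columns. It is written in Java because Paimon is a Java project, but the class and record names (ComputedColumnArgParser, ComputedColumnDef) are hypothetical and this is not Paimon's own argument handling. It assumes each definition has the form name=function(args), that the first argument is the referenced column, and that commas inside parentheses are argument separators rather than definition separators.

```java
import java.util.ArrayList;
import java.util.List;

public class ComputedColumnArgParser {

    // Hypothetical holder for one parsed definition, e.g. pt=date_format(create_sys_tm,yyyyMMdd).
    record ComputedColumnDef(String name, String function, List<String> args) {
        // Assumed convention: the first argument names the referenced column.
        String referencedColumn() {
            return args.get(0);
        }
    }

    // Splits on commas at parenthesis depth 0 only, so "a=f(x,1), b=g(y,2)"
    // yields two definitions instead of four fragments.
    static List<String> splitTopLevel(String spec) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : spec.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (c == ',' && depth == 0) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) parts.add(current.toString().trim());
        return parts;
    }

    // Parses a single "name=function(arg1,arg2,...)" definition.
    static ComputedColumnDef parseOne(String def) {
        int eq = def.indexOf('=');
        int open = def.indexOf('(', eq);
        int close = def.lastIndexOf(')');
        if (eq <= 0 || open < 0 || close < open) {
            throw new IllegalArgumentException("Invalid computed column: " + def);
        }
        String name = def.substring(0, eq).trim();
        String function = def.substring(eq + 1, open).trim();
        List<String> args = new ArrayList<>();
        for (String a : def.substring(open + 1, close).split(",")) {
            args.add(a.trim());
        }
        return new ComputedColumnDef(name, function, args);
    }

    static List<ComputedColumnDef> parse(String spec) {
        List<ComputedColumnDef> defs = new ArrayList<>();
        for (String part : splitTopLevel(spec)) {
            defs.add(parseOne(part));
        }
        return defs;
    }

    public static void main(String[] args) {
        String spec = "_substring=substring(col3,2), pt=date_format(create_sys_tm,yyyyMMdd)";
        for (ComputedColumnDef d : parse(spec)) {
            System.out.println(d.name() + " <- " + d.function() + " on " + d.referencedColumn());
        }
    }
}
```

Running main prints one line per definition, starting with "_substring <- substring on col3".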

The final table structure is as follows:
Table A column:
col1 col2 col3 _substring pt

Table B column:
col1 col4 col4 pt
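
The rule in this example (a computed column is added only to tables that contain its referenced column) can be sketched as follows. The Def record and the applicableColumns method are hypothetical, the definitions are assumed to be already parsed, and Table B is modeled with a single col4 because a set cannot hold the duplicate col4 listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ComputedColumnApplicability {

    // Hypothetical pre-parsed definition: output column name plus the source column it reads.
    record Def(String name, String referencedColumn) {}

    // For each table, keeps only the computed columns whose referenced column exists in it.
    static Map<String, List<String>> applicableColumns(
            Map<String, Set<String>> tableColumns, List<Def> defs) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> table : tableColumns.entrySet()) {
            List<String> added = new ArrayList<>();
            for (Def d : defs) {
                if (table.getValue().contains(d.referencedColumn())) {
                    added.add(d.name());
                }
            }
            result.put(table.getKey(), added);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tables = new LinkedHashMap<>();
        tables.put("A", Set.of("col1", "col2", "col3", "create_sys_tm"));
        tables.put("B", Set.of("col1", "col4", "create_sys_tm"));
        List<Def> defs = List.of(new Def("_substring", "col3"), new Def("pt", "create_sys_tm"));
        // prints {A=[_substring, pt], B=[pt]}
        System.out.println(applicableColumns(tables, defs));
    }
}
```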

Main change in this PR: move the implementation of xxxRecordParser#evalComputedColumns down into RichCdcMultiplexRecordEventParser#evalComputedColumns.
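
One way to picture this refactor: each format-specific parser (Canal, Maxwell, OGG, MongoDB, MySQL) first turns a message into a column-name-to-value map, and a single shared step then evaluates the computed columns on every record. The method name evalComputedColumns echoes the PR, but the class, its signature, the Map-based row, and the two stand-in expressions are assumptions for illustration, not Paimon's actual implementation.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class SharedComputedColumnEvaluator {

    // Hypothetical computed column: output name, the column it reads, and the function to apply.
    record ComputedColumn(String name, String referencedColumn, UnaryOperator<String> expression) {}

    private static final DateTimeFormatter SOURCE_TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter PARTITION = DateTimeFormatter.ofPattern("yyyyMMdd");

    private final List<ComputedColumn> computedColumns;

    SharedComputedColumnEvaluator(List<ComputedColumn> computedColumns) {
        this.computedColumns = computedColumns;
    }

    // Shared step run once per record, after a format-specific parser has produced
    // a column-name -> value map. A computed column is skipped when the record's
    // table does not contain its referenced column.
    Map<String, String> evalComputedColumns(Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>(row);
        for (ComputedColumn cc : computedColumns) {
            if (row.containsKey(cc.referencedColumn())) {
                String value = row.get(cc.referencedColumn());
                // Null source values stay null instead of reaching the expression.
                out.put(cc.name(), value == null ? null : cc.expression().apply(value));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SharedComputedColumnEvaluator evaluator = new SharedComputedColumnEvaluator(List.of(
                // Stand-in for substring(col3,2), assuming a Java-style 0-based begin index.
                new ComputedColumn("_substring", "col3", v -> v.substring(2)),
                // Stand-in for date_format(create_sys_tm,yyyyMMdd), assuming the source
                // timestamp arrives as "yyyy-MM-dd HH:mm:ss" text.
                new ComputedColumn("pt", "create_sys_tm",
                        v -> LocalDateTime.parse(v, SOURCE_TS).format(PARTITION))));

        Map<String, String> rowA = new LinkedHashMap<>();
        rowA.put("col1", "1");
        rowA.put("col3", "abcdef");
        rowA.put("create_sys_tm", "2024-01-02 03:04:05");

        Map<String, String> rowB = new LinkedHashMap<>();
        rowB.put("col1", "2");
        rowB.put("col4", "x");
        rowB.put("create_sys_tm", "2024-01-03 08:00:00");

        // {col1=1, col3=abcdef, create_sys_tm=2024-01-02 03:04:05, _substring=cdef, pt=20240102}
        System.out.println(evaluator.evalComputedColumns(rowA));
        // {col1=2, col4=x, create_sys_tm=2024-01-03 08:00:00, pt=20240103}
        System.out.println(evaluator.evalComputedColumns(rowB));
    }
}
```

Because the check runs per record and only looks at whether the referenced column is present, one set of definitions can serve every table in the database sync.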

Tests

MySqlSyncDatabaseTableListITCase#testComputedColumn

MongoDBSyncDatabaseActionITCase#testComputedColumn

KafkaOggSyncDatabaseActionITCase#testComputedColumn

KafkaMaxwellSyncDatabaseActionITCase#testComputedColumn

KafkaCanalSyncDatabaseActionITCase#testComputedColumn

API and Format

Documentation

MOBIN-F • Sep 14 '24 10:09

@JingsongLi @yuzelin @zhuangchong could you help review this PR? Thanks!

MOBIN-F • Sep 23 '24 01:09

I have a question: if tableA and tableB have the same field, using table.col instead of col could reduce computation. Why use col? Is it because DataField only has a name and no table name? @MOBIN-F

dyp12 • Jul 10 '25 09:07

Hi @dyp12, I recommend using the Flink CDC pipeline, which provides the same capabilities and is more flexible.

MOBIN-F • Jul 10 '25 09:07

I need Kafka -> Paimon, but the Flink CDC pipeline doesn't support it; Kafka is only supported as a sink. Do you use this feature in a production environment? @MOBIN-F

dyp12 • Jul 10 '25 09:07

Yes, this PR has been running in our production environment for a long time, mainly to synchronize Kafka canal-json data into Paimon.

There is no difference between using table.col and col: the computed column is evaluated for every record either way.

MOBIN-F • Jul 10 '25 09:07

Thanks for your help, have a nice day!

dyp12 • Jul 10 '25 09:07