[cdc] Support computed columns in Paimon CDC database sync
Purpose
Many Paimon users need computed columns when synchronizing multiple tables of an entire database. This PR implements that feature.
A computed column is evaluated for a table only if the column it references exists in that table. For example:
Table A column:
col1 col2 col3 create_sys_tm
Table B column:
col1 col4 col4 create_sys_tm
Define the computed columns:
--computed_column="_substring=substring(col3,2), pt=date_format(create_sys_tm,yyyyMMdd)"
The final table structure is as follows:
Table A column:
col1 col2 col3 _substring pt
Table B column:
col1 col4 col4 pt
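The rule illustrated above (a computed column is added only when the column it references exists in the table) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class ComputedColumnSketch, the Computed holder and the eval method are hypothetical names, rows are modelled as plain string maps, date_format is approximated with java.time under the assumption that create_sys_tm looks like "yyyy-MM-dd HH:mm:ss", and the sketch keeps every source column alongside the computed ones.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; these are not Paimon's actual classes. */
final class ComputedColumnSketch {

    /** One computed column: output name, the source column it reads, and the expression. */
    static final class Computed {
        final String name;
        final String sourceColumn;
        final Function<String, String> expression;

        Computed(String name, String sourceColumn, Function<String, String> expression) {
            this.name = name;
            this.sourceColumn = sourceColumn;
            this.expression = expression;
        }
    }

    // Assumed input format for create_sys_tm; the PR does not state it.
    private static final DateTimeFormatter IN = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter OUT = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Stand-ins for the two definitions from the example: _substring and pt. */
    static List<Computed> exampleColumns() {
        return List.of(
                new Computed("_substring", "col3", v -> v.substring(2)),
                new Computed("pt", "create_sys_tm", v -> LocalDateTime.parse(v, IN).format(OUT)));
    }

    /** Adds each computed column whose source column is present (and non-null) in the row. */
    static Map<String, String> eval(Map<String, String> row, List<Computed> computed) {
        Map<String, String> result = new LinkedHashMap<>(row);
        for (Computed c : computed) {
            String value = row.get(c.sourceColumn);
            if (value != null) {
                result.put(c.name, c.expression.apply(value));
            }
            // Otherwise the referenced column is missing from this table: skip it.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> tableA = new LinkedHashMap<>();
        tableA.put("col1", "1");
        tableA.put("col3", "abcdef");
        tableA.put("create_sys_tm", "2024-01-15 10:30:00");

        Map<String, String> tableB = new LinkedHashMap<>();
        tableB.put("col1", "1");
        tableB.put("col4", "x");
        tableB.put("create_sys_tm", "2024-01-15 10:30:00");

        System.out.println(eval(tableA, exampleColumns())); // gains _substring and pt
        System.out.println(eval(tableB, exampleColumns())); // gains only pt (no col3)
    }
}
```

Because the check is made per record against the fields that record actually carries, the same list of definitions can be shared by every table in the database.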
Main change in this PR: move the implementation of xxxRecordParser#evalComputedColumns down to RichCdcMultiplexRecordEventParser#evalComputedColumns.
Tests
MySqlSyncDatabaseTableListITCase#testComputedColumn
MongoDBSyncDatabaseActionITCase#testComputedColumn
KafkaOggSyncDatabaseActionITCase#testComputedColumn
KafkaMaxwellSyncDatabaseActionITCase#testComputedColumn
KafkaCanalSyncDatabaseActionITCase#testComputedColumn
API and Format
Documentation
@JingsongLi @yuzelin @zhuangchong could you help review this PR? Thanks.
I have a question: if tableA and tableB share a field name, referencing it as table.col rather than col could reduce computation. Why use col? Is it because DataField has only a name and no table name? @MOBIN-F
Hi @dyp12, I recommend using the Flink CDC pipeline, which provides the same capabilities and is more flexible.
I need Kafka -> Paimon, but the Flink CDC pipeline does not support that; it supports Kafka only as a sink. Do you use this feature in a production environment? @MOBIN-F
Yes, this PR has been running in our production application for a long time, mainly to synchronize Kafka canal-json -> Paimon.
There is no difference between using table.col and col: the computed column applies to every record.
Thanks for your help. Have a nice day.