[Feature] Kafka CDC with Debezium/Canal JSON: support update records that lack the before field
Search before asking
- [X] I searched in the issues and found nothing similar.
Motivation
@yuzelin
We use whole-database synchronization, so a single topic contains records from multiple tables. We do not control the CDC collection side, and because of the large overall data volume, only a very small fraction of the CDC update records carry a before image.
We were pleasantly surprised to discover the Apache Paimon changelog producer, so we want to synchronize into the Paimon ODS layer and use the lookup changelog producer to generate a correct changelog for downstream pipelines. During testing, however, we found that Kafka CDC reports an error when an update record is missing its before data.
We therefore hope that Kafka CDC whole-database synchronization can handle update records that lack the before field. Alternatively, please suggest a workaround in which we adjust the received Kafka CDC data ourselves and write it to our own Kafka queue for consumption.
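To make the workaround idea concrete, here is a minimal Python sketch of the kind of adjustment we have in mind. It assumes flat Debezium-style envelopes (`op`, `before`, `after`) and Canal-style envelopes (`type`, `data`, `old`). The helper names `lacks_before` and `rewrite_as_insert` are hypothetical, and re-emitting a before-less update as an insert is only one possible adjustment that we have not verified against Paimon; it relies on the primary-key table plus the lookup changelog producer to derive the before image.

```python
def lacks_before(record: dict) -> bool:
    """Return True if an update record carries no before image."""
    if record.get("op") == "u":         # Debezium-style update
        return not record.get("before")
    if record.get("type") == "UPDATE":  # Canal-style update
        return not record.get("old")
    return False


def rewrite_as_insert(record: dict) -> dict:
    """Hypothetical adjustment: re-emit a before-less update as an insert.

    Records that already have a before image, or that are not updates,
    are returned unchanged. The input dict is never mutated.
    """
    if not lacks_before(record):
        return record
    fixed = dict(record)                # shallow copy
    if record.get("op") == "u":
        fixed["op"] = "c"               # Debezium "create"
        fixed["before"] = None
    else:
        fixed["type"] = "INSERT"        # Canal insert
        fixed["old"] = None
    return fixed
```

Such a function would run in a small relay job that reads the upstream topic and writes the adjusted records to our own topic, which Kafka CDC would then consume.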
Solution
No response
Anything else?
No response
Are you willing to submit a PR?
- [ ] I'm willing to submit a PR!