
[common][flink] Add support for complex types in kafka debezium avro cdc action

Open AshishKhatkar opened this issue 1 year ago • 2 comments

Purpose

@umeshdangat has an outstanding PR, which was updated after the PR by @zhuangchong was merged. That PR: https://github.com/apache/paimon/pull/3323

The above patch allows consuming Avro data from Kafka into Paimon, but it doesn't support complex Avro types. This PR adds that support. The original PR, https://github.com/zhuangchong/flink-table-store/pull/1, pointed to @zhuangchong's branch to clearly show only the changes relevant to supporting complex Avro types.

This PR fixes the failing E2E tests for PR: https://github.com/apache/paimon/pull/3931

Copying the note from PR: https://github.com/apache/paimon/pull/3931

One issue is that CdcSourceRecord contains a Map<String, String>. The current, somewhat tedious approach is therefore to deserialize Avro complex types into JSON strings and then read them back from those JSON strings, rather than changing the CdcSourceRecord map to Map<String, Object> so that values can be Objects. Judging by the code changes needed, that would be a much larger change.

I had to update DataField to add a dataFieldEqualsIgnoreId method, which already exists in RichEventParser. This becomes necessary for nested RowType fields (coming from nested Avro records): when DataField.type is a RowType, we cannot simply call equals on all the data fields, because they also contain an id, so the equality check fails even though there is no schema change.
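
To illustrate why an id-insensitive comparison is needed for nested rows, here is a minimal sketch. The Field class and the equalsIgnoreId helper below are hypothetical stand-ins written for this example; they are not Paimon's DataField or the dataFieldEqualsIgnoreId added in this PR, and the real implementation may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch, not Paimon code: a field with an id, a name, a type name,
// and child fields (non-empty only for a nested row type).
final class FieldEqualitySketch {

    static final class Field {
        final int id;
        final String name;
        final String typeName;      // e.g. "STRING", "INT", "ROW"
        final List<Field> children; // empty unless typeName is "ROW"

        Field(int id, String name, String typeName, Field... children) {
            this.id = id;
            this.name = name;
            this.typeName = typeName;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }

        // Strict equality: ids are compared too, at every nesting level.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Field)) return false;
            Field other = (Field) o;
            return id == other.id
                    && name.equals(other.name)
                    && typeName.equals(other.typeName)
                    && children.equals(other.children);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, name, typeName, children);
        }
    }

    // Id-insensitive equality: compares name and type, recursing into nested rows
    // so that differing ids on child fields do not count as a schema change.
    static boolean equalsIgnoreId(Field a, Field b) {
        if (!a.name.equals(b.name) || !a.typeName.equals(b.typeName)) {
            return false;
        }
        if (a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!equalsIgnoreId(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same nested schema, but the ids were assigned differently.
        Field existing = new Field(0, "address", "ROW", new Field(1, "city", "STRING"));
        Field incoming = new Field(5, "address", "ROW", new Field(6, "city", "STRING"));

        System.out.println("strict equals:  " + existing.equals(incoming));
        System.out.println("equalsIgnoreId: " + equalsIgnoreId(existing, incoming));
    }
}
```

In this sketch the strict comparison reports a difference purely because of the ids, while the recursive id-insensitive comparison treats the two nested schemas as unchanged.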

@JingsongLi @zhuangchong let us know what you think.

cc: @umeshdangat

Linked issue: close xxx

Tests

API and Format

Documentation

AshishKhatkar avatar Sep 24 '24 09:09 AshishKhatkar

CC @yuzelin to take a review~

JingsongLi avatar Sep 26 '24 05:09 JingsongLi

Thanks @AshishKhatkar

JingsongLi avatar Sep 26 '24 05:09 JingsongLi