flink-cdc icon indicating copy to clipboard operation
flink-cdc copied to clipboard

Handle map and array types binary record data

Open umeshdangat opened this issue 1 year ago • 5 comments

umeshdangat avatar Jun 24 '24 16:06 umeshdangat

Thanks for your contribution @umeshdangat , @loserwang1024 Would you like to help review this PR?

leonardBang avatar Jun 26 '24 12:06 leonardBang

I was writing a custom flink cdc source connector. Input data has complex nested types. That is how I ran into the issue of BinaryRecordData not supporting complex types such as ARRAY and MAP, including AbstractBinaryWriter not supporting those. Most of the code here is copied from flink source (including relevant tests).

I was also trying to get my data eventually into paimon, using paimon sink. Thus I have made relevant changes to PaimonWriterHelper as well to handle complex types.

Lastly, I came across a bug due to circular dependency between DataFieldSerializer, DataTypeSerializer, and RowTypeSerializer classes that causes NPE for certain RowTypes like nested row type. I have baked that change in this PR but also have a separate one here with more explanation

umeshdangat avatar Jun 26 '24 16:06 umeshdangat

@loserwang1024 thanks for looking into the PR.

I think that BinaryRecordData class is for the most part a port from flink's source BinaryRowData.

Following that code I tried to keep my code consistent with flink's BinaryArrayData and BinaryMapData.

umeshdangat avatar Jul 05 '24 18:07 umeshdangat

@umeshdangat , thanks for you explain. More comment detail in java docs is always a good thing.

loserwang1024 avatar Jul 15 '24 11:07 loserwang1024

This pull request has been automatically marked as stale because it has not had recent activity for 60 days. It will be closed in 30 days if no further activity occurs.

github-actions[bot] avatar Oct 07 '24 00:10 github-actions[bot]

This pull request has been closed because it has not had recent activity. You could reopen it if you try to continue your work, and anyone who are interested in it are encouraged to continue work on this pull request.

github-actions[bot] avatar Nov 07 '24 00:11 github-actions[bot]

@loserwang1024 @leonardBang can we re-open this? I would still like to get this looked at and merged as I feel it is valuable to handle MapData and ArrayData in BinaryRecordData (as supported in flink source)

umeshdangat avatar Nov 15 '24 00:11 umeshdangat

@umeshdangat feel free to reopen it. And I would like to review this PR if there are any updates.

herunkang2018 avatar Nov 15 '24 11:11 herunkang2018

@umeshdangat @herunkang2018 Reopened, feel free to continue work for this PR, and I'm looking forward to Complex data type as well.

leonardBang avatar Nov 18 '24 06:11 leonardBang

Hi, @umeshdangat gentle ping. I think that this pr is an important base for many other following features, and we can try to merge it as soon as possible.

lvyanquan avatar Nov 27 '24 08:11 lvyanquan

I guess that some classes refer to Flink like BinaryArrayData , so I think that we should avoid adding unnecessary change and add enough tests.

The reason to not reuse BinaryArrayData from flink source code is that flink-cdc appears to have its own interface defined for ArrayData which is then used in RecordData. RecordData in flink-cdc essentially a port of RowData interface from flink.

Also there are existing re-implementations of flink classes like GenericArrayData in flink-cdc. So there is prior art to this pattern as well.

umeshdangat avatar Dec 05 '24 00:12 umeshdangat

@lvyanquan @loserwang1024 @herunkang2018 @leonardBang I believe I have addressed all outstanding comments on this PR. Could we continue moving the review forward?

umeshdangat avatar Dec 07 '24 00:12 umeshdangat

@umeshdangat Thanks for your updating, and we will have a final review for this PR ASAP.

leonardBang avatar Dec 09 '24 03:12 leonardBang

@loserwang1024 thanks for your review. Is this PR now good to merge?

umeshdangat avatar Dec 25 '24 06:12 umeshdangat