Handle map and array types binary record data
Thanks for your contribution @umeshdangat , @loserwang1024 Would you like to help review this PR?
I was writing a custom flink cdc source connector. Input data has complex nested types. That is how I ran into the issue of BinaryRecordData not supporting complex types such as ARRAY and MAP, including AbstractBinaryWriter not supporting those. Most of the code here is copied from flink source (including relevant tests).
I was also trying to get my data eventually into paimon, using paimon sink. Thus I have made relevant changes to PaimonWriterHelper as well to handle complex types.
Lastly, I came across a bug due to circular dependency between DataFieldSerializer, DataTypeSerializer, and RowTypeSerializer classes that causes NPE for certain RowTypes like nested row type. I have baked that change in this PR but also have a separate one here with more explanation
@loserwang1024 thanks for looking into the PR.
I think that BinaryRecordData class is for the most part a port from flink's source BinaryRowData.
Following that code I tried to keep my code consistent with flink's BinaryArrayData and BinaryMapData.
@umeshdangat , thanks for you explain. More comment detail in java docs is always a good thing.
This pull request has been automatically marked as stale because it has not had recent activity for 60 days. It will be closed in 30 days if no further activity occurs.
This pull request has been closed because it has not had recent activity. You could reopen it if you try to continue your work, and anyone who are interested in it are encouraged to continue work on this pull request.
@loserwang1024 @leonardBang can we re-open this? I would still like to get this looked at and merged as I feel it is valuable to handle MapData and ArrayData in BinaryRecordData (as supported in flink source)
@umeshdangat feel free to reopen it. And I would like to review this PR if there are any updates.
@umeshdangat @herunkang2018 Reopened, feel free to continue work for this PR, and I'm looking forward to Complex data type as well.
Hi, @umeshdangat gentle ping. I think that this pr is an important base for many other following features, and we can try to merge it as soon as possible.
I guess that some classes refer to Flink like BinaryArrayData , so I think that we should avoid adding unnecessary change and add enough tests.
The reason to not reuse BinaryArrayData from flink source code is that flink-cdc appears to have its own interface defined for ArrayData which is then used in RecordData. RecordData in flink-cdc essentially a port of RowData interface from flink.
Also there are existing re-implementations of flink classes like GenericArrayData in flink-cdc. So there is prior art to this pattern as well.
@lvyanquan @loserwang1024 @herunkang2018 @leonardBang I believe I have addressed all outstanding comments on this PR. Could we continue moving the review forward?
@umeshdangat Thanks for your updating, and we will have a final review for this PR ASAP.
@loserwang1024 thanks for your review. Is this PR now good to merge?