flink
[FLINK-28910][Connectors/hbase] Fix potential data deletion while updating HBase rows
What is the purpose of the change
https://issues.apache.org/jira/browse/FLINK-28910
Brief change log
- Add a reduce step when the HBase connector processes mutations.
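The change log does not spell out what the reduce step does. One plausible reading, sketched below, is that the mutations buffered for a flush are collapsed so that only the most recent mutation per row key is sent to HBase. The names `MutationReducer`, `RowMutation`, and `Kind` are hypothetical stand-ins for illustration and are not classes from Flink or the HBase client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the connector's actual code.
final class MutationReducer {
    enum Kind { PUT, DELETE }

    // Simplified stand-in for an HBase Put or Delete on a single row.
    static final class RowMutation {
        final String rowKey;
        final Kind kind;
        final String value; // null for deletes

        RowMutation(String rowKey, Kind kind, String value) {
            this.rowKey = rowKey;
            this.kind = kind;
            this.value = value;
        }
    }

    // Keeps only the last buffered mutation for each row key.
    // Output order follows the first appearance of each row key.
    static List<RowMutation> reduce(List<RowMutation> buffered) {
        Map<String, RowMutation> latest = new LinkedHashMap<>();
        for (RowMutation m : buffered) {
            latest.put(m.rowKey, m); // a later mutation replaces an earlier one
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        // An update arrives as a delete of the old row followed by a put of the new one.
        List<RowMutation> buffered = Arrays.asList(
                new RowMutation("r1", Kind.DELETE, null),
                new RowMutation("r1", Kind.PUT, "v2"),
                new RowMutation("r2", Kind.PUT, "x"));
        for (RowMutation m : reduce(buffered)) {
            System.out.println(m.rowKey + " " + m.kind + " " + m.value);
        }
    }
}
```

With this reduction, a delete and a put for the same row never reach HBase in the same batch, so they cannot collide on the same timestamp.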
Verifying this change
CI passed
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with @Public(Evolving): no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
- The S3 file system connector: no
Documentation
- Does this pull request introduce a new feature? no
CI report:
- 3b56735cc843cd6629de56534f1fd6b83a30a854 Azure: FAILURE
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure: re-run the last Azure build
@flinkbot run azure
@flinkbot run azure
Hi @luoyuxia, could you please help me review the changes? Thank you.
@wuchong @dannycranmer could you please help me review the changes? Thank you. 😄
@MartijnVisser could you please help me review the changes? Thank you. 😄
For clarity, the title could be changed to something like 'Fix potential data deletion while updating HBase rows'. Just my suggestion : )
Thank you for your suggestion, I think it is much clearer.
@MartijnVisser could you please help me review the changes? Thank you. 😄
@ganlute I have no experience with HBase, so I can't review it unfortunately. I think the Flink community is lacking HBase maintainers in general, to be honest.
@leonardBang Do you think you could have a look? Since you have experience with CDC, I thought you might be able to help out here :)
The CI failure seems to be unrelated to this PR.
So is this superseded by https://github.com/apache/flink/pull/22612 or not?
Yes, the two issues have the same root cause: an insert and a delete operation are passed to HBase with the same millisecond-precision timestamp, and in that case the order in which HBase executes them is not guaranteed. The changes made in #22612 explicitly set nanosecond-precision timestamps for the HBase operations, which eliminates the possibility of multiple operations happening "at the same time", so deletes and inserts are executed in the correct order.
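To illustrate the idea of never handing two operations the same timestamp, here is a minimal sketch. It is not the code from #22612: instead of nanosecond precision it uses a simpler swapped-in technique, a counter that bumps the timestamp whenever the clock has not advanced. The class `StrictlyIncreasingTimestamps` and its `next` method are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: hands out timestamps that are strictly increasing,
// even when the underlying clock returns the same value twice.
final class StrictlyIncreasingTimestamps {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    // Returns the candidate if it is newer than the last issued value,
    // otherwise the last issued value plus one.
    long next(long candidate) {
        return last.updateAndGet(prev -> candidate > prev ? candidate : prev + 1);
    }

    public static void main(String[] args) {
        StrictlyIncreasingTimestamps ts = new StrictlyIncreasingTimestamps();
        long sameMillis = 1_700_000_000_000L; // both operations see the same clock reading

        long deleteTs = ts.next(sameMillis); // timestamp for the delete of the old row
        long putTs = ts.next(sameMillis);    // timestamp for the put of the new row

        System.out.println(deleteTs < putTs); // prints true
    }
}
```

Because the put always carries a later timestamp than the delete that precedes it, the two operations are distinguishable regardless of how close together they were issued.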