Hbase-28014: Transient underlying HDFS failure causes permanent replciation failure between 2 HBase Clusters

Open tonyPan123 opened this issue 2 years ago • 1 comments

See HBase-28014 for details on the symptom and diagnostic.

The fix of HBase-28014 is implemented. In my local machine, it is able to pass all test cases.

I add some configurable retry mechanism to tolerate possible underlying HDFS failure.

Actually, I could add a unit test right now. However, I need to add a Fault Injector class to inject the IOException or override the replicationSourceManager and dynamically include the overriding one in the unit test. I am wondering if that would be a good convention.

Any comments and suggestions would be appreciated.

Aug 17 '23 20:08 tonyPan123

Thanks for opening a PR. In HBase, we usually first open a PR against the master branch, and then cherry-pick to other branches.

So please open a PR against master branch? Branch-2.0 has been EOL for a long time, I do not think the problem will only affect branch-2.0?

Sep 09 '23 15:09 Apache9