hbase
hbase copied to clipboard
HBASE-28158 Decouple RIT list management from TRSP invocation
Only remove a region from the RIT map when assignment reaches a suitable terminal state.
Before this change we would map a RIT list entry to a running TRSP. This is problematic because the in-transition state does not always correspond to the lifetime of a TRSP. In cases of an operator bypass or bug or timeout the region will still be in transition from the user or operator perspective but not included in the RIT list.
Operators expect that regions that should be open but are not appear the master's RIT list, provided by /rits.jsp, the output of the shell's 'rit' command, and in ClusterStatus.
:confetti_ball: +1 overall
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
+0 :ok: | reexec | 0m 35s | Docker mode activated. |
_ Prechecks _ | |||
+1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
+1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
+1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
_ master Compile Tests _ | |||
+1 :green_heart: | mvninstall | 4m 17s | master passed |
+1 :green_heart: | compile | 3m 6s | master passed |
+1 :green_heart: | checkstyle | 0m 42s | master passed |
+1 :green_heart: | spotless | 0m 51s | branch has no errors when running spotless:check. |
+1 :green_heart: | spotbugs | 1m 46s | master passed |
_ Patch Compile Tests _ | |||
+1 :green_heart: | mvninstall | 3m 25s | the patch passed |
+1 :green_heart: | compile | 3m 13s | the patch passed |
+1 :green_heart: | javac | 3m 13s | the patch passed |
+1 :green_heart: | checkstyle | 0m 42s | the patch passed |
+1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 :green_heart: | hadoopcheck | 12m 30s | Patch does not cause any errors with Hadoop 3.2.4 3.3.6. |
+1 :green_heart: | spotless | 1m 8s | patch has no errors when running spotless:check. |
+1 :green_heart: | spotbugs | 2m 29s | the patch passed |
_ Other Tests _ | |||
+1 :green_heart: | asflicense | 0m 13s | The patch does not generate ASF License warnings. |
43m 56s |
Subsystem | Report/Notes |
---|---|
Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-general-check/output/Dockerfile |
GITHUB PR | https://github.com/apache/hbase/pull/5464 |
Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile |
uname | Linux 625f8eec7b40 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | dev-support/hbase-personality.sh |
git revision | master / 391dfda6ad |
Default Java | Eclipse Adoptium-11.0.17+8 |
Max. process+thread count | 78 (vs. ulimit of 30000) |
modules | C: hbase-server U: hbase-server |
Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/console |
versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
So the intention here is that, when region is in abnormally closed state, we will use operator tools to bypass the TRSP, but then, we will consider the region as not in RIT and confusing the operators?
Seems reasonable. But we should be careful to not mess up the normal logic. Let me take a look at the related code carefully.
Thanks.
Based on my local tests there are three test failures to resolve.
This first one is a false positive:
- TestAssignmentManagerUtil.testCreateUnassignProceduresForMergeFail
These two are timeouts, so waiters need changes:
- TestAssignmentManagerMetrics.testRITAssignmentManagerMetrics
- TestRegionReplicaSplit.testAssignFakeReplicaRegion
I think these are test issues but will look in detail tomorrow. Let's see if precommit here has more.
:broken_heart: -1 overall
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
+0 :ok: | reexec | 0m 30s | Docker mode activated. |
-0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
_ Prechecks _ | |||
_ master Compile Tests _ | |||
+1 :green_heart: | mvninstall | 2m 32s | master passed |
+1 :green_heart: | compile | 0m 36s | master passed |
+1 :green_heart: | shadedjars | 5m 6s | branch has no errors when building our shaded downstream artifacts. |
+1 :green_heart: | javadoc | 0m 24s | master passed |
_ Patch Compile Tests _ | |||
+1 :green_heart: | mvninstall | 2m 11s | the patch passed |
+1 :green_heart: | compile | 0m 36s | the patch passed |
+1 :green_heart: | javac | 0m 36s | the patch passed |
+1 :green_heart: | shadedjars | 5m 6s | patch has no errors when building our shaded downstream artifacts. |
+1 :green_heart: | javadoc | 0m 22s | the patch passed |
_ Other Tests _ | |||
-1 :x: | unit | 226m 57s | hbase-server in the patch failed. |
248m 21s |
Subsystem | Report/Notes |
---|---|
Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
GITHUB PR | https://github.com/apache/hbase/pull/5464 |
Optional Tests | javac javadoc unit shadedjars compile |
uname | Linux 6bb6c17b8131 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | dev-support/hbase-personality.sh |
git revision | master / 391dfda6ad |
Default Java | Temurin-1.8.0_352-b08 |
unit | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt |
Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/testReport/ |
Max. process+thread count | 4636 (vs. ulimit of 30000) |
modules | C: hbase-server U: hbase-server |
Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/console |
versions | git=2.34.1 maven=3.8.6 |
Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
:broken_heart: -1 overall
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
+0 :ok: | reexec | 0m 49s | Docker mode activated. |
-0 :warning: | yetus | 0m 3s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
_ Prechecks _ | |||
_ master Compile Tests _ | |||
+1 :green_heart: | mvninstall | 4m 3s | master passed |
+1 :green_heart: | compile | 1m 4s | master passed |
+1 :green_heart: | shadedjars | 6m 0s | branch has no errors when building our shaded downstream artifacts. |
+1 :green_heart: | javadoc | 0m 32s | master passed |
_ Patch Compile Tests _ | |||
+1 :green_heart: | mvninstall | 3m 27s | the patch passed |
+1 :green_heart: | compile | 1m 9s | the patch passed |
+1 :green_heart: | javac | 1m 9s | the patch passed |
+1 :green_heart: | shadedjars | 6m 21s | patch has no errors when building our shaded downstream artifacts. |
+1 :green_heart: | javadoc | 0m 35s | the patch passed |
_ Other Tests _ | |||
-1 :x: | unit | 240m 1s | hbase-server in the patch failed. |
268m 27s |
Subsystem | Report/Notes |
---|---|
Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
GITHUB PR | https://github.com/apache/hbase/pull/5464 |
Optional Tests | javac javadoc unit shadedjars compile |
uname | Linux 5dc037f38998 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | dev-support/hbase-personality.sh |
git revision | master / 391dfda6ad |
Default Java | Eclipse Adoptium-11.0.17+8 |
unit | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt |
Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/testReport/ |
Max. process+thread count | 4250 (vs. ulimit of 30000) |
modules | C: hbase-server U: hbase-server |
Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5464/1/console |
versions | git=2.34.1 maven=3.8.6 |
Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
Yes I have legit test failures to resolve, doing that soon.
For me I think the fundamental issue is the time a region is in transition is not always tied to the lifetime of a TransitRegionStateProcedure
, so managing the RIT list from addProcedure
/removeProcedure
is not correct.
A region is in transition if it is not assigned. In our production we have had regions in transition and no TransitRegionStateProcedure
either alive or scheduled for the region. That definitely happens if the TRSP is bypassed. It can also happen due to code bug. If the region is not assigned, it is still in transition, and should be reported as such by the master.
I can agree to totally decouple RIT list management from addProcedure
/removeProcedure
. That makes sense. The assignment manager will make separate calls to RegionStates
to manage the RIT map, independently from management of procedures.
How does that sound?
For me I think the fundamental issue is the time a region is in transition is not always tied to the lifetime of a TransitRegionStateProcedure, so managing the RIT list from addProcedure/removeProcedure is not correct.
But if there is no issue, I do not think a region can be in the RIT state when there is no TRSP? I.e, only if you use HBCK to bypass a TRSP, but I do not think this is a normal operation. That why I say we'd better separate the normal logic and abnormal logic. And if a TRSP finishes, I think the region should leave the RIT state.