
HBASE-28192 Master should recover if meta region state is inconsistent

Open virajjasani opened this issue 2 years ago • 12 comments

Jira: HBASE-28192

virajjasani avatar Nov 09 '23 20:11 virajjasani

:confetti_ball: +1 overall

Vote Subsystem Runtime Comment
+0 :ok: reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+1 :green_heart: hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 :green_heart: mvninstall 4m 8s master passed
+1 :green_heart: compile 3m 5s master passed
+1 :green_heart: checkstyle 0m 56s master passed
+1 :green_heart: spotless 1m 4s branch has no errors when running spotless:check.
+1 :green_heart: spotbugs 2m 11s master passed
_ Patch Compile Tests _
+1 :green_heart: mvninstall 3m 28s the patch passed
+1 :green_heart: compile 2m 57s the patch passed
+1 :green_heart: javac 2m 57s the patch passed
+1 :green_heart: checkstyle 0m 48s the patch passed
+1 :green_heart: whitespace 0m 0s The patch has no whitespace issues.
+1 :green_heart: hadoopcheck 13m 37s Patch does not cause any errors with Hadoop 3.2.4 3.3.6.
+1 :green_heart: spotless 1m 3s patch has no errors when running spotless:check.
+1 :green_heart: spotbugs 2m 17s the patch passed
_ Other Tests _
+1 :green_heart: asflicense 0m 13s The patch does not generate ASF License warnings.
44m 46s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/5513
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 85bbcd0830eb 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7f3921ae40
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 09 '23 21:11 Apache-HBase

:broken_heart: -1 overall

Vote Subsystem Runtime Comment
+0 :ok: reexec 0m 27s Docker mode activated.
-0 :warning: yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 :green_heart: mvninstall 2m 36s master passed
+1 :green_heart: compile 0m 37s master passed
+1 :green_heart: shadedjars 5m 23s branch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 23s master passed
_ Patch Compile Tests _
+1 :green_heart: mvninstall 2m 19s the patch passed
+1 :green_heart: compile 0m 38s the patch passed
+1 :green_heart: javac 0m 38s the patch passed
+1 :green_heart: shadedjars 5m 22s patch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 21s the patch passed
_ Other Tests _
-1 :x: unit 256m 21s hbase-server in the patch failed.
278m 16s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/5513
Optional Tests javac javadoc unit shadedjars compile
uname Linux 8e84fa2d2746 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7f3921ae40
Default Java Temurin-1.8.0_352-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/testReport/
Max. process+thread count 4518 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 10 '23 00:11 Apache-HBase

:broken_heart: -1 overall

Vote Subsystem Runtime Comment
+0 :ok: reexec 0m 43s Docker mode activated.
-0 :warning: yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 :green_heart: mvninstall 3m 32s master passed
+1 :green_heart: compile 1m 2s master passed
+1 :green_heart: shadedjars 5m 48s branch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 31s master passed
_ Patch Compile Tests _
+1 :green_heart: mvninstall 3m 14s the patch passed
+1 :green_heart: compile 0m 49s the patch passed
+1 :green_heart: javac 0m 49s the patch passed
+1 :green_heart: shadedjars 4m 57s patch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 26s the patch passed
_ Other Tests _
-1 :x: unit 256m 42s hbase-server in the patch failed.
282m 3s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/5513
Optional Tests javac javadoc unit shadedjars compile
uname Linux 0ba8245296d2 5.4.0-163-generic #180-Ubuntu SMP Tue Sep 5 13:21:23 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7f3921ae40
Default Java Eclipse Adoptium-11.0.17+8
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/testReport/
Max. process+thread count 5086 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 10 '23 01:11 Apache-HBase

:confetti_ball: +1 overall

Vote Subsystem Runtime Comment
+0 :ok: reexec 0m 32s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+1 :green_heart: hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 :green_heart: mvninstall 2m 53s master passed
+1 :green_heart: compile 2m 26s master passed
+1 :green_heart: checkstyle 0m 36s master passed
+1 :green_heart: spotless 0m 43s branch has no errors when running spotless:check.
+1 :green_heart: spotbugs 1m 33s master passed
_ Patch Compile Tests _
+1 :green_heart: mvninstall 2m 35s the patch passed
+1 :green_heart: compile 2m 25s the patch passed
+1 :green_heart: javac 2m 25s the patch passed
+1 :green_heart: checkstyle 0m 35s the patch passed
+1 :green_heart: whitespace 0m 0s The patch has no whitespace issues.
+1 :green_heart: hadoopcheck 9m 27s Patch does not cause any errors with Hadoop 3.2.4 3.3.6.
+1 :green_heart: spotless 0m 41s patch has no errors when running spotless:check.
+1 :green_heart: spotbugs 1m 36s the patch passed
_ Other Tests _
+1 :green_heart: asflicense 0m 12s The patch does not generate ASF License warnings.
31m 53s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/5513
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux e069f5f58e6c 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7f3921ae40
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 79 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 10 '23 01:11 Apache-HBase

> Unless we know what the root cause is, I'm always -1 on doing things like this in our normal code logic. HBCK is the correct way to fix an inconsistency that is caused by a code bug.
>
> So why is there no SCP for the old server after it is already dead?

I added some comments on Jira. It is still only a suspicion, not a confirmed root cause; maybe this can happen only during an upgrade from 2.4 to 2.5? Let me check what happened to the SCP of the old server.

virajjasani avatar Nov 10 '23 03:11 virajjasani

:broken_heart: -1 overall

Vote Subsystem Runtime Comment
+0 :ok: reexec 0m 25s Docker mode activated.
-0 :warning: yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 :green_heart: mvninstall 3m 8s master passed
+1 :green_heart: compile 0m 46s master passed
+1 :green_heart: shadedjars 5m 29s branch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 25s master passed
_ Patch Compile Tests _
+1 :green_heart: mvninstall 2m 52s the patch passed
+1 :green_heart: compile 0m 46s the patch passed
+1 :green_heart: javac 0m 46s the patch passed
+1 :green_heart: shadedjars 5m 29s patch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 22s the patch passed
_ Other Tests _
-1 :x: unit 236m 15s hbase-server in the patch failed.
260m 3s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/5513
Optional Tests javac javadoc unit shadedjars compile
uname Linux 04f2cef806db 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7f3921ae40
Default Java Eclipse Adoptium-11.0.17+8
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/testReport/
Max. process+thread count 4748 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 10 '23 05:11 Apache-HBase

:broken_heart: -1 overall

Vote Subsystem Runtime Comment
+0 :ok: reexec 0m 12s Docker mode activated.
-0 :warning: yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 :green_heart: mvninstall 2m 32s master passed
+1 :green_heart: compile 0m 41s master passed
+1 :green_heart: shadedjars 4m 51s branch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 25s master passed
_ Patch Compile Tests _
+1 :green_heart: mvninstall 2m 20s the patch passed
+1 :green_heart: compile 0m 41s the patch passed
+1 :green_heart: javac 0m 41s the patch passed
+1 :green_heart: shadedjars 4m 52s patch has no errors when building our shaded downstream artifacts.
+1 :green_heart: javadoc 0m 24s the patch passed
_ Other Tests _
-1 :x: unit 244m 59s hbase-server in the patch failed.
266m 7s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/5513
Optional Tests javac javadoc unit shadedjars compile
uname Linux c9935834d86f 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7f3921ae40
Default Java Temurin-1.8.0_352-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/testReport/
Max. process+thread count 4661 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5513/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 10 '23 05:11 Apache-HBase

I would definitely prefer automated solutions rather than rely on HBCK. IMO anything requiring hbck is a bug.

bbeaudreault avatar Nov 12 '23 21:11 bbeaudreault

> I would definitely prefer automated solutions rather than rely on HBCK. IMO anything requiring hbck is a bug.

I agree.

For this case, we now know that the root cause was the sequence (upgrade to 2.5 + downgrade to 2.4 + meta move + upgrade to 2.5), and hence the master did not have the correct server address in the master local region.

I wonder whether anything else could ever cause this problem.

virajjasani avatar Nov 20 '23 02:11 virajjasani

> > I would definitely prefer automated solutions rather than rely on HBCK. IMO anything requiring hbck is a bug.
>
> I agree.
>
> For this case, we now know that the root cause was the sequence (upgrade to 2.5 + downgrade to 2.4 + meta move + upgrade to 2.5), and hence the master did not have the correct server address in the master local region.
>
> I wonder whether anything else could ever cause this problem.

Could you explain the root cause in more detail? Why would this sequence cause the problem? Is it because downgrading to 2.4 does not do all the necessary rollbacks?

Apache9 avatar Nov 20 '23 02:11 Apache9

Correct, downgrading to 2.4 does not remove meta's address from the master local region's info:sn. Hence, any downgrade from 2.5 to an older version carries this risk: the older version neither removes the info CF from the master local region nor uses the master local region to update the meta location (since HBASE-26193 applies only to 2.5.0+ releases).
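
To make the sequence concrete, here is a toy model of it. This is not HBase code: `ToyCluster`, `legacy_location`, and the server names `rs1`/`rs2` are hypothetical, and the model assumes that a 2.5 master only seeds info:sn when it is absent and that a 2.4 master tracks the meta location somewhere other than the master local region.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyCluster:
    """Toy model (not HBase code) of where each version records the meta location."""
    version: str = "2.4"
    actual_meta_server: str = "rs1"
    # Assumption: what a 2.4 master keeps up to date, outside the master local region.
    legacy_location: Optional[str] = "rs1"
    # Stand-in for info:sn in the master local region (used by 2.5.0+).
    local_region_sn: Optional[str] = None

    def upgrade_to_25(self) -> None:
        self.version = "2.5"
        # Assumption: 2.5 seeds info:sn only if nothing is stored there yet.
        if self.local_region_sn is None:
            self.local_region_sn = self.legacy_location

    def downgrade_to_24(self) -> None:
        # The downgrade leaves info:sn untouched, which is the gap described above.
        self.version = "2.4"

    def move_meta(self, new_server: str) -> None:
        self.actual_meta_server = new_server
        if self.version == "2.5":
            self.local_region_sn = new_server
        else:
            # A 2.4 master does not write to the master local region.
            self.legacy_location = new_server

    def master_view_consistent(self) -> bool:
        """True if info:sn matches the server that actually hosts meta."""
        return self.local_region_sn == self.actual_meta_server


def replay_incident() -> ToyCluster:
    """Upgrade to 2.5, downgrade to 2.4, move meta, upgrade to 2.5 again."""
    cluster = ToyCluster()
    cluster.upgrade_to_25()
    cluster.downgrade_to_24()
    cluster.move_meta("rs2")
    cluster.upgrade_to_25()
    return cluster
```

In this model, `replay_incident()` ends with info:sn still pointing at `rs1` while meta is on `rs2`, so `master_view_consistent()` returns `False`.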

virajjasani avatar Nov 20 '23 03:11 virajjasani

> Correct, downgrading to 2.4 does not remove meta's address from the master local region's info:sn. Hence, any downgrade from 2.5 to an older version carries this risk: the older version neither removes the info CF from the master local region nor uses the master local region to update the meta location (since HBASE-26193 applies only to 2.5.0+ releases).

So I think we should add a tool in HBCK2 for deleting data from the master local region? Or, at least, for removing the meta location from the master local region. Then, after downgrading to 2.4, a manual step is needed to remove the location.
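
A rough sketch of why that manual step would help, as a toy model only. No such HBCK2 command is shown here; `seed_on_upgrade`, `location_after_roundtrip`, and the server names are hypothetical, and the model assumes a 2.5 master fills info:sn from the location maintained by 2.4 only when info:sn is empty.

```python
from typing import Optional


def seed_on_upgrade(local_region_sn: Optional[str], legacy_location: str) -> str:
    """Assumed 2.5 behaviour: keep an existing info:sn, otherwise seed it."""
    return legacy_location if local_region_sn is None else local_region_sn


def location_after_roundtrip(cleanup_after_downgrade: bool) -> str:
    """Return the info:sn a 2.5 master sees after 2.5 -> 2.4 -> meta move -> 2.5."""
    legacy_location = "rs1"                      # maintained by 2.4 (assumed)
    local_region_sn = seed_on_upgrade(None, legacy_location)  # first upgrade to 2.5

    # Downgrade to 2.4: info:sn is left behind unless the manual step removes it.
    if cleanup_after_downgrade:
        local_region_sn = None

    legacy_location = "rs2"                      # meta moves while running 2.4

    # Second upgrade to 2.5.
    return seed_on_upgrade(local_region_sn, legacy_location)
```

Without the cleanup the function returns the stale `rs1`; with it, the second upgrade picks up `rs2`.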

Apache9 avatar Nov 20 '23 03:11 Apache9