HBASE-27698 Migrate meta locations from zookeeper to master data may not always be possible if we migrate from 1.x HBase
:confetti_ball: +1 overall
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
+0 :ok: | reexec | 1m 8s | Docker mode activated. |
_ Prechecks _ | |||
+1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
+1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
+1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
_ branch-2 Compile Tests _ | |||
+1 :green_heart: | mvninstall | 3m 39s | branch-2 passed |
+1 :green_heart: | compile | 2m 25s | branch-2 passed |
+1 :green_heart: | checkstyle | 0m 37s | branch-2 passed |
+1 :green_heart: | spotless | 0m 43s | branch has no errors when running spotless:check. |
+1 :green_heart: | spotbugs | 1m 31s | branch-2 passed |
_ Patch Compile Tests _ | |||
+1 :green_heart: | mvninstall | 3m 20s | the patch passed |
+1 :green_heart: | compile | 2m 23s | the patch passed |
+1 :green_heart: | javac | 2m 23s | the patch passed |
+1 :green_heart: | checkstyle | 0m 34s | the patch passed |
+1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 :green_heart: | hadoopcheck | 17m 45s | Patch does not cause any errors with Hadoop 2.10.2, 3.2.4, or 3.3.4. |
+1 :green_heart: | spotless | 0m 42s | patch has no errors when running spotless:check. |
+1 :green_heart: | spotbugs | 1m 35s | the patch passed |
_ Other Tests _ | |||
+1 :green_heart: | asflicense | 0m 13s | The patch does not generate ASF License warnings. |
| | 38m 26s | |
Subsystem | Report/Notes |
---|---|
Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-general-check/output/Dockerfile |
GITHUB PR | https://github.com/apache/hbase/pull/5167 |
Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile |
uname | Linux eed655c843a8 5.4.0-144-generic #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | dev-support/hbase-personality.sh |
git revision | branch-2 / a67a8f7fd3 |
Default Java | Eclipse Adoptium-11.0.17+8 |
Max. process+thread count | 86 (vs. ulimit of 30000) |
modules | C: hbase-server U: hbase-server |
Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/console |
versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
:broken_heart: -1 overall
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
+0 :ok: | reexec | 0m 49s | Docker mode activated. |
-0 :warning: | yetus | 0m 6s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
_ Prechecks _ | |||
_ branch-2 Compile Tests _ | |||
+1 :green_heart: | mvninstall | 4m 52s | branch-2 passed |
+1 :green_heart: | compile | 0m 49s | branch-2 passed |
+1 :green_heart: | shadedjars | 5m 26s | branch has no errors when building our shaded downstream artifacts. |
+1 :green_heart: | javadoc | 0m 28s | branch-2 passed |
_ Patch Compile Tests _ | |||
+1 :green_heart: | mvninstall | 4m 7s | the patch passed |
+1 :green_heart: | compile | 0m 48s | the patch passed |
+1 :green_heart: | javac | 0m 48s | the patch passed |
+1 :green_heart: | shadedjars | 5m 10s | patch has no errors when building our shaded downstream artifacts. |
+1 :green_heart: | javadoc | 0m 25s | the patch passed |
_ Other Tests _ | |||
-1 :x: | unit | 205m 19s | hbase-server in the patch failed. |
| | 232m 18s | |
Subsystem | Report/Notes |
---|---|
Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
GITHUB PR | https://github.com/apache/hbase/pull/5167 |
Optional Tests | javac javadoc unit shadedjars compile |
uname | Linux d013c5df2212 5.4.0-1097-aws #105~18.04.1-Ubuntu SMP Mon Feb 13 17:50:57 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | dev-support/hbase-personality.sh |
git revision | branch-2 / a67a8f7fd3 |
Default Java | Eclipse Adoptium-11.0.17+8 |
unit | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt |
Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/testReport/ |
Max. process+thread count | 2750 (vs. ulimit of 30000) |
modules | C: hbase-server U: hbase-server |
Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/console |
versions | git=2.34.1 maven=3.8.6 |
Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
:broken_heart: -1 overall
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
+0 :ok: | reexec | 0m 56s | Docker mode activated. |
-0 :warning: | yetus | 0m 4s | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck |
_ Prechecks _ | |||
_ branch-2 Compile Tests _ | |||
+1 :green_heart: | mvninstall | 2m 59s | branch-2 passed |
+1 :green_heart: | compile | 0m 40s | branch-2 passed |
+1 :green_heart: | shadedjars | 4m 10s | branch has no errors when building our shaded downstream artifacts. |
+1 :green_heart: | javadoc | 0m 25s | branch-2 passed |
_ Patch Compile Tests _ | |||
+1 :green_heart: | mvninstall | 2m 42s | the patch passed |
+1 :green_heart: | compile | 0m 41s | the patch passed |
+1 :green_heart: | javac | 0m 41s | the patch passed |
+1 :green_heart: | shadedjars | 4m 12s | patch has no errors when building our shaded downstream artifacts. |
+1 :green_heart: | javadoc | 0m 24s | the patch passed |
_ Other Tests _ | |||
-1 :x: | unit | 218m 44s | hbase-server in the patch failed. |
| | 240m 24s | |
Subsystem | Report/Notes |
---|---|
Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-jdk8-hadoop2-check/output/Dockerfile |
GITHUB PR | https://github.com/apache/hbase/pull/5167 |
Optional Tests | javac javadoc unit shadedjars compile |
uname | Linux ec9a8276b47e 5.4.0-137-generic #154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | dev-support/hbase-personality.sh |
git revision | branch-2 / a67a8f7fd3 |
Default Java | Temurin-1.8.0_352-b08 |
unit | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/artifact/yetus-jdk8-hadoop2-check/output/patch-unit-hbase-server.txt |
Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/testReport/ |
Max. process+thread count | 2192 (vs. ulimit of 30000) |
modules | C: hbase-server U: hbase-server |
Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5167/1/console |
versions | git=2.34.1 maven=3.8.6 |
Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
The test case failures are not related to this change. @taklwu could you please review the change? Thanks.
Why change from throwing an exception to returning false?
> Why change from throwing an exception to returning false?
@Apache9 During the express upgrade, as mentioned in the comment here, there is no way to detect the meta location in the cluster because both the meta location znode and the meta WAL file get deleted when the region server is stopped gracefully. As a result, the region state info for meta never gets added, which leads to InitMetaProcedure. During meta initialization we check whether a meta table directory is already present in the cluster; if it is present and intact, we throw an IOException saying meta needs to be rebuilt or the meta znode should be created manually. Since this procedure is persisted, it cannot proceed even after multiple master restarts unless the manual steps (meta rebuild or znode creation, plus procedure store deletion) are performed. All these manual steps are error prone and not required, because during a graceful shutdown of the cluster meta is not assigned to any server anyway, so allowing meta to be assigned to a random server during InitMetaProcedure is correct, and that is what happens with my patch. Returning false is just to log that the meta table directory was not deleted.
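For illustration only, a minimal sketch of the behaviour described above (not the actual patch; the class and method names are hypothetical): instead of throwing an IOException when a meta table directory already exists, the check logs a warning and returns false so InitMetaProcedure can continue and assign meta.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Hypothetical illustration of the change discussed above; not the actual InitMetaProcedure code. */
public final class MetaDirCheckSketch {
  private static final Logger LOG = LoggerFactory.getLogger(MetaDirCheckSketch.class);

  /**
   * Returns true when no meta table directory pre-exists, false when one does.
   * Previously a pre-existing directory caused an IOException, which left the persisted
   * InitMetaProcedure stuck across master restarts until manual meta rebuild / znode creation.
   */
  static boolean metaDirAbsent(FileSystem fs, Path metaTableDir) throws IOException {
    if (fs.exists(metaTableDir)) {
      // Log and return false instead of throwing, so InitMetaProcedure can continue and
      // assign meta to a region server. This is safe when the old cluster was shut down
      // gracefully, i.e. meta is not online anywhere.
      LOG.warn("hbase:meta directory {} already exists; continuing with meta assignment",
        metaTableDir);
      return false;
    }
    return true;
  }
}
```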
Here I have mentioned how the upgrade went smoothly with my patch.
Thank you @chrajeshbabu. It seems the changes make sense from the upgrade viewpoint. On the other hand, if this rare scenario were to happen on a healthy 2.x cluster, does this change make any difference?
> All these manual steps are error prone and not required, because during a graceful shutdown of the cluster meta is not assigned to any server anyway, so allowing meta to be assigned to a random server during InitMetaProcedure is correct, and that is what happens with my patch. Returning false is just to log that the meta table directory was not deleted.
If this happens on a 2.x cluster, what would now be different? Only that meta would be assigned to any random server, correct?
> Thank you @chrajeshbabu. It seems the changes make sense from the upgrade viewpoint. On the other hand, if this rare scenario were to happen on a healthy 2.x cluster, does this change make any difference?

> If this happens on a 2.x cluster, what would now be different? Only that meta would be assigned to any random server, correct?
I have tried two scenarios in a healthy 2.x cluster with the change:
- Removed the zookeeper data alone and restarted the cluster; everything came up properly.
- Removed both the zookeeper data and the master data so that we hit the same InitMetaProcedure scenario. The meta region was assigned to a random server, the regions of one server came up properly, and the remaining servers became unknown (this is not related to this issue; I will check why it happens and work on it in another JIRA). I recovered those with the hbck2 scheduleRecoveries option and the cluster is back to normal.

Even with the exception being thrown, we would need to delete the master data because the failed InitMetaProcedure does not let the master come up, and then follow one of these steps:
1. Create the meta server znode manually, which can also lead to the unknown-servers issue (we should use hbck2 to recover).
2. Delete the meta table data in HDFS and rebuild it completely, which is error prone, and the exact state of meta may not be rebuilt in some cases because the table states are missing in zookeeper, etc.
Got it, thanks. This makes sense, +1 from my side.
@Apache9 does this look good to you?
@Apache9 I will commit it if it's fine with you. Could you please confirm?
I still need to check the code.
This is a very critical part: if we run InitMetaProcedure when meta already exists, it could cause a serious data loss problem...
We should try to prevent scheduling the InitMetaProcedure, instead of just letting it go and hoping it would work...
After checking the code, I think the correct way to fix this problem is to do the work in HMaster.tryMigrateMetaLocationsFromZooKeeper. We can add some more code in this method to check whether the meta directory is already there, and if it is, and if we can make sure that this is an upgrade from 1.x (by trying to read something on the filesystem? Is this possible?), then insert a record into the master region so we can skip scheduling the InitMetaProcedure.
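As a rough sketch of the "is the meta directory already there" part, assuming a 2.3+ codebase where CommonFSUtils is available (older branches use FSUtils); the class name is hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.util.CommonFSUtils;

/** Illustrative sketch only; the real check would live in HMaster.tryMigrateMetaLocationsFromZooKeeper. */
public final class MetaDirExistsSketch {
  /** Does a hbase:meta table directory already exist under the HBase root directory? */
  static boolean metaTableDirExists(Configuration conf) throws IOException {
    Path rootDir = CommonFSUtils.getRootDir(conf);
    FileSystem fs = rootDir.getFileSystem(conf);
    Path metaDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
    return fs.exists(metaDir);
  }
}
```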
> to check whether the meta directory is already there, and if it is, and if we can make sure that this is an upgrade from 1.x (by trying to read something on the filesystem? Is this possible?)
We can check the column families in the meta table to detect whether it's from 1.x or a current version.
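A minimal sketch of that column-family check, assuming the 2.x convention that hbase:meta carries the "table" family (HConstants.TABLE_FAMILY) while a 1.x-era meta does not; the class and method names are illustrative:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableDescriptors;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.TableDescriptor;

/** Illustrative sketch of detecting a 1.x-era meta layout via its column families. */
public final class MetaVersionCheckSketch {
  /**
   * hbase:meta written by 2.x carries the "table" family (table state moved out of
   * ZooKeeper); a meta directory left behind by a 1.x cluster does not. A missing
   * "table" family is therefore a hint that we are upgrading from 1.x.
   */
  static boolean looksLikeOneDotXMeta(TableDescriptors tableDescriptors) throws IOException {
    TableDescriptor meta = tableDescriptors.get(TableName.META_TABLE_NAME);
    return meta != null && !meta.hasColumnFamily(HConstants.TABLE_FAMILY);
  }
}
```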
And do we have read replica support on 1.x? Do we also need to insert the record for secondary replicas?
> then insert a record into the master region so we can skip scheduling the InitMetaProcedure.

We can insert a record, but we may not know the state and location, which leads to InitMetaProcedure again.
> then insert a record into the master region so we can skip scheduling the InitMetaProcedure.

> We can insert a record, but we may not know the state and location, which leads to InitMetaProcedure again.
I checked the code; the condition for whether to schedule an InitMetaProcedure is:
```java
if (!this.assignmentManager.getRegionStates().hasTableRegionStates(TableName.META_TABLE_NAME)) {
```
So I think it only requires that we have a record for meta in the master region, and we do not need to know its state and location?
And we will first try to migrate from zookeeper; for 1.x, if the cluster shut down gracefully, there is no znode for meta, so we enter the logic described above. In this case, I think the state for meta should be CLOSED?
> So I think it only requires that we have a record for meta in the master region, and we do not need to know its state and location?

> And we will first try to migrate from zookeeper; for 1.x, if the cluster shut down gracefully, there is no znode for meta, so we enter the logic described above. In this case, I think the state for meta should be CLOSED?
That's correct, let me check and update the patch accordingly.
> to check whether the meta directory is already there, and if it is, and if we can make sure that this is an upgrade from 1.x (by trying to read something on the filesystem? Is this possible?)

> We can check the column families in the meta table to detect whether it's from 1.x or a current version.
@chrajeshbabu but isn't the meta state unknown in your case of migration from 1.x to 2.x? Is the plan to read the meta column families from the filesystem directly as part of tryMigrateMetaLocationsFromZooKeeper?
@Apache9 I have tried your suggestion of creating a put entry for the meta table in the master region, which gets filled into the region states and helps avoid calling InitMetaProcedure, but there is a problem with this approach. Meta assignment is actually invoked in only two places: 1) in the server crash procedure, when the region server that went down was carrying meta, which is not the case in the express upgrade scenario; and 2) during meta initialization, once the FS layout creation step is done. Adding just the meta entry in the master region, without knowing the location, reaches neither of these flows, leaving the meta region hanging in transition forever.
Here is the code I have tried:
```java
} else {
  TableDescriptor metaTableDescriptor = tableDescriptors.get(TableName.META_TABLE_NAME);
  if (metaTableDescriptor != null && metaTableDescriptor.getColumnFamily(TABLE_FAMILY) == null) {
    MetaTableAccessor.addRegionInfo(put, RegionInfoBuilder.FIRST_META_REGIONINFO);
    put.add(CellBuilderFactory.create(CellBuilderType.SHALLOW_COPY).setRow(put.getRow())
      .setFamily(HConstants.CATALOG_FAMILY)
      .setQualifier(
        RegionStateStore.getStateColumn(RegionInfoBuilder.FIRST_META_REGIONINFO.getReplicaId()))
      .setTimestamp(put.getTimestamp())
      .setType(Cell.Type.Put).setValue(Bytes.toBytes(RegionState.State.CLOSED.name())).build());
    LOG.info(info.toString());
    masterRegion.update(r -> r.put(put));
  }
```
Better to post the full patch somewhere so I can check it? And maybe we need to manually schedule a TRSP to bring the meta region online in this case...
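For reference, a very rough sketch of what manually scheduling an assignment for meta might look like, assuming AssignmentManager exposes an assign(RegionInfo) helper that creates and submits a TransitRegionStateProcedure; the exact API differs between 2.x versions, so treat this as an assumption rather than the real fix:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.RegionInfoBuilder;
import org.apache.hadoop.hbase.master.assignment.AssignmentManager;

/** Hypothetical sketch only; the real change would live in HMaster after the migration step. */
public final class ScheduleMetaAssignSketch {
  /**
   * If we only insert a CLOSED record for meta instead of running InitMetaProcedure,
   * something still has to bring meta online, e.g. by submitting an assign
   * TransitRegionStateProcedure (TRSP) for the first meta replica.
   */
  static void scheduleMetaAssign(AssignmentManager assignmentManager) throws IOException {
    // Assumption: AssignmentManager#assign(RegionInfo) creates and submits an assign TRSP.
    assignmentManager.assign(RegionInfoBuilder.FIRST_META_REGIONINFO);
  }
}
```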