hbase
HBASE-27763 Recover WAL encounter KeeperErrorCode = NoNode cause RegionServer crash
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| +0 :ok: | reexec | 0m 24s | Docker mode activated. |
| _ Prechecks _ | |||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| +1 :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| _ master Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 3m 54s | master passed |
| +1 :green_heart: | compile | 2m 34s | master passed |
| +1 :green_heart: | checkstyle | 0m 36s | master passed |
| +1 :green_heart: | spotless | 0m 43s | branch has no errors when running spotless:check. |
| +1 :green_heart: | spotbugs | 1m 30s | master passed |
| _ Patch Compile Tests _ | |||
| +1 :green_heart: | mvninstall | 3m 35s | the patch passed |
| +1 :green_heart: | compile | 2m 31s | the patch passed |
| +1 :green_heart: | javac | 2m 31s | the patch passed |
| -0 :warning: | checkstyle | 0m 34s | hbase-server: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 :green_heart: | hadoopcheck | 13m 22s | Patch does not cause any errors with Hadoop 3.2.4 3.3.4. |
| -1 :x: | spotless | 0m 36s | patch has 53 errors when running spotless:check, run spotless:apply to fix. |
| -1 :x: | spotbugs | 1m 41s | hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| _ Other Tests _ | |||
| +1 :green_heart: | asflicense | 0m 10s | The patch does not generate ASF License warnings. |
| | | 40m 13s | |
| Reason | Tests |
|---|---|
| FindBugs | module:hbase-server |
| Sequence of calls to java.util.concurrent.ConcurrentHashMap may not be atomic in org.apache.hadoop.hbase.replication.regionserver.RecoveredReplicationSource.startShipperWorks() At RecoveredReplicationSource.java:[line 180] |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/5177 |
| Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile |
| uname | Linux d9c601b0220d 5.4.0-1093-aws #102~18.04.2-Ubuntu SMP Wed Dec 7 00:31:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / a71105997f |
| Default Java | Eclipse Adoptium-11.0.17+8 |
| checkstyle | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt |
| spotless | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/patch-spotless.txt |
| spotbugs | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/artifact/yetus-general-check/output/new-spotbugs-hbase-server.html |
| Max. process+thread count | 82 (vs. ulimit of 30000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5177/1/console |
| versions | git=2.34.1 maven=3.8.6 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
This message was automatically generated.
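SpotBugs reports the FindBugs message above as `AT_OPERATION_SEQUENCE_ON_CONCURRENT_ABSTRACTION`: each call on a `ConcurrentHashMap` is thread-safe, but a sequence such as `get()` followed by `put()` is not atomic as a whole. The code at `RecoveredReplicationSource.java` line 180 is not shown in this thread, so the sketch below is only a generic, hypothetical illustration of the flagged pattern and of the usual single-call alternative. `AtomicMapSketch`, `racyRegister`, `atomicRegister`, and the `"shipper-for-"` values are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration of the pattern SpotBugs flags; NOT the actual code
// at RecoveredReplicationSource.java line 180.
class AtomicMapSketch {

  // Check-then-act: another thread can insert between get() and put(), so two
  // callers may each create a value and the later put() overwrites the earlier one.
  static String racyRegister(ConcurrentMap<String, String> workers, String walGroupId) {
    String existing = workers.get(walGroupId);
    if (existing == null) {
      String created = "shipper-for-" + walGroupId;
      workers.put(walGroupId, created);
      return created;
    }
    return existing;
  }

  // Single atomic call: ConcurrentHashMap applies the mapping function at most
  // once per absent key, and every caller observes the same stored value.
  static String atomicRegister(ConcurrentMap<String, String> workers, String walGroupId) {
    return workers.computeIfAbsent(walGroupId, id -> "shipper-for-" + id);
  }

  public static void main(String[] args) {
    ConcurrentMap<String, String> workers = new ConcurrentHashMap<>();
    String first = atomicRegister(workers, "regiongroup-1");
    String second = atomicRegister(workers, "regiongroup-1");
    System.out.println(first == second); // true: the stored value is reused
    System.out.println(workers.size());  // 1
  }
}
```

`putIfAbsent` is another atomic option when the value is cheap to build up front; `computeIfAbsent` avoids constructing a value that would be thrown away.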
Would you mind explaining in more detail how this fixes the NoNode exception?
Hi @gottagogottagoGxj, I'd appreciate it if you could explain this ticket in more detail and share which HBase version you are running.
It seems I have hit this issue too, on HBase 2.4.11.
Here is my log:
```text
2024-03-21 16:19:43,379 WARN [ReplicationExecutor-0.replicationSource,xxxxx,1705567104078.replicationSource.shipper000.000.000.000%2C16020%2C1705567104078.000.000.000.000%2C16020%2C1705567104078.regiongroup-1,xxxxx,1705567104078] regionserver.ReplicationSourceShipper: com.shopee.di.foundation.hbase.KafkaInterClusterReplicationEndpoint threw unknown exception:
java.util.ConcurrentModificationException
    at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1221)
    at org.apache.hadoop.hbase.replication.regionserver.MetricsSource.updateTableLevelMetrics(MetricsSource.java:112)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.shipEdits(ReplicationSourceShipper.java:215)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:117)
2024-03-21 16:19:43,405 ERROR [ReplicationExecutor-0.replicationSource,xxxxx,1705567104078.replicationSource.shipper000.000.000.000%2C16020%2C1705567104078.000.000.000.000%2C16020%2C1705567104078.regiongroup-1,xxxxx,1705567104078] regionserver.HRegionServer: ***** ABORTING region server ip-10-80-163-145.idata-server.shopee.io,16020,1704705566934: Failed to operate on replication queue *****
org.apache.hadoop.hbase.replication.ReplicationException: Failed to set log position (serverName=xxxxx,1704705566934, queueId=xxxxx,1705567104078, fileName=000.000.000.000%2C16020%2C1705567104078.000.000.000.000%2C16020%2C1705567104078.regiongroup-1.1711008927746, position=130724689)
    at org.apache.hadoop.hbase.replication.ZKReplicationQueueStorage.setWALPosition(ZKReplicationQueueStorage.java:255)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.lambda$logPositionAndCleanOldLogs$8(ReplicationSourceManager.java:552)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.interruptOrAbortWhenFail(ReplicationSourceManager.java:500)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:551)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceInterface.logPositionAndCleanOldLogs(ReplicationSourceInterface.java:206)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.updateLogPosition(ReplicationSourceShipper.java:264)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.shipEdits(ReplicationSourceShipper.java:203)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceShipper.run(ReplicationSourceShipper.java:117)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1925)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1830)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:658)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1534)
    at org.apache.hadoop.hbase.replication.ZKReplicationQueueStorage.setWALPosition(ZKReplicationQueueStorage.java:245)
    ... 7 more
```
*Sensitive information such as server names and IP addresses has been masked.
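The `ConcurrentModificationException` in the first trace is thrown from inside `HashMap.computeIfAbsent`. On Java 9 and later, that method remembers the map's modification count before running the mapping function and throws if the count has changed by the time the function returns. The caller frame is `MetricsSource.updateTableLevelMetrics` on a shipper thread, so one plausible reading (not confirmed in this thread) is that more than one thread updates the same plain `HashMap`. The sketch below triggers that check deterministically on a single thread, then shows a common thread-safe alternative, `ConcurrentHashMap.computeIfAbsent` with `LongAdder` counters. `TableMetricsSketch` and its methods are hypothetical and are not HBase's `MetricsSource` code; whether the actual fix takes this form is not stated here.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: per-table counters stand in for table-level replication
// metrics. This is not HBase's MetricsSource implementation.
class TableMetricsSketch {

  // HashMap.computeIfAbsent (Java 9+) throws CME if the map is structurally
  // modified while the mapping function runs. The modification happens on the
  // same thread here so the failure is deterministic; another thread calling
  // put() at the wrong moment can trip the same check.
  static boolean hashMapDetectsInterleavedModification() {
    Map<String, Long> counters = new HashMap<>();
    try {
      counters.computeIfAbsent("table_a", k -> {
        counters.put("table_b", 0L); // structural modification mid-compute
        return 0L;
      });
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Thread-safe variant: ConcurrentHashMap.computeIfAbsent is atomic per key,
  // and LongAdder tolerates concurrent increments.
  static Map<String, LongAdder> countConcurrently(String[] tables, int threads, int perThread) {
    Map<String, LongAdder> counters = new ConcurrentHashMap<>();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < perThread; i++) {
          String table = tables[i % tables.length];
          counters.computeIfAbsent(table, k -> new LongAdder()).increment();
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    return counters;
  }

  public static void main(String[] args) {
    System.out.println(hashMapDetectsInterleavedModification()); // true on Java 9+
    Map<String, LongAdder> c = countConcurrently(new String[] {"t1", "t2"}, 4, 1000);
    System.out.println(c.get("t1").sum() + " " + c.get("t2").sum()); // 2000 2000
  }
}
```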
Thank you.
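The second trace shows the abort path: `ZKReplicationQueueStorage.setWALPosition` sends its updates through `ZKUtil.multiOrSequential` and `ZooKeeper.multi`, which applies a batch of operations as one transaction, so if any operation targets a znode that no longer exists, the whole batch fails with `NoNodeException` and nothing is written. `ReplicationSourceManager.interruptOrAbortWhenFail` then turns the resulting `ReplicationException` into a region server abort. The sketch below is a stdlib-only, in-memory model of the all-or-nothing behavior under stated assumptions: `MultiUpdateSketch`, `SetData`, `NoNodeError`, and the znode paths are hypothetical stand-ins, not ZooKeeper or HBase APIs, and why the WAL znode disappeared in the reported case is not established in this thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of an all-or-nothing "multi" update, mimicking how
// ZooKeeper.multi() treats a batch: if any target node is missing, nothing is applied.
// Class, method, and path names are illustrative, not ZooKeeper or HBase APIs.
class MultiUpdateSketch {

  // Stand-in for KeeperException.NoNodeException.
  static final class NoNodeError extends Exception {
    NoNodeError(String path) {
      super("NoNode for " + path);
    }
  }

  // A single "setData" operation in the batch.
  static final class SetData {
    final String path;
    final String value;

    SetData(String path, String value) {
      this.path = path;
      this.value = value;
    }
  }

  private final Map<String, String> nodes = new HashMap<>();

  synchronized void create(String path, String value) { nodes.put(path, value); }

  synchronized void delete(String path) { nodes.remove(path); }

  synchronized String get(String path) { return nodes.get(path); }

  // Validate every op first, then apply them all; one missing node fails the batch.
  synchronized void multi(List<SetData> ops) throws NoNodeError {
    for (SetData op : ops) {
      if (!nodes.containsKey(op.path)) {
        throw new NoNodeError(op.path);
      }
    }
    for (SetData op : ops) {
      nodes.put(op.path, op.value);
    }
  }

  public static void main(String[] args) {
    MultiUpdateSketch store = new MultiUpdateSketch();
    String queue = "/replication/rs/rs1/peer1"; // hypothetical queue path
    store.create(queue + "/wal.1", "0");
    store.create(queue + "/wal.2", "0");
    store.delete(queue + "/wal.2"); // node removed by another actor before the update
    try {
      store.multi(List.of(new SetData(queue + "/wal.1", "100"), new SetData(queue + "/wal.2", "200")));
    } catch (NoNodeError e) {
      System.out.println(e.getMessage()); // NoNode for /replication/rs/rs1/peer1/wal.2
    }
    System.out.println(store.get(queue + "/wal.1")); // 0: the batch was not partially applied
  }
}
```

In this model the caller still has to decide whether a missing node is fatal; in the reported log that decision is an abort.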