ozone icon indicating copy to clipboard operation
ozone copied to clipboard

HDDS-7017. Avoid logging stack trace of NotLeaderExceptions on followers

Open kmizumar opened this issue 2 years ago • 1 comments

What changes were proposed in this pull request?

change not to log stack trace of NotLeaderException

2022-01-27 12:21:38,220 [EventQueue-PipelineActionsForPipelineActionHandler] INFO org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline action CLOSE for PipelineID=b5c18b1c-2ae1-4bd1-ac65-c0b67f157a9a from datanode f0d720c8-be6f-4251-b55b-cd72acfeaeb5. Reason : f0d720c8-be6f-4251-b55b-cd72acfeaeb5 is in candidate state for 332790ms
2022-01-27 12:21:38,256 [EventQueue-PipelineActionsForPipelineActionHandler] INFO org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler: Invoking method public abstract void org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.updatePipelineState(org.apache.hadoop.hdds.protocol.proto.HddsProtos$PipelineID,org.apache.hadoop.hdds.protocol.proto.HddsProtos$PipelineState) throws java.io.IOException on target org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl@14d176af, cost 33451.776us
2022-01-27 12:21:38,256 [EventQueue-PipelineActionsForPipelineActionHandler] ERROR org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Could not execute pipeline action=CLOSE pipeline=PipelineID=b5c18b1c-2ae1-4bd1-ac65-c0b67f157a9a
org.apache.ratis.protocol.exceptions.NotLeaderException: Server b109cac2-3bd5-4ae8-8bf0-6d8c1bb84a70@group-151C0C5DF843 is not the leader d40ba86a-c86e-417c-830f-2c9c16259e6b|rpc:scm-1.scm-1.default.svc.cluster.local:9894|admin:|client:|dataStream:|priority:0
        at org.apache.ratis.server.impl.RaftServerImpl.generateNotLeaderException(RaftServerImpl.java:667)
        at org.apache.ratis.server.impl.RaftServerImpl.checkLeaderState(RaftServerImpl.java:632)
        at org.apache.ratis.server.impl.RaftServerImpl.submitClientRequestAsync(RaftServerImpl.java:758)
        at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitClientRequestAsync$9(RaftServerProxy.java:417)
        at org.apache.ratis.server.impl.RaftServerProxy.lambda$null$7(RaftServerProxy.java:412)
        at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:115)
        at org.apache.ratis.server.impl.RaftServerProxy.lambda$submitRequest$8(RaftServerProxy.java:412)
        at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1106)
        at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
        at org.apache.ratis.server.impl.RaftServerProxy.submitRequest(RaftServerProxy.java:411)
        at org.apache.ratis.server.impl.RaftServerProxy.submitClientRequestAsync(RaftServerProxy.java:417)
        at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.submitRequest(SCMRatisServerImpl.java:222)
        at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invokeRatis(SCMHAInvocationHandler.java:110)
        at org.apache.hadoop.hdds.scm.ha.SCMHAInvocationHandler.invoke(SCMHAInvocationHandler.java:67)
        at com.sun.proxy.$Proxy16.updatePipelineState(Unknown Source)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl.closePipeline(PipelineManagerImpl.java:363)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.processPipelineAction(PipelineActionHandler.java:87)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.lambda$onMessage$0(PipelineActionHandler.java:65)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
        at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1085)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.onMessage(PipelineActionHandler.java:64)
        at org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler.onMessage(PipelineActionHandler.java:43)
        at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

the log like the above would be shorten to (no stack trace):

2022-07-19 05:35:19,568 [EventQueue-NewNodeForNewNodeHandler] INFO org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl: Found 1 stale pipelines
2022-07-19 05:35:19,569 [EventQueue-NewNodeForNewNodeHandler] INFO org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl: Closing the stale pipeline: PipelineID=a26160c1-babc-4b41-8578-bdcbd77850cd
2022-07-19 05:35:19,570 [EventQueue-NewNodeForNewNodeHandler] ERROR org.apache.hadoop.hdds.scm.pipeline.PipelineManagerImpl: Closing the stale pipeline failed: Pipeline[ Id: a26160c1-babc-4b41-8578-bdcbd77850cd, Nodes: 9438d617-48cb-40a9-b2e5-62ceecd75bdf{ip: 10.244.11.3, host: dn-1.dn-1.default.svc.cluster.local, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/ONE, State:OPEN, leaderId:9438d617-48cb-40a9-b2e5-62ceecd75bdf, CreationTimestamp2022-07-19T05:35:04.980452Z[Etc/UTC]]
2022-07-19 05:35:19,570 [EventQueue-NewNodeForNewNodeHandler] INFO org.apache.hadoop.hdds.scm.pipeline.BackgroundPipelineCreator: ignore, not leader SCM.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7017

How was this patch tested?

manually tested within my local ozone environment rugging on k8s.

kmizumar avatar Jul 19 '22 09:07 kmizumar

/pending

adoroszlai avatar Oct 17 '22 10:10 adoroszlai

Thank you very much for the patch. I am closing this PR temporarily as there was no activity recently and it is waiting for response from its author.

It doesn't mean that this PR is not important or ignored: feel free to reopen the PR at any time.

It only means that attention of committers is not required. We prefer to keep the review queue clean. This ensures PRs in need of review are more visible, which results in faster feedback for all PRs.

If you need ANY help to finish this PR, please contact the community on the mailing list or the slack channel."

github-actions[bot] avatar Nov 08 '22 00:11 github-actions[bot]