hbase icon indicating copy to clipboard operation
hbase copied to clipboard

HBASE-29734 Fix miscalculated maxWait timeout in MoveWithAck

Open jcongithub opened this issue 1 month ago • 10 comments

This PR fixes the miscalculated maxWait in MoveWithAck using current time instead of sartTime

In MoveWithAck, the maxWait timeout was calculated inside the retry loop using a fixed startTime calculated at the out side of the retry loop, causing it to remain constant across all retry attempts. This means that each retry didn't really check isSameServer and marks all retries failed.

jcongithub avatar Nov 27 '25 21:11 jcongithub

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 :green_heart: mvninstall 4m 32s master passed
+1 :green_heart: compile 4m 0s master passed
+1 :green_heart: checkstyle 1m 11s master passed
+1 :green_heart: spotbugs 2m 40s master passed
+1 :green_heart: spotless 1m 31s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 5m 25s the patch passed
+1 :green_heart: compile 5m 24s the patch passed
+1 :green_heart: javac 5m 24s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 33s the patch passed
+1 :green_heart: spotbugs 2m 50s the patch passed
+1 :green_heart: hadoopcheck 15m 56s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 :green_heart: spotless 1m 0s patch has no errors when running spotless:check.
_ Other Tests _
+1 :green_heart: asflicense 0m 12s The patch does not generate ASF License warnings.
57m 31s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/7488
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 916a4ef3fcd9 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cacc56108d7e789a2309b355d753606e392d0608
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/1/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 27 '25 22:11 Apache-HBase

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 36s Docker mode activated.
-0 :warning: yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 :green_heart: mvninstall 5m 8s master passed
+1 :green_heart: compile 1m 27s master passed
+1 :green_heart: javadoc 0m 40s master passed
+1 :green_heart: shadedjars 7m 23s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 3m 38s the patch passed
+1 :green_heart: compile 1m 9s the patch passed
+1 :green_heart: javac 1m 9s the patch passed
+1 :green_heart: javadoc 0m 35s the patch passed
+1 :green_heart: shadedjars 7m 16s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 :green_heart: unit 248m 38s hbase-server in the patch passed.
282m 8s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/7488
Optional Tests javac javadoc unit compile shadedjars
uname Linux 5b15f35fb97b 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cacc56108d7e789a2309b355d753606e392d0608
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/1/testReport/
Max. process+thread count 3506 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Nov 28 '25 02:11 Apache-HBase

It is bit confusing. There are two situations. Whether we want to wait for max move time fro entire region mover or per move. Last code was considering the entire move operations and your code change it to per move. Now i am not sure what is the real intention out of two. Have to check the code once.

mnpoonia avatar Nov 30 '25 08:11 mnpoonia

@mnpoonia Thanks for the review. In the current code, the wait loop is inside the retry loop, so maxWait should be per-retry. If maxWait were a single absolute deadline for all retries, once the first retry waits up to that deadline, subsequent retries would skip the wait loop because the current time already exceeds the deadline. As a result, isSameServer(...) would not be invoked again, sameServer would never update after the first retry, and the move would fail regardless of later attempts.

jcongithub avatar Dec 04 '25 15:12 jcongithub

I think the logic after the PR is more reasonable, although we lose the ability to limit the whole retry time with a single configuration, but you can still control it with maxRetries * maxWait.

So I'm +1 on the change.

Let me request review for some other committer to see if there are other opinion.

Thanks.

Apache9 avatar Dec 05 '25 00:12 Apache9

I am +1 , it looks like previous retry logic didn't work

you can still control it with maxRetries * maxWait

👍

droudnitsky avatar Dec 05 '25 01:12 droudnitsky

Thanks all for reviewing the PR. I have raised a related PR for a change in RegionMover to fix timeoutInSeconds calculation to consider retry attempts in MoveWithAck

jcongithub avatar Dec 05 '25 04:12 jcongithub

+1 LGTM

I agree that the current code does not make sense.

This does change the semantics of the timeout. Maybe merge this with #7519 ? I think that a release note should also be added.

stoty avatar Dec 08 '25 07:12 stoty

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 0s Docker mode activated.
-1 :x: docker 0m 4s Docker failed to build run-specific yetus/hbase:tp-6897}.
Subsystem Report/Notes
GITHUB PR https://github.com/apache/hbase/pull/7488
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/2/console
versions git=2.17.1
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Dec 11 '25 22:12 Apache-HBase

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 44s Docker mode activated.
-0 :warning: yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 :green_heart: mvninstall 5m 18s master passed
+1 :green_heart: compile 1m 16s master passed
+1 :green_heart: javadoc 0m 40s master passed
+1 :green_heart: shadedjars 6m 58s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 4m 12s the patch passed
+1 :green_heart: compile 1m 20s the patch passed
+1 :green_heart: javac 1m 20s the patch passed
+1 :green_heart: javadoc 0m 42s the patch passed
+1 :green_heart: shadedjars 6m 48s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 :x: unit 17m 14s /patch-unit-hbase-server.txt hbase-server in the patch failed.
47m 31s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR https://github.com/apache/hbase/pull/7488
Optional Tests javac javadoc unit compile shadedjars
uname Linux d0e59812cdbe 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 6531542457748f5055cdbd6eb8726410f058456d
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/2/testReport/
Max. process+thread count 1111 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase avatar Dec 11 '25 22:12 Apache-HBase