HDFS-16764. ObserverNamenode handles addBlock rpc and throws a FileNotFoundException

Open ZanderXu opened this issue 2 years ago • 17 comments

Description of PR

The Observer NameNode can currently handle the addBlock RPC, but it may throw a FileNotFoundException when its txid is stale.

  • addBlock is not a coordinated method, so the Observer does not check the stateId.
  • addBlock performs its validation under checkOperation(OperationCategory.READ).

As a result, the Observer NameNode accepts the addBlock RPC and throws a FileNotFoundException during validation if it has not yet replayed the edit that created the file.

The related code is as follows:

checkOperation(OperationCategory.READ);
final FSPermissionChecker pc = getPermissionChecker();
FSPermissionChecker.setOperationType(operationName);
readLock();
try {
  checkOperation(OperationCategory.READ);
  r = FSDirWriteFileOp.validateAddBlock(this, pc, src, fileId, clientName, previous, onRetryBlock);
} finally {
  readUnlock(operationName);
} 
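To make the effect of the check category easier to see, here is a minimal toy model. It is not Hadoop code: ToyNameNode, RetryOnActive, and outcome are hypothetical names, and the assumption is simply that an observer accepts READ checks and rejects WRITE checks.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/** Toy model only; every class and method name here is hypothetical. */
public class AddBlockCheckSketch {
  enum OperationCategory { READ, WRITE }
  enum HAState { ACTIVE, OBSERVER }

  /** Stand-in for a "retry this call on the active" style exception. */
  static class RetryOnActive extends IOException {
    RetryOnActive(String msg) { super(msg); }
  }

  static class ToyNameNode {
    final HAState state;
    // Paths this node has already replayed from the edit log.
    final Set<String> namespace = new HashSet<>();

    ToyNameNode(HAState state) { this.state = state; }

    /** Assumption: an observer serves READ but rejects WRITE. */
    void checkOperation(OperationCategory op) throws RetryOnActive {
      if (state == HAState.OBSERVER && op == OperationCategory.WRITE) {
        throw new RetryOnActive("WRITE is not supported in state " + state);
      }
    }

    void validateAddBlock(String src, OperationCategory category)
        throws IOException {
      checkOperation(category);
      if (!namespace.contains(src)) {
        throw new FileNotFoundException("File does not exist: " + src);
      }
    }
  }

  /** Runs validation on a stale observer and reports which exception came back. */
  static String outcome(OperationCategory category) {
    // Empty namespace: the observer has not replayed the create edit yet.
    ToyNameNode observer = new ToyNameNode(HAState.OBSERVER);
    try {
      observer.validateAddBlock("/tmp/new-file", category);
      return "ok";
    } catch (IOException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println("READ  check: " + outcome(OperationCategory.READ));
    System.out.println("WRITE check: " + outcome(OperationCategory.WRITE));
  }
}
```

In this model the READ category lets the stale observer reach the file lookup and fail with FileNotFoundException, while the WRITE category rejects the call before the lookup happens.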

ZanderXu avatar Sep 08 '22 03:09 ZanderXu

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 48s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 39m 8s trunk passed
+1 :green_heart: compile 1m 36s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 29s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 17s trunk passed
+1 :green_heart: mvnsite 1m 46s trunk passed
+1 :green_heart: javadoc 1m 20s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 43s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 33s trunk passed
+1 :green_heart: shadedclient 22m 40s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 21s the patch passed
+1 :green_heart: compile 1m 23s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 23s the patch passed
+1 :green_heart: compile 1m 16s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 16s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 58s the patch passed
+1 :green_heart: mvnsite 1m 27s the patch passed
+1 :green_heart: javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 21s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 14s the patch passed
+1 :green_heart: shadedclient 22m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 240m 24s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 3s The patch does not generate ASF License warnings.
349m 29s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/1/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4872
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ff7330e69581 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 50da336a3cb231f01b7df6dc19da5740d5310096
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/1/testReport/
Max. process+thread count 3070 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Sep 08 '22 09:09 hadoop-yetus

@ayushtkn Sir, thanks for your review.

> You can not change the type of lock just for the sake of observers

I have not changed the type of lock; I only changed the category passed to checkOperation. This change should not cause serious performance issues, because checkOperation(OperationCategory.WRITE) costs about the same as checkOperation(OperationCategory.READ).

ZanderXu avatar Sep 11 '22 01:09 ZanderXu

> Apart from observers I think it can impact even when fs.ha.allow.stale.reads is set to True

Yes, the problem also occurs when fs.ha.allow.stale.reads is set to true.

HDFS-4591 added this modification. @atm Sir, do you have time to take a look at this PR?

@jojochuang @goiri @Hexiaoqiao @ferhui Could you help review this PR too?

ZanderXu avatar Sep 13 '22 02:09 ZanderXu

@xkrogen @ashutoshcipher Sir, can you help me review this patch?

ZanderXu avatar Sep 19 '22 08:09 ZanderXu

Thanks @ZanderXu for involving me in this PR. I will try doing it in my free slots.

ashutoshcipher avatar Sep 19 '22 08:09 ashutoshcipher

@zhengchenyu Sir, I saw you have some PRs about Observer Read. Can you help me review this PR? We have encountered this problem many times in our prod environment.

ZanderXu avatar Sep 21 '22 03:09 ZanderXu

@steveloughran Sir, can you help me review this pr when you are available?

ZanderXu avatar Oct 05 '22 14:10 ZanderXu

Sorry, I missed reviewing it. The changes make sense to me. I am +1

ashutoshcipher avatar Oct 05 '22 15:10 ashutoshcipher

@ZanderXu sorry, i don't go near HDFS

steveloughran avatar Oct 06 '22 10:10 steveloughran

Hi @ZanderXu,

I am not clear what exactly you are trying to achieve here. Are you trying to enable observerNodes to handle addBlock RPC from clients? I believe addBlock is an update operation to FSNamespace and only the activeNN should handle modification to FSNamespace.

xinglin avatar Oct 08 '22 23:10 xinglin

> Hi @ZanderXu,
>
> I am not clear what exactly you are trying to achieve here. Are you trying to enable observerNodes to handle addBlock RPC from clients? I believe addBlock is an update operation to FSNamespace and only the activeNN should handle modification to FSNamespace.

@xinglin Thanks for your review. Maybe you missed something. addBlock is an update operation, and we expect that only the ActiveNN handles it. In the current logic, however, an ObserverNN or StandbyNN can handle it and may return a FileNotFoundException to the Client. The Client will not fail over this operation to the ActiveNN when it receives a FileNotFoundException.

An ObserverNN or StandbyNN should throw a StandbyException or RetryOnActiveException to the Client when handling update operations, so that the Client fails over the operation to the Active.
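The difference between the two exception types can be sketched with a toy client. This is not the HDFS client: NameNodeStub, StandbyLikeException, and callWithFailover are hypothetical names, and the assumption is that the client only moves on to the next NameNode for the standby-style exception.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Toy client-side failover loop; all names are hypothetical. */
public class FailoverSketch {
  /** Stand-in for StandbyException / RetryOnActiveException. */
  static class StandbyLikeException extends IOException {
    StandbyLikeException(String msg) { super(msg); }
  }

  interface NameNodeStub {
    String addBlock(String src) throws IOException;
  }

  /** Tries each NameNode in order; only a standby-style error triggers failover. */
  static String callWithFailover(List<NameNodeStub> nodes, String src)
      throws IOException {
    IOException last = new IOException("no NameNode configured");
    for (NameNodeStub nn : nodes) {
      try {
        return nn.addBlock(src);
      } catch (StandbyLikeException e) {
        last = e; // this node refused the call; try the next one
      }
      // Any other IOException (e.g. FileNotFoundException) propagates as-is.
    }
    throw last;
  }

  // Observer before the change: answers the call with a misleading error.
  static final NameNodeStub OBSERVER_OLD =
      src -> { throw new FileNotFoundException(src); };
  // Observer after the change: tells the client to go elsewhere.
  static final NameNodeStub OBSERVER_NEW =
      src -> { throw new StandbyLikeException("retry on active"); };

  /** Calls the given observer first, then an always-working active node. */
  static String describe(NameNodeStub observer) {
    NameNodeStub active = src -> "block allocated for " + src;
    try {
      return callWithFailover(Arrays.asList(observer, active), "/tmp/new-file");
    } catch (IOException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println("old observer: " + describe(OBSERVER_OLD));
    System.out.println("new observer: " + describe(OBSERVER_NEW));
  }
}
```

With the old behavior the FileNotFoundException reaches the caller and the active node is never tried; with the standby-style exception the same call succeeds on the active node.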

ZanderXu avatar Oct 09 '22 02:10 ZanderXu

Hi @ZanderXu,

I understand what you tried to do now. The first checkOperation(OperationCategory.WRITE) call will throw a RetryOnActiveException on an observerNN.

Are you aware why there are multiple checkOperation() calls in this function? I'd assume a single checkOperation(OperationCategory.WRITE) at the beginning of this function should be sufficient. Why are there checks for both READ and WRITE, and why do we run each check twice (once before getting the lock and once after getting the lock)?

xinglin avatar Oct 09 '22 05:10 xinglin

> Are you aware why there are multiple checkOperation() in this function?

The checkOperation() at the beginning is a quick check that fails fast before taking the lock. The checkOperation() inside the lock guards against conflicts caused by an HA failover that happens between the first check and acquiring the lock.
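The check-then-lock-then-recheck pattern described above can be sketched as follows. This is a simplified model under stated assumptions, not FSNamesystem: the class, the betweenCheckAndLock hook, and the idea that state transitions take the write lock are all illustrative.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of "check, lock, check again"; all names are hypothetical. */
public class DoubleCheckSketch {
  static class NotActiveException extends Exception {
    NotActiveException(String msg) { super(msg); }
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private volatile boolean active = true;

  /** Assumption: HA state transitions happen under the write lock. */
  void transitionToStandby() {
    lock.writeLock().lock();
    try {
      active = false;
    } finally {
      lock.writeLock().unlock();
    }
  }

  private void checkActive(String where) throws NotActiveException {
    if (!active) {
      throw new NotActiveException("not active (" + where + ")");
    }
  }

  /** The hook simulates whatever happens between the first check and the lock. */
  String operate(Runnable betweenCheckAndLock) throws NotActiveException {
    checkActive("before lock");   // cheap fail-fast, no lock contention
    betweenCheckAndLock.run();
    lock.readLock().lock();
    try {
      checkActive("under lock");  // state cannot change while the lock is held
      return "done";
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Returns "done" or the message of the exception that rejected the call. */
  static String run(boolean failoverInBetween) {
    DoubleCheckSketch nn = new DoubleCheckSketch();
    Runnable hook;
    if (failoverInBetween) {
      hook = nn::transitionToStandby;
    } else {
      hook = () -> { };
    }
    try {
      return nn.operate(hook);
    } catch (NotActiveException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(run(false));
    System.out.println(run(true));
  }
}
```

Without the second check, a failover that lands between the first check and the lock acquisition would let the operation run on a node that is no longer active.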

ZanderXu avatar Oct 09 '22 06:10 ZanderXu

> lgtm

@xinglin Thanks for your review. BTW I have many open PRs. Can you help me review them when you are available?

ZanderXu avatar Oct 09 '22 07:10 ZanderXu

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 50s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 41m 44s trunk passed
+1 :green_heart: compile 1m 44s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 32s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 19s trunk passed
+1 :green_heart: mvnsite 1m 46s trunk passed
+1 :green_heart: javadoc 1m 19s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 46s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 4m 1s trunk passed
+1 :green_heart: shadedclient 27m 5s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 22s the patch passed
+1 :green_heart: compile 1m 26s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 26s the patch passed
+1 :green_heart: compile 1m 20s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 20s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 1m 0s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 143 unchanged - 0 fixed = 144 total (was 143)
+1 :green_heart: mvnsite 1m 27s the patch passed
+1 :green_heart: javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 32s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 34s the patch passed
+1 :green_heart: shadedclient 25m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 366m 45s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 9s The patch does not generate ASF License warnings.
486m 20s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/2/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4872
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ee571155c82c 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / df6c171b7707f0f4ee2f10e21a3158b852409699
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/2/testReport/
Max. process+thread count 2028 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Oct 09 '22 14:10 hadoop-yetus

nit: maybe change the PR title to "Fix ObserverNN to throw ObserverRetryOnActiveException when receiving addBlock RPC, instead of FileNotFoundException"

And in the description, add "addBlock() is a WRITE operation on FSNamespace. We changed a few checkOperation(READ) to checkOperation(WRITE) in addBlock() so that the check will fail with the correct exception when run on ObserverNNs".

Sure, could you ping me for the PRs you want me to take a look? I am not a hadoop committer but can take a look when I have bandwidth.

xinglin avatar Oct 09 '22 20:10 xinglin

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 1m 2s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 85m 37s trunk passed
+1 :green_heart: compile 1m 37s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 27s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 14s trunk passed
+1 :green_heart: mvnsite 1m 41s trunk passed
+1 :green_heart: javadoc 1m 33s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 36s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 47s trunk passed
+1 :green_heart: shadedclient 26m 2s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 21s the patch passed
+1 :green_heart: compile 1m 27s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 27s the patch passed
+1 :green_heart: compile 1m 20s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 20s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 0s the patch passed
+1 :green_heart: mvnsite 1m 27s the patch passed
+1 :green_heart: javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 26s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 36s the patch passed
+1 :green_heart: shadedclient 25m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 335m 10s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 0m 55s The patch does not generate ASF License warnings.
497m 20s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/3/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4872
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 9a7152c37b99 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 2e5e8f534f01f72f67f80581c8b89e5c68cf9854
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/3/testReport/
Max. process+thread count 2068 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Oct 10 '22 11:10 hadoop-yetus

@ayushtkn Sir, could you help me with a final review?

@tomscut @Hexiaoqiao Could you help me double-check it when you are available?

ZanderXu avatar Oct 28 '22 08:10 ZanderXu

@Hexiaoqiao Sir, can you help me review this PR?

ZanderXu avatar Nov 03 '22 04:11 ZanderXu

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 1m 1s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 42m 17s trunk passed
+1 :green_heart: compile 1m 37s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 27s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 16s trunk passed
+1 :green_heart: mvnsite 1m 39s trunk passed
+1 :green_heart: javadoc 1m 15s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 43s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 43s trunk passed
+1 :green_heart: shadedclient 26m 37s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 28s the patch passed
+1 :green_heart: compile 1m 35s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 35s the patch passed
+1 :green_heart: compile 1m 23s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 23s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 2s the patch passed
+1 :green_heart: mvnsite 1m 31s the patch passed
+1 :green_heart: javadoc 1m 0s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 34s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 49s the patch passed
+1 :green_heart: shadedclient 26m 40s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 351m 17s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 0m 56s The patch does not generate ASF License warnings.
471m 51s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/4/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4872
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 033df28c4be5 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a44c5a2c2d01f68e91febdf82c9f2d4e25d53896
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/4/testReport/
Max. process+thread count 1847 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Nov 07 '22 14:11 hadoop-yetus

@tomscut Sir, I have updated this PR based on your suggestion. Could you help me review it again? Thanks.

ZanderXu avatar Nov 08 '22 10:11 ZanderXu

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 1m 18s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
-1 :x: mvninstall 4m 4s /branch-mvninstall-root.txt root in trunk failed.
+1 :green_heart: compile 3m 56s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 21s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 6s trunk passed
+1 :green_heart: mvnsite 1m 35s trunk passed
+1 :green_heart: javadoc 1m 10s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 44s trunk passed
+1 :green_heart: shadedclient 29m 9s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 23s the patch passed
+1 :green_heart: compile 1m 30s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 30s the patch passed
+1 :green_heart: compile 1m 24s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 24s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 59s the patch passed
+1 :green_heart: mvnsite 1m 29s the patch passed
+1 :green_heart: javadoc 0m 55s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 32s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 31s the patch passed
+1 :green_heart: shadedclient 23m 5s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 :x: unit 248m 22s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 7s The patch does not generate ASF License warnings.
331m 56s
Reason Tests
Failed junit tests hadoop.hdfs.server.blockmanagement.TestBlockTokenWithShortCircuitRead
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/6/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4872
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 1d27fcfec7d8 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e7fe31473cdb0a212d735f3164cdcfb13c0d9449
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/6/testReport/
Max. process+thread count 2991 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Nov 08 '22 15:11 hadoop-yetus

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 44s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 40m 39s trunk passed
+1 :green_heart: compile 1m 42s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 29s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 19s trunk passed
+1 :green_heart: mvnsite 1m 47s trunk passed
+1 :green_heart: javadoc 1m 22s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 42s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 55s trunk passed
+1 :green_heart: shadedclient 23m 40s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 25s the patch passed
+1 :green_heart: compile 1m 25s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 25s the patch passed
+1 :green_heart: compile 1m 15s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 15s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 1s the patch passed
+1 :green_heart: mvnsite 1m 34s the patch passed
+1 :green_heart: javadoc 1m 3s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 34s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 53s the patch passed
+1 :green_heart: shadedclient 27m 6s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 250m 25s hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 4s The patch does not generate ASF License warnings.
367m 21s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/7/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4872
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux abc108e07b48 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e7fe31473cdb0a212d735f3164cdcfb13c0d9449
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/7/testReport/
Max. process+thread count 2889 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Nov 08 '22 16:11 hadoop-yetus

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 56s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
-1 :x: mvninstall 7m 58s /branch-mvninstall-root.txt root in trunk failed.
+1 :green_heart: compile 2m 21s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 25s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 8s trunk passed
+1 :green_heart: mvnsite 1m 31s trunk passed
+1 :green_heart: javadoc 1m 9s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 34s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 39s trunk passed
+1 :green_heart: shadedclient 29m 14s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 23s the patch passed
+1 :green_heart: compile 1m 26s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 26s the patch passed
+1 :green_heart: compile 1m 19s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 19s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 0m 59s the patch passed
+1 :green_heart: mvnsite 1m 27s the patch passed
+1 :green_heart: javadoc 0m 59s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 33s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 30s the patch passed
+1 :green_heart: shadedclient 26m 22s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 :x: unit 385m 31s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 1m 2s The patch does not generate ASF License warnings.
473m 35s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.ha.TestObserverNode
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/5/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4872
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 5bad9a612da0 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / c1fef12762e968c760fd8da6ed4e895a5329934f
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/5/testReport/
Max. process+thread count 1896 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Nov 08 '22 17:11 hadoop-yetus

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 1m 3s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 42m 47s trunk passed
+1 :green_heart: compile 1m 42s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 1m 28s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: checkstyle 1m 16s trunk passed
+1 :green_heart: mvnsite 1m 37s trunk passed
+1 :green_heart: javadoc 1m 16s trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 41s trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 44s trunk passed
+1 :green_heart: shadedclient 26m 13s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 23s the patch passed
+1 :green_heart: compile 1m 27s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 1m 27s the patch passed
+1 :green_heart: compile 1m 17s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: javac 1m 17s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 0s the patch passed
+1 :green_heart: mvnsite 1m 27s the patch passed
+1 :green_heart: javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 28s the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
+1 :green_heart: spotbugs 3m 34s the patch passed
+1 :green_heart: shadedclient 25m 55s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 :x: unit 362m 44s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 :green_heart: asflicense 0m 55s The patch does not generate ASF License warnings.
482m 29s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.ha.TestObserverNode
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/8/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/4872
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux a4b8f6035214 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9296e1795b66b8602b5046f1c951ddc7829f930d
Default Java Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/8/testReport/
Max. process+thread count 1875 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/8/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Nov 10 '22 16:11 hadoop-yetus

The failed unit test hadoop.hdfs.server.namenode.ha.TestObserverNode is unrelated to this PR.

ZanderXu avatar Nov 12 '22 03:11 ZanderXu

Sorry for being late to the party here. The changes to getAdditionalBlock look fine to me -- we already check for WRITE operation later in the same method, this just moves the check a little earlier. No issues there.
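To make the effect of moving the check concrete, here is a minimal, self-contained sketch. It is not the HDFS code: `EarlyWriteCheckSketch`, `NotActiveException`, `replayCreate` and `outcome` are hypothetical names made up for illustration, and `NotActiveException` only stands in for the exception a non-active NameNode would raise. Under those assumptions, it shows that validating under `READ` lets a lagging Observer reach the file lookup and fail with a `FileNotFoundException`, while checking `WRITE` up front rejects the call before any lookup happens.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class EarlyWriteCheckSketch {
  enum OperationCategory { READ, WRITE }
  enum HAState { ACTIVE, OBSERVER }

  /** Hypothetical stand-in for the "not active" rejection. */
  static class NotActiveException extends IOException {
    NotActiveException(String msg) { super(msg); }
  }

  private final HAState state;
  private final Set<String> files = new HashSet<>();

  EarlyWriteCheckSketch(HAState state) { this.state = state; }

  /** Simulates this node having replayed the edit that created the file. */
  void replayCreate(String src) { files.add(src); }

  /** Only the Active accepts WRITE; READ is accepted everywhere. */
  void checkOperation(OperationCategory op) throws NotActiveException {
    if (op == OperationCategory.WRITE && state != HAState.ACTIVE) {
      throw new NotActiveException("WRITE is not supported in state " + state);
    }
  }

  /** validationCategory models the category checked before validation. */
  String addBlock(String src, OperationCategory validationCategory)
      throws IOException {
    checkOperation(validationCategory);       // the check this PR moves earlier
    if (!files.contains(src)) {               // validation step
      throw new FileNotFoundException("File does not exist: " + src);
    }
    checkOperation(OperationCategory.WRITE);  // the check that already existed later
    return "blk_for_" + src;
  }

  /** Returns "ok" or the simple name of the exception that was thrown. */
  static String outcome(HAState state, boolean createReplayed,
      OperationCategory validationCategory) {
    EarlyWriteCheckSketch nn = new EarlyWriteCheckSketch(state);
    if (createReplayed) {
      nn.replayCreate("/a");
    }
    try {
      nn.addBlock("/a", validationCategory);
      return "ok";
    } catch (IOException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println("observer, READ first:  "
        + outcome(HAState.OBSERVER, false, OperationCategory.READ));
    System.out.println("observer, WRITE first: "
        + outcome(HAState.OBSERVER, false, OperationCategory.WRITE));
    System.out.println("active, WRITE first:   "
        + outcome(HAState.ACTIVE, true, OperationCategory.WRITE));
  }
}
```

In this toy model the Active's behavior is unchanged by the move; only the error a stale Observer returns differs.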

The changes to getAdditionalDatanode, which legitimately seems not to do any write modification to the namespace (e.g. no write lock is held), are less obviously okay. I think the real problem here comes from the mismatch: FSNamesystem#getAdditionalDatanode considers itself a read-op, but in ClientProtocol, it is not annotated as @ReadOnly, so it's considered like a write op. So we can either change it to OperationCategory.WRITE, as proposed in this PR, or we can mark it as @ReadOnly(isCoordinated = true). Either one would solve the current problem. Marking it read-only is better from a perf/scalability perspective if it really can be safely served by an ObserverNode. Looking through getAdditionalDatanode, the only part I would be worried about is that we generate a new block token as part of the response. AFAICT from the handling of the keys used by BlockTokenSecretManager, the DataNodes will fetch/trust keys produced by all NNs, not just the active, so I think this is all good.

So, unless anyone sees a reason why we can't mark getAdditionalDatanode as @ReadOnly, I would propose to do that instead of switching its OperationCategory to WRITE.
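The two options can be illustrated with a small sketch of annotation-based routing. This is a simplified model, not the Hadoop implementation: the `ReadOnly` annotation below is a local stand-in that carries only `isCoordinated`, `ClientProtocolSketch` is a hypothetical three-method interface, and the helper names are invented. It assumes that a client-side proxy may send a call to an Observer only if the method carries the annotation, and that a coordinated call is one for which the Observer checks the client's state ID before serving it.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class ObserverRoutingSketch {

  /** Local stand-in for the read-only marker; not the Hadoop annotation. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface ReadOnly {
    boolean isCoordinated() default false;
  }

  /** Hypothetical, trimmed-down protocol used only for this sketch. */
  interface ClientProtocolSketch {
    @ReadOnly
    void getBlockLocations();

    // No annotation: treated like a write and routed to the Active.
    void addBlock();

    // The alternative suggested in the review comment above.
    @ReadOnly(isCoordinated = true)
    void getAdditionalDatanode();
  }

  private static Method method(String name) {
    try {
      return ClientProtocolSketch.class.getMethod(name);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("Unknown method: " + name, e);
    }
  }

  /** True if the call may be routed to an Observer. */
  static boolean canServeOnObserver(String name) {
    return method(name).isAnnotationPresent(ReadOnly.class);
  }

  /** True if the Observer should check the client's state ID first. */
  static boolean isCoordinated(String name) {
    ReadOnly readOnly = method(name).getAnnotation(ReadOnly.class);
    return readOnly != null && readOnly.isCoordinated();
  }

  public static void main(String[] args) {
    String[] names = {"getBlockLocations", "addBlock", "getAdditionalDatanode"};
    for (String name : names) {
      System.out.printf("%s: observer=%b coordinated=%b%n",
          name, canServeOnObserver(name), isCoordinated(name));
    }
  }
}
```

Under these assumptions, annotating `getAdditionalDatanode` keeps it eligible for the Observer, whereas switching its `OperationCategory` to `WRITE` on the server side forces it to the Active.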

xkrogen avatar Nov 18 '22 21:11 xkrogen

@xkrogen Sir, thanks for your careful review and nice suggestion. When handling getAdditionalDatanode, the NameNode chooses a new datanode based on datanode state and reserved capacity, for example whether a datanode is stale, busy, slow, or in maintenance.
So I think getAdditionalDatanode should be handled by the ActiveNameNode.

ZanderXu avatar Nov 22 '22 13:11 ZanderXu

But the ObserverNN should be aware of the states of DataNodes as well, right? Staleness, maintenance state, etc. It might be slightly stale, but I don't immediately see why that would cause an issue. Even some of the information the ActiveNN has could be a little stale.

xkrogen avatar Nov 22 '22 19:11 xkrogen

@xkrogen Sir, thanks for your explanation.

Yeah, the ObserverNN also knows the states of the DataNodes. But scheduledSize is not shared, and almost all datanode selection is handled by the Active, so only the Active has the complete basis for the decision. That is why it is better for the Active to handle this call.

Of course, it would be better if some of the datanode selection logic could be moved to the Standby or Observer, but do we want to make that change directly in this PR, or open a new PR to discuss it?

ZanderXu avatar Nov 23 '22 00:11 ZanderXu