
HDFS-15869. Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

Open functioner opened this issue 3 years ago • 29 comments

I propose a fix for HDFS-15869.

functioner avatar Mar 03 '21 05:03 functioner

Thanks @Hexiaoqiao for your review. I have added the multi-threaded executor for it.

For the exception handling part, I examined the original semantics of RpcEdit https://github.com/apache/hadoop/blob/aaedc51d8783c2808563d5c8b51b68ab79e19820/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java#L379-L387 and SyncEdit https://github.com/apache/hadoop/blob/aaedc51d8783c2808563d5c8b51b68ab79e19820/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java#L347-L353. Since those semantics are preserved, we don't need to add any extra handling mechanism.

I have run the TestEditLogRace and TestEditLog suites on my local machine; all tests pass. Let's see whether any other tests fail.

functioner avatar Mar 24 '21 22:03 functioner

@functioner would you mind adding a unit test to cover this improvement? cc @jojochuang @ayushtkn @linyiqun I believe this is a great improvement, especially under heavy response load. Would you mind taking another review? Thanks.

Hexiaoqiao avatar Mar 25 '21 03:03 Hexiaoqiao

@functioner would you mind adding a unit test to cover this improvement?

I'm writing a unit test to cover this improvement. After reading the test cases of TestEditLog and TestEditLogRace, I think we can add a test similar to TestEditLogRace#testDeadlock. My current design is basically: add an Edit that intentionally sleeps for a while in its logSyncNotify method, blocking the next Edit, and then test whether that next Edit can still finish promptly.

However, with this design the test needs access to some private methods and classes in FSEditLogAsync, and I don't think changing modifiers in FSEditLogAsync is a good idea. Can we use something like @exposeToTest? (Sorry, I don't have experience with this part.) Or can you come up with a better design? @Hexiaoqiao @jojochuang @ayushtkn @linyiqun Thanks.
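
A minimal, self-contained sketch of that test idea (all names here are hypothetical stand-ins for FSEditLogAsync internals, not the actual patch): a deliberately slow notification must not delay the edit that follows it when notifications run on an executor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the proposed TestEditLogRace-style test: the
// sync thread hands each edit's notification to an executor, so a slow
// logSyncNotify() cannot block the next edit.
public class SlowNotifyTest {
    static final ExecutorService NOTIFY_EXECUTOR = Executors.newFixedThreadPool(4);

    // Stand-in for RpcEdit.logSyncNotify(): runs the (possibly slow)
    // client notification off the sync thread; the latch marks completion.
    static CountDownLatch logSyncNotify(Runnable notify) {
        CountDownLatch done = new CountDownLatch(1);
        NOTIFY_EXECUTOR.execute(() -> { notify.run(); done.countDown(); });
        return done;
    }

    public static void main(String[] args) throws Exception {
        // First edit: its notify sleeps, simulating a stuck sendResponse().
        logSyncNotify(() -> {
            try { Thread.sleep(2000); } catch (InterruptedException ignored) { }
        });
        // Second edit: must complete promptly despite the stuck first one.
        CountDownLatch second = logSyncNotify(() -> { });
        boolean finished = second.await(500, TimeUnit.MILLISECONDS);
        if (!finished) {
            throw new AssertionError("second edit was blocked by the slow notify");
        }
        NOTIFY_EXECUTOR.shutdownNow();
    }
}
```

The real test would of course drive actual RpcEdit instances through FSEditLogAsync rather than this toy model.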

functioner avatar Mar 25 '21 04:03 functioner

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 55s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
-1 :x: test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 32m 33s trunk passed
+1 :green_heart: compile 1m 20s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: compile 1m 13s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: checkstyle 1m 1s trunk passed
+1 :green_heart: mvnsite 1m 22s trunk passed
+1 :green_heart: javadoc 0m 55s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javadoc 1m 25s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: spotbugs 3m 8s trunk passed
+1 :green_heart: shadedclient 16m 14s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 10s the patch passed
+1 :green_heart: compile 1m 9s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javac 1m 9s the patch passed
+1 :green_heart: compile 1m 6s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: javac 1m 6s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 0m 52s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
+1 :green_heart: mvnsite 1m 11s the patch passed
+1 :green_heart: javadoc 0m 45s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javadoc 1m 19s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: spotbugs 3m 7s the patch passed
+1 :green_heart: shadedclient 16m 3s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 :x: unit 730m 2s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 :green_heart: asflicense 0m 45s The patch does not generate ASF License warnings.
815m 32s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.hdfs.TestDecommission
hadoop.hdfs.TestDistributedFileSystemWithECFile
hadoop.hdfs.TestRollingUpgradeDowngrade
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.TestDFSShell
hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
hadoop.hdfs.server.mover.TestMover
hadoop.hdfs.server.namenode.TestFsck
hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
hadoop.hdfs.server.namenode.TestCacheDirectivesWithViewDFS
hadoop.hdfs.TestFetchImage
hadoop.hdfs.TestDistributedFileSystemWithECFileWithRandomECPolicy
hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages
hadoop.hdfs.server.namenode.ha.TestStandbyBlockManagement
hadoop.hdfs.server.namenode.TestListOpenFiles
hadoop.hdfs.server.namenode.TestCacheDirectives
hadoop.hdfs.TestDecommissionWithBackoffMonitor
hadoop.hdfs.tools.TestDFSAdminWithHA
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized
hadoop.hdfs.server.namenode.TestNamenodeRetryCache
hadoop.hdfs.server.balancer.TestBalancerService
hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.server.namenode.TestStoragePolicySatisfierWithHA
hadoop.hdfs.TestHDFSFileSystemContract
hadoop.hdfs.web.TestWebHdfsFileSystemContract
hadoop.fs.viewfs.TestNNStartupWhenViewFSOverloadSchemeEnabled
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/2/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/2737
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 36ed91d4be9c 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / aaedc51d8783c2808563d5c8b51b68ab79e19820
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/2/testReport/
Max. process+thread count 2867 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Mar 25 '21 11:03 hadoop-yetus

@Hexiaoqiao Thanks for the comment, I removed logSyncNotifyExecutor.shutdown().

I think we should also add a comment in the code explaining why we don't shut down this executor; at first glance it is natural to assume the executor should be shut down on close, so future developers may get confused otherwise.

Furthermore, I think your argument implies that FSEditLogAsync is a singleton within the NameNode process; otherwise we might create multiple executors without ever shutting any of them down. Is FSEditLogAsync really always a singleton (now & future)?
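
A minimal sketch of the pattern under discussion, assuming a process-wide singleton (class and field names are hypothetical, not the ones in the patch):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the reviewed pattern: since FSEditLogAsync is effectively a
// singleton per NameNode process, the notify executor is created once
// and deliberately never shut down.
public class NotifyExecutorHolder {
    // Daemon threads, so the never-shut-down pool cannot keep the JVM alive.
    private static final ExecutorService LOG_SYNC_NOTIFY_EXECUTOR =
        Executors.newFixedThreadPool(4, r -> {
            Thread t = new Thread(r, "logSyncNotify");
            t.setDaemon(true);
            return t;
        });

    // Intentionally no shutdown() on close(): if close() terminated the
    // pool, a reopened edit log would submit notifications to a dead
    // executor and hit RejectedExecutionException.
    public static ExecutorService get() {
        return LOG_SYNC_NOTIFY_EXECUTOR;
    }
}
```

This is exactly why the singleton question matters: the no-shutdown choice is only safe if at most one such pool is ever created per process.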

functioner avatar Apr 14 '21 15:04 functioner

I think we should also add a comment in the code explaining why we don't shut down this executor; at first glance it is natural to assume the executor should be shut down on close, so future developers may get confused otherwise.

+1.

Furthermore, I think your argument implies that FSEditLogAsync is a singleton within the NameNode process; otherwise we might create multiple executors without ever shutting any of them down. Is FSEditLogAsync really always a singleton (now & future)?

IMO, it is true that FSEditLogAsync is a singleton instance per NameNode process.

Hexiaoqiao avatar Apr 14 '21 15:04 Hexiaoqiao

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 36s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
-1 :x: test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 34m 1s trunk passed
+1 :green_heart: compile 1m 21s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: compile 1m 16s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: checkstyle 1m 2s trunk passed
+1 :green_heart: mvnsite 1m 19s trunk passed
+1 :green_heart: javadoc 0m 54s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javadoc 1m 24s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: spotbugs 3m 4s trunk passed
+1 :green_heart: shadedclient 16m 4s branch has no errors when building and testing our client artifacts.
-0 :warning: patch 16m 23s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 12s the patch passed
+1 :green_heart: compile 1m 13s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javac 1m 13s the patch passed
+1 :green_heart: compile 1m 9s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: javac 1m 9s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 0m 52s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2)
+1 :green_heart: mvnsite 1m 11s the patch passed
+1 :green_heart: javadoc 0m 45s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javadoc 1m 16s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: spotbugs 3m 6s the patch passed
+1 :green_heart: shadedclient 15m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 :x: unit 231m 29s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 :green_heart: asflicense 0m 45s The patch does not generate ASF License warnings.
317m 56s
Reason Tests
Failed junit tests hadoop.hdfs.server.balancer.TestBalancer
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.server.datanode.TestDirectoryScanner
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/3/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/2737
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux f6092b2b51d2 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / bbc76cc242d2ba7c79924d77adab96a30d7eb013
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/3/testReport/
Max. process+thread count 3144 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Apr 14 '21 20:04 hadoop-yetus

@functioner after reviewing the code, the comment from Yiqun seems not resolved yet; do you mind making the thread pool size configurable?

Hexiaoqiao avatar Apr 15 '21 04:04 Hexiaoqiao

@Hexiaoqiao thanks for the reminder. I've added:

  1. a comment explaining why logSyncNotifyExecutor.shutdown() was removed
  2. a configuration option for the size of logSyncNotifyExecutor
  3. docs for that configuration option
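
Item 2 can be sketched as follows; the configuration key and default below are illustrative only (the actual patch would read a dfs.* key through Hadoop's Configuration.getInt(key, default)), with java.util.Properties used here as a stand-in:

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of making the notify pool size configurable.
public class NotifyExecutorConfig {
    static final String KEY = "example.logsync.notify.threads"; // hypothetical key
    static final int DEFAULT = 1;

    // Parse the configured size, falling back to the default on a
    // missing or nonsensical value.
    static int poolSize(Properties conf) {
        int n;
        try {
            n = Integer.parseInt(conf.getProperty(KEY, String.valueOf(DEFAULT)));
        } catch (NumberFormatException e) {
            n = DEFAULT;
        }
        return n < 1 ? DEFAULT : n;
    }

    static ExecutorService create(Properties conf) {
        return Executors.newFixedThreadPool(poolSize(conf));
    }
}
```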

functioner avatar Apr 15 '21 05:04 functioner

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 1m 0s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 1s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
-1 :x: test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 34m 44s trunk passed
+1 :green_heart: compile 1m 23s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: compile 1m 15s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: checkstyle 1m 1s trunk passed
+1 :green_heart: mvnsite 1m 21s trunk passed
+1 :green_heart: javadoc 0m 55s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javadoc 1m 26s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: spotbugs 3m 11s trunk passed
+1 :green_heart: shadedclient 16m 57s branch has no errors when building and testing our client artifacts.
-0 :warning: patch 17m 15s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 13s the patch passed
+1 :green_heart: compile 1m 21s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javac 1m 21s the patch passed
+1 :green_heart: compile 1m 9s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: javac 1m 9s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 0m 58s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 463 unchanged - 0 fixed = 466 total (was 463)
+1 :green_heart: mvnsite 1m 14s the patch passed
+1 :green_heart: xml 0m 1s The patch has no ill-formed XML file.
+1 :green_heart: javadoc 0m 47s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javadoc 1m 17s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: spotbugs 3m 22s the patch passed
+1 :green_heart: shadedclient 17m 12s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 :x: unit 417m 5s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 :green_heart: asflicense 0m 41s The patch does not generate ASF License warnings.
506m 48s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.hdfs.TestViewDistributedFileSystemContract
hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.TestDFSShell
hadoop.hdfs.TestLeaseRecovery
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme
hadoop.hdfs.TestStateAlignmentContextWithHA
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.TestLeaseRecovery2
hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.hdfs.server.datanode.TestIncrementalBrVariations
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.TestHDFSFileSystemContract
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.web.TestWebHdfsFileSystemContract
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/4/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/2737
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux da07e6c90844 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b751835d361fbab6a03ca6827c1cf02626f33b04
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/4/testReport/
Max. process+thread count 3148 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Apr 15 '21 13:04 hadoop-yetus

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 1m 0s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
-1 :x: test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 :green_heart: mvninstall 33m 57s trunk passed
+1 :green_heart: compile 1m 19s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: compile 1m 14s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: checkstyle 1m 3s trunk passed
+1 :green_heart: mvnsite 1m 20s trunk passed
+1 :green_heart: javadoc 0m 54s trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javadoc 1m 23s trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: spotbugs 3m 5s trunk passed
+1 :green_heart: shadedclient 16m 2s branch has no errors when building and testing our client artifacts.
-0 :warning: patch 16m 20s Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.
_ Patch Compile Tests _
+1 :green_heart: mvninstall 1m 9s the patch passed
+1 :green_heart: compile 1m 11s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javac 1m 11s the patch passed
+1 :green_heart: compile 1m 5s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: javac 1m 5s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 0m 54s /results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 463 unchanged - 0 fixed = 466 total (was 463)
+1 :green_heart: mvnsite 1m 18s the patch passed
+1 :green_heart: xml 0m 2s The patch has no ill-formed XML file.
+1 :green_heart: javadoc 0m 48s the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
+1 :green_heart: javadoc 1m 16s the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
+1 :green_heart: spotbugs 3m 10s the patch passed
+1 :green_heart: shadedclient 16m 2s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 :x: unit 458m 45s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch failed.
+1 :green_heart: asflicense 0m 45s The patch does not generate ASF License warnings.
545m 26s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.TestDecommissioningStatus
hadoop.hdfs.TestViewDistributedFileSystemContract
hadoop.hdfs.TestPersistBlocks
hadoop.hdfs.TestDFSShell
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList
hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme
hadoop.hdfs.server.blockmanagement.TestAvailableSpaceRackFaultTolerantBPP
hadoop.hdfs.TestStateAlignmentContextWithHA
hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeHdfsFileSystemContract
hadoop.hdfs.TestViewDistributedFileSystemWithMountLinks
hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
hadoop.fs.TestSymlinkHdfsFileSystem
hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS
hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor
hadoop.hdfs.server.namenode.ha.TestEditLogTailer
hadoop.hdfs.server.datanode.TestBlockScanner
hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.TestHDFSFileSystemContract
hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
hadoop.hdfs.web.TestWebHdfsFileSystemContract
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/5/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/2737
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml
uname Linux 5028b1276e33 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3c57512d104e3a92391c9a03ce4005a00267c07f
Default Java Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/5/testReport/
Max. process+thread count 3044 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2737/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Apr 16 '21 00:04 hadoop-yetus

@functioner would you mind submitting a blank addendum commit to trigger CI again? I just noticed that there are many failed unit tests, but they seem unrelated to this PR (not very sure); let's wait and see what Yetus says. @linyiqun @daryn-sharp @amahussein do you mind giving another review here? If there are no more comments, I would like to commit this shortly.

Hexiaoqiao avatar Apr 17 '21 10:04 Hexiaoqiao

@functioner would you mind submitting a blank addendum commit to trigger CI again? I just noticed that there are many failed unit tests, but they seem unrelated to this PR (not very sure); let's wait and see what Yetus says. @linyiqun @daryn-sharp @amahussein do you mind giving another review here? If there are no more comments, I would like to commit this shortly.

@functioner @Hexiaoqiao, I have another comment about closing logSyncNotifyExecutor; please see my comment above.

linyiqun avatar Apr 17 '21 11:04 linyiqun

The discussion on PR #2878 and HDFS-15957 seems to revolve around the same concept.

By definition, if there is a "network issue", the NameNode as a whole suffers, because all RPCs will be problematic. So what makes the `call.sendResponse()` call deserve special treatment compared to the rest of the RPC handling in the NameNode? I do not see a bug.

  • Network issues are not fixed by delegating the notifications to a pool of threads. This hides the symptoms of the problem and makes it much harder to detect the original problem.
  • This code potentially adds major overhead, because the lambda expression implies a sequence of object allocations inside each iteration. More allocations lead to slower execution, more GC, more frequent full GCs, etc.

-1 on this patch.

amahussein avatar Apr 18 '21 17:04 amahussein

@amahussein Thanks for the comment. Can I send you an email to explain more about the issue? @Hexiaoqiao and I have had some further discussion on it, and some of that discussion is inconvenient to post here. You can contact me via [email protected] or [email protected] and I will reply.

functioner avatar Apr 18 '21 20:04 functioner

@amahussein Thanks for your quick response. I don't think HDFS-15957 and HDFS-15869 concern the same concept/issue. I have left a comment at HDFS-15869 suggesting it be changed to an improvement rather than a bug fix (cc @functioner). IMO, this is a classic producer-consumer problem, and parallelizing is a natural way to improve performance. Yiqun has reported the same issue at HDFS-15486. In my own production environment this gave about a 5% E2E time saving for write operations. My suggestions: A. Update the description to mark this as an improvement rather than a bug fix. B. If there are any concerns about the lambda expression, we could improve it rather than reject the patch outright. Welcome any more discussion. Thanks everyone.

Hexiaoqiao avatar Apr 19 '21 04:04 Hexiaoqiao

@amahussein Thanks for your quick response. I don't think HDFS-15957 and HDFS-15869 concern the same concept/issue. I have left a comment at HDFS-15869 suggesting it be changed to an improvement rather than a bug fix (cc @functioner). IMO, this is a classic producer-consumer problem, and parallelizing is a natural way to improve performance. Yiqun has reported the same issue at HDFS-15486. In my own production environment this gave about a 5% E2E time saving for write operations. My suggestions: A. Update the description to mark this as an improvement rather than a bug fix. B. If there are any concerns about the lambda expression, we could improve it rather than reject the patch outright. Welcome any more discussion. Thanks everyone.

Thanks @Hexiaoqiao for the comment. OK, so is the purpose of the change to improve performance of FSEditLogAsync.java by executing sendResponse() in parallel? In that case, please change the title and description of the Jira to remove references to "hanging" problems, and then I will take another look. I am sorry for the inconvenience; I want to make sure I understand the purpose of the change before reviewing.

@amahussein Thanks for the comment. Can I send an email to you to explain more about the issue? @Hexiaoqiao and I have some more discussion on it, and some discussion is inconvenient to put in here. You can contact me via [email protected] or [email protected] and then I will reply.

Thanks @functioner! I really appreciate that. I think @Hexiaoqiao's reply has already clarified some of the confusion about the scope of the work. Please feel free to reach me by email at any time; I am on the common-dev mailing list.

amahussein avatar Apr 19 '21 14:04 amahussein

If any concerns about lambda expression, we could improve it rather than reject it directly.

@amahussein A common way to eliminate such overhead is to prepare multiple long-lived consumer threads and feed them with requests. If the lambda expressions cause significant overhead, we can improve the patch in that way. This design pattern is widely used in Cassandra; see SEPWorker and SEPExecutor: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPWorker.java https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java
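
As a rough sketch of that pattern (all names here are illustrative, not taken from the patch or from Cassandra): a fixed set of long-lived worker threads drains a bounded queue of pending notifications, so the sync thread only enqueues and no task object is allocated per call on the hot path.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative consumer-pool sketch: workers block on a shared queue
// and send notifications as they arrive.
public class NotifyWorkers {
    interface Notification { void send(); }

    private final BlockingQueue<Notification> pending;

    NotifyWorkers(int nThreads, int capacity) {
        pending = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < nThreads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        pending.take().send(); // block until work arrives
                    }
                } catch (InterruptedException e) {
                    // worker asked to stop
                }
            }, "logSyncNotify-" + i);
            worker.setDaemon(true);
            worker.start();
        }
    }

    // Producer side, called by the edit-log sync thread; blocks only if
    // the queue is full (back-pressure instead of unbounded growth).
    void enqueue(Notification n) {
        try {
            pending.put(n);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```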

functioner avatar Apr 19 '21 15:04 functioner

In that case, please change the title of the Jira and the description to remove references to "hanging" problems.

@amahussein I still would like to argue about this "hanging" issue.

There have been reports of TCP network I/O issues that hang for >15 min without throwing any exception. ZOOKEEPER-2201 is a perfect example, and you can find the TCP-level explanation for this hanging issue in https://www.usenix.org/conference/srecon16/program/presentation/nadolny Similar hanging bugs have also been accepted by the ZooKeeper community, such as:

  • ZOOKEEPER-3531: very similar to ZK-2201; the patch is merged
  • ZOOKEEPER-4074: a similar network hanging bug I reported; already confirmed by community; more discussion can be found in https://github.com/apache/zookeeper/pull/1582

However, in our scenario (HDFS-15869), a possible counterargument is that the call.sendResponse() invocation eventually reaches channel.write(buffer) (line 3611), where the channel is in non-blocking mode, so it might not be affected by this potential issue. https://github.com/apache/hadoop/blob/3c57512d104e3a92391c9a03ce4005a00267c07f/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L3607-L3616 However, as we point out in HDFS-15869, when the payload is huge, line 3611 does not invoke channel.write(buffer); instead it invokes channelIO(null, channel, buffer), which brings us to: https://github.com/apache/hadoop/blob/3c57512d104e3a92391c9a03ce4005a00267c07f/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L3646-L3672 If the payload is split into two batches, the second batch has to wait for the first batch to be sent out, which may encounter a high packet-loss rate and thus slow I/O.
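
The partial-write behavior described above can be modeled with a small sketch (chunk size and names are arbitrary): writeOnce() stands in for one non-blocking channel.write(buffer), which accepts only part of a large payload, and channelIO() stands in for the loop that retries until the buffer drains, keeping the calling thread busy for the whole duration.

```java
import java.nio.ByteBuffer;

// Toy model: even with a "non-blocking" write, a large response needs
// several partial writes, and the caller loops until all of it is sent.
public class PartialWriteModel {
    static final int BYTES_PER_WRITE = 8; // bytes the "kernel" accepts per attempt

    // One simulated non-blocking write: consumes at most BYTES_PER_WRITE bytes.
    static int writeOnce(ByteBuffer buf) {
        int n = Math.min(BYTES_PER_WRITE, buf.remaining());
        buf.position(buf.position() + n);
        return n;
    }

    // Simulated channelIO(): loops until the whole buffer is sent. On a
    // lossy link each later attempt can stall, blocking the calling thread.
    static int channelIO(ByteBuffer buf) {
        int attempts = 0;
        while (buf.hasRemaining()) {
            writeOnce(buf);
            attempts++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        ByteBuffer big = ByteBuffer.allocate(20); // payload larger than one write
        if (channelIO(big) != 3) {
            throw new AssertionError("expected 3 partial writes for 20 bytes");
        }
    }
}
```

In the real Server.java the stalls come from the socket's send buffer filling, not from a fixed chunk size, but the blocking effect on the calling thread is the same.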

Hence, I would say the hanging problem still exists.

functioner avatar Apr 19 '21 16:04 functioner

In that case, please change the title of the Jira and the description to remove references to "hanging" problems.

@amahussein I still would like to argue about this "hanging" issue.

Another aspect of the argument is the design for availability and fault tolerance. Distributed systems can in fact tolerate such hanging issues in many scenarios, but sometimes a hang is considered a bug, as in ZOOKEEPER-2201.

So an important question is: when is it a bug, and when is it not (i.e., when is it a feature)?

I've been doing research on fault injection for some time and have submitted multiple bug reports accepted by the open source community (e.g., HADOOP-17552). My criteria for determining whether something is a bug are:

  1. if we inject a fault in module X and it only affects module X, then it’s not a bug.
  2. if we inject a fault in module X and it affects not only module X but also a module Y that should be unrelated to module X, then it is probably a bug, because in the system design each module should be responsible for itself and report its own problems (e.g., by logging), rather than affect an unrelated module.

In our scenario (HDFS-15869), this possible hang (if you agree with my argument about network hanging) can block the FSEditLogAsync thread, because call.sendResponse() is now invoked by the FSEditLogAsync thread.

So, call.sendResponse() (network service) affects FSEditLogAsync (edit log sync service). So, I would say it's a bug.

The network service should be responsible for all its behaviors, and handle all the possible network issues (e.g., IOException, disconnection, hanging). It should determine how to handle them, e.g., by logging the error, rather than affecting other services like FSEditLogAsync.

I'm not saying that we have to use a complete and slow RPC framework for this network service. But IMO, decoupling it from FSEditLogAsync by delegating to a thread pool is at least a better design.
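As a rough illustration of that decoupling, here is a minimal sketch (all names are hypothetical, not the actual patch): the sync thread hands the response off to a dedicated executor and returns immediately, so a stuck socket only blocks the notifier thread, never the edit-log sync loop.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NotifyExecutorSketch {
  private final ExecutorService notifier = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "logSyncNotify");
    t.setDaemon(true);
    return t;
  });

  // Called from the sync thread; hands the network I/O off and returns at once.
  void logSyncNotify(Runnable sendResponse) {
    notifier.execute(sendResponse);
  }

  // Small self-check: submit n notifications and wait for them to run.
  static int notifyCount(int n) throws InterruptedException {
    NotifyExecutorSketch s = new NotifyExecutorSketch();
    AtomicInteger sent = new AtomicInteger();
    for (int i = 0; i < n; i++) {
      s.logSyncNotify(sent::incrementAndGet);
    }
    s.notifier.shutdown();
    s.notifier.awaitTermination(5, TimeUnit.SECONDS);
    return sent.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(notifyCount(3)); // prints 3
  }
}
```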

functioner avatar Apr 19 '21 16:04 functioner

Thanks @functioner The detailed discussions (except the lambda argument) should have been on the Jira.

IMO, this is a classic producer-consumer problem, and it is a natural idea to improve performance in a parallel way.

So, call.sendResponse() (network service) affects FSEditLogAsync (edit log sync service). So, I would say it's a bug.

Now, I am really even more confused about the (Bug Vs. Improvement). So, I am going to pass on reviewing.

If any concerns about lambda expression, we could improve it rather than reject it directly.

@amahussein A common way to eliminate such overhead is preparing multiple consumer threads, and feed them with requests. If the lambda expressions cause significant overhead, we can improve in that way. This design pattern is widely used in Cassandra. Example: SEPWorker - SEPExecutor https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPWorker.java https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java

This is not what I meant. It is recommended to avoid use of lambda expressions in hot execution paths. There are so many ways to avoid lambda expressions simply by having runnables waiting for tasks to be added to a queue.

amahussein avatar Apr 19 '21 17:04 amahussein

Thanks @functioner The detailed discussions (except the lambda argument) should have been on the Jira.

I see. I will post a summary comment on the Jira after the discussion in this PR is finalized.

If any concerns about lambda expression, we could improve it rather than reject it directly.

@amahussein A common way to eliminate such overhead is preparing multiple consumer threads, and feed them with requests. If the lambda expressions cause significant overhead, we can improve in that way. This design pattern is widely used in Cassandra. Example: SEPWorker - SEPExecutor https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPWorker.java https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java

This is not what I meant. It is recommended to avoid use of lambda expressions in hot execution paths.

Actually we are on the same page. Maybe my comment caused some confusion.

There are so many ways to avoid lambda expressions simply by having runnables waiting for tasks to be added to a queue.

That's exactly what I meant. I will push a commit soon.
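For reference, the queue-based shape being discussed might look like this minimal sketch (class names and the capacity are illustrative, not the committed code): one long-lived Runnable drains a bounded queue, so the hot path only enqueues and allocates no closures.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ResponderQueueSketch {
  interface Response { void send(); }

  private final BlockingQueue<Response> queue = new ArrayBlockingQueue<>(1024);

  ResponderQueueSketch() {
    // One Runnable for the lifetime of the service: no per-call lambda on
    // the producer side, which only enqueues.
    Thread worker = new Thread(new Runnable() {
      @Override public void run() {
        try {
          while (true) {
            queue.take().send();
          }
        } catch (InterruptedException ie) {
          // shutdown path
        }
      }
    }, "response-sender");
    worker.setDaemon(true);
    worker.start();
  }

  void enqueue(Response r) throws InterruptedException {
    queue.put(r);
  }

  // Self-check: n queued responses are all eventually sent.
  static boolean demo(int n) throws InterruptedException {
    ResponderQueueSketch s = new ResponderQueueSketch();
    CountDownLatch latch = new CountDownLatch(n);
    for (int i = 0; i < n; i++) {
      s.enqueue(latch::countDown);
    }
    return latch.await(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(demo(5)); // prints true
  }
}
```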

functioner avatar Apr 19 '21 17:04 functioner

Thanks @functioner The detailed discussions (except the lambda argument) should have been on the Jira.

IMO, this is a classic producer-consumer problem, and it is a natural idea to improve performance in a parallel way.

So, call.sendResponse() (network service) affects FSEditLogAsync (edit log sync service). So, I would say it's a bug.

Now, I am really even more confused about the (Bug Vs. Improvement). So, I am going to pass on reviewing.

@amahussein Thanks for your feedback, and your time! Sorry for all the possible confusion I made.

It's not a big deal whether it's marked as a bug or an improvement. One of my bug reports (HADOOP-17552) was also finally marked as an improvement rather than a bug. The point is that the developers (in HADOOP-17552) finally realized there was a potential hanging issue, as I pointed out, and the patch (as well as the relevant discussion) is very helpful for both developers and users.

Ok, is the purpose of the change to improve the performance of FSEditLogAsync.java by executing sendResponse() in parallel? In that case, please change the title of the Jira and the description to remove references to "hanging" problems.

My intention is to defend that we should not remove the references to "hanging" problems.

In short, the discussion above can be summarized into 3 arguments:

  1. https://github.com/apache/hadoop/pull/2737#issuecomment-822151838: this is a classic Producer-Consumer problem, and it is natural idea to improve performance using parallel way
  2. https://github.com/apache/hadoop/pull/2737#issuecomment-822591028: the call.sendResponse() may hang due to network issue, without throwing any exception
  3. https://github.com/apache/hadoop/pull/2737#issuecomment-822617097:
     a) the FSEditLogAsync thread (without this patch) directly invokes a network I/O call, call.sendResponse(), so if this network I/O invocation hangs, the FSEditLogAsync thread also hangs;
     b) in a "correct" system design, if this network I/O invocation hangs in this way, that should be fine, because HDFS (as a fault-tolerant system) should tolerate it;
     c) while the system tolerates this network issue, the FSEditLogAsync thread should not hang, otherwise nobody can commit the log;
     d) so our expected behavior is that, while the system tolerates this network issue, the FSEditLogAsync thread continues and everything still works well despite the network issue.

Both Argument 3 and Argument 1 can be resolved with this patch.

In conclusion, this patch not only improves the performance, but also enhances the availability & fault-tolerance.

So, I think the references to "hanging" problems should not be removed.

If it keeps "Improvement" tag instead of "Bug" tag, that's fine.

P.S. I will summarize our discussion with a comment in Jira after we reach a consensus.

functioner avatar Apr 19 '21 19:04 functioner

the FSEditLogAsync thread (without this patch) directly invokes a network I/O call call.sendResponse(), so if this network I/O invocation hangs, the FSEditLogAsync thread also hangs

My intention is to defend that we should not remove the references to "hanging" problems.

No, it's a non-blocking write so by definition it will never hang – unless induced by fault injection.

daryn-sharp avatar Apr 19 '21 21:04 daryn-sharp

No, it's a non-blocking write so by definition it will never hang – unless induced by fault injection.

@daryn-sharp I have considered this counterargument in https://github.com/apache/hadoop/pull/2737#issuecomment-822591028, where I proposed another argument that it may hang when it's a huge payload, because there's a while-loop. Please take a look.

Yes, channel.write(buffer) is non-blocking. But I suspect that channelIO(null, channel, buffer) is blocking; otherwise we wouldn't have the performance issue in HDFS-15486. Within channelIO(null, channel, buffer), the large payload is split into multiple parts, and it won't jump out of the loop until the remaining part of the payload no longer exceeds the buffer limit, meaning that it is waiting for the network to finish sending some content.

I'm not sure whether this fully defends my argument. Can you provide more explanation? Maybe I'm not correct. Thanks!

functioner avatar Apr 19 '21 21:04 functioner

@functioner Can you move this PR forward? I can help with the review. If you don't have time, I can reassign this ticket to another contributor.

ZanderXu avatar Jan 17 '24 11:01 ZanderXu

@functioner Can you move this PR forward? I can help with the review. If you don't have time, I can reassign this ticket to another contributor.

@ZanderXu According to the discussion so far, Daryn has some doubts about the issue and the patch. Do you have any ideas for moving the case forward?

functioner avatar Jan 27 '24 01:01 functioner

Thanks for your work and discussion on this problem. It took me a long time to catch up on your ideas and concerns 😂; it's quite involved.

I have some thoughts and questions about this ticket.

Some questions about HDFS-15869:

The channel.write(buffer) operation in line 3594 may be slow. Although for this specific stack trace, the channel is initialized in the non-blocking mode, there is still a chance of being slow depending on native write implementation in the OS (e.g., a kernel issue). Furthermore, the channelIO invocation in line 3594 may also get stuck, since it waits until the buffer is drained:

ret = (readCh == null) ? writeCh.write(buf) : readCh.read(buf); will return 0 if the namenode cannot write more data to this connection, right? RpcEdit.logSyncNotify will add this response into the queue of this connection and let the Responder take over this job, right? So FSEditLogAsync can go on to process the next jobs, right?
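The non-blocking contract being asked about can be illustrated with a toy model (the FakeChannel and helper below are stand-ins, not Hadoop code): a write that returns before the buffer is drained leaves the remainder to be parked for a Responder-style thread instead of blocking the caller.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class NonBlockingWriteSketch {
  // Fake non-blocking channel with a fixed-size send buffer.
  static class FakeChannel {
    int free;
    FakeChannel(int sendBuffer) { this.free = sendBuffer; }
    int write(ByteBuffer buf) {
      int n = Math.min(free, buf.remaining());
      buf.position(buf.position() + n);
      free -= n;
      return n; // 0 once the send buffer is full: the caller must not block
    }
  }

  // Returns true if fully written inline; otherwise parks the buffer for a
  // Responder-style thread to finish later.
  static boolean writeOrPark(FakeChannel ch, ByteBuffer buf, Deque<ByteBuffer> pending) {
    ch.write(buf);
    if (buf.hasRemaining()) {
      pending.add(buf); // hand the remainder to the Responder
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    Deque<ByteBuffer> pending = new ArrayDeque<>();
    FakeChannel ch = new FakeChannel(8);
    boolean inline = writeOrPark(ch, ByteBuffer.allocate(32), pending);
    System.out.println(inline + " pending=" + pending.size()); // prints false pending=1
  }
}
```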

Some thoughts on HDFS-15869:

Actually, I encountered this problem in our prod environment: the FSEditLogAsync thread spent a little more time sending responses to clients, which had a big performance impact on writing edits to the JNs. So I just used a new single thread to do these send-response jobs. Of course, we can use multiple threads to send responses to clients. Creating a new task per response is expensive, so we use a producer-consumer mode to fix this problem.

  • FSEditLogAsync just puts the task into a capacity-bounded blocking queue.
  • The ResponseSender thread takes tasks from the queue and sends them to clients.
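A minimal sketch of that bounded hand-off, under the assumption of a single consumer thread (the capacity and names are illustrative, not a committed design): put() gives natural backpressure when the queue fills, and the ResponseSender drains it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BoundedHandoffSketch {
  // Self-check: n produced tasks are all consumed by the ResponseSender thread.
  static boolean run(int n) throws InterruptedException {
    ArrayBlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(4096); // illustrative capacity

    Thread responseSender = new Thread(() -> {
      try {
        while (true) {
          queue.take().run();
        }
      } catch (InterruptedException ie) {
        // shutdown path
      }
    }, "ResponseSender");
    responseSender.setDaemon(true);
    responseSender.start();

    CountDownLatch done = new CountDownLatch(n);
    for (int i = 0; i < n; i++) {
      queue.put(done::countDown); // producer: blocks only when the queue is full (backpressure)
    }
    return done.await(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(run(100)); // prints true
  }
}
```

The bounded capacity is the design choice that distinguishes this from an unbounded executor: if the sender falls behind, the producer slows down instead of growing the heap.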

About "Bug" or "Improvement": I think it should be a performance improvement, since all processes work as expected, with no blocking or hanging, just slowness.

Some thoughts on HDFS-15957:

  • I think the namenode should directly close the connection if an IOException happens in RpcEdit.logSyncNotify, since we cannot let the client hang forever; otherwise it looks as if the namenode dropped the request.

@functioner Looking forward to your ideas and confirmation.

@daryn-sharp @Hexiaoqiao @linyiqun @amahussein Looking forward to your ideas. I hope we can push this ticket forward.

ZanderXu avatar Feb 23 '24 07:02 ZanderXu

If I missed any other concerns, please let me know; we can find solutions together to push this ticket forward.

ZanderXu avatar Feb 23 '24 07:02 ZanderXu