MAPREDUCE-7470: multi-thread mapreduce committer

Open lastbus opened this issue 1 year ago • 3 comments

Description of PR

In cloud environments such as AWS and Aliyun, network latency is non-trivial when we commit thousands of files.

In our situation, the ping delay is about 0.03 ms in our IDC, but after moving to the cloud it is about 3 ms, which is roughly 100x slower. We found that committing tens of thousands of files takes a few tens of minutes. The more files there are, the longer it takes.

So we propose a new committer algorithm, called version 3, which is a variant of committer algorithm version 1. To reduce commit time, algorithm 3 uses a thread pool to commit the job's final output.

Our test results in cloud production show that the new algorithm 3 reduces commit time by a factor of several tens.
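
The thread-pool idea can be sketched as follows. This is a minimal illustration, not the code in this patch: it uses `java.nio.file` on a local filesystem as a stand-in for the Hadoop `FileSystem` API, and the class name `ParallelCommitSketch`, the method `commitAll`, and its parameters are hypothetical. It moves each pending file to the destination on a fixed-size pool, so that per-file round trips overlap instead of running one after another.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: local-filesystem stand-in for a multi-threaded job commit.
public class ParallelCommitSketch {

    /**
     * Moves every regular file under pendingDir to the same relative path
     * under destDir, using a fixed-size thread pool.
     *
     * @return the number of files moved
     */
    public static int commitAll(Path pendingDir, Path destDir, int threads)
            throws IOException, InterruptedException {
        // List the files to commit before starting any moves.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(pendingDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path src : files) {
                Path dst = destDir.resolve(pendingDir.relativize(src));
                // Each move is an independent task, so their latencies overlap.
                pending.add(pool.submit(() -> {
                    Files.createDirectories(dst.getParent());
                    Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    return null;
                }));
            }
            // Wait for every task and surface the first failure.
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    throw new IOException("commit failed", e.getCause());
                }
            }
            return files.size();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because each file is moved independently, a failure partway through leaves the destination partially populated; the sketch does not attempt to make job commit atomic.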

How was this patch tested?

For code changes:

lastbus, Jan 19 '24 14:01

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 :ok: | reexec | 0m 19s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 48s | | trunk passed |
| +1 :green_heart: | compile | 0m 22s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 19s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 23s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 27s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 22s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 18s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 0m 54s | | trunk passed |
| +1 :green_heart: | shadedclient | 19m 34s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 19s | | the patch passed |
| +1 :green_heart: | compile | 0m 19s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 19s | | the patch passed |
| +1 :green_heart: | compile | 0m 17s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 0m 17s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 16s | /results-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 1 new + 16 unchanged - 0 fixed = 17 total (was 16) |
| +1 :green_heart: | mvnsite | 0m 22s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 13s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 14s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 0m 50s | | the patch passed |
| +1 :green_heart: | shadedclient | 19m 24s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| -1 :x: | unit | 5m 25s | /patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt | hadoop-mapreduce-client-core in the patch passed. |
| +1 :green_heart: | asflicense | 0m 23s | | The patch does not generate ASF License warnings. |
| | | 84m 35s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.mapreduce.lib.output.TestFileOutputCommitter |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6469/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6469 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 2ac0aa160159 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ca1b7f9e6bf7e6bc06b1ba01aef25aa0d3ba1a04 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6469/1/testReport/ |
| Max. process+thread count | 1474 (vs. ulimit of 5500) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6469/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

hadoop-yetus, Jan 19 '24 16:01

@lastbus Thanks for the contribution! We need to fix the checkstyle issue.

slfan1989, Jan 21 '24 00:01

Like I said on the jira, I don't want this. It has the same scale issues encountered on abfs as #6399 and #6378, and the same correctness problems on GCS as v2, as in "incorrect task commit semantics", unless v1 commit can be made to rely not on atomic directory rename but on "atomic file rename", which does work there.

  • which cloud store have you tested this against? Does it actually have the semantics of rename for v1 task commit?
  • what was the depth/width of the directory structure?
  • did you try a terasort?
  • did you try multiple jobs through spark at the same time? Memory is a problem there: #5728

Even if the store meets the v1 correctness pre-requisites, I would like to see a comparison with the same job you have tested run through the manifest committer, ideally with any profiling to highlight where it could be improved.

steveloughran, Feb 01 '24 10:02

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 00s | | No case conflicting files found. |
| +0 :ok: | spotbugs | 0m 00s | | spotbugs executables are not available. |
| +0 :ok: | codespell | 0m 00s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 00s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 01s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 00s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 88m 55s | | trunk passed |
| +1 :green_heart: | compile | 5m 06s | | trunk passed |
| +1 :green_heart: | checkstyle | 4m 30s | | trunk passed |
| +1 :green_heart: | mvnsite | 4m 59s | | trunk passed |
| +1 :green_heart: | javadoc | 4m 29s | | trunk passed |
| +1 :green_heart: | shadedclient | 143m 04s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 3m 16s | | the patch passed |
| +1 :green_heart: | compile | 2m 18s | | the patch passed |
| +1 :green_heart: | javac | 2m 18s | | the patch passed |
| +1 :green_heart: | blanks | 0m 01s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 2m 03s | | the patch passed |
| +1 :green_heart: | mvnsite | 2m 28s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 02s | | the patch passed |
| +1 :green_heart: | shadedclient | 155m 09s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | asflicense | 5m 16s | | The patch does not generate ASF License warnings. |
| | | 410m 29s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| GITHUB PR | https://github.com/apache/hadoop/pull/6469 |
| JIRA Issue | MAPREDUCE-7470 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | MINGW64_NT-10.0-17763 153407ddae7f 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys |
| Build tool | maven |
| Personality | /c/hadoop/dev-support/bin/hadoop.sh |
| git revision | trunk / ca1b7f9e6bf7e6bc06b1ba01aef25aa0d3ba1a04 |
| Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6469/1/testReport/ |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6469/1/console |
| versions | git=2.44.0.windows.1 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

hadoop-yetus, Apr 26 '24 03:04