MAPREDUCE-7465. Add support for parallelism in FileOutputCommitter via 'mapreduce.fileoutputcommitter.parallel.threshold'
see https://issues.apache.org/jira/browse/MAPREDUCE-7465
When committing a big Hadoop job (for example via Spark) with many partitions, the class FileOutputCommitter processes thousands of dirs/files to rename with a single thread. This is a performance issue, caused by a lot of waiting on FileSystem storage operations.
I propose that above a configurable threshold (default=3, configurable via the property 'mapreduce.fileoutputcommitter.parallel.threshold'), the class FileOutputCommitter processes the list of files to rename using parallel threads, using the default JVM ExecutorService (ForkJoinPool.commonPool()).
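To illustrate the idea, here is a minimal sketch of threshold-gated parallel renames. It is not the code from this patch: it uses `java.nio.file` on the local filesystem as a stand-in for the Hadoop `FileSystem` API, and the names `ParallelRenameSketch`, `renameAll` and `DEFAULT_PARALLEL_THRESHOLD` are hypothetical. It relies on the fact that parallel streams run on `ForkJoinPool.commonPool()` by default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ParallelRenameSketch {
    // Hypothetical stand-in for the value read from
    // 'mapreduce.fileoutputcommitter.parallel.threshold'.
    static final int DEFAULT_PARALLEL_THRESHOLD = 3;

    /**
     * Moves every source file into targetDir.
     * Returns true if the parallel path was taken.
     */
    static boolean renameAll(List<Path> sources, Path targetDir, int threshold) {
        Consumer<Path> rename = src -> {
            try {
                Files.move(src, targetDir.resolve(src.getFileName()));
            } catch (IOException e) {
                // Lambdas cannot throw checked exceptions, so wrap it.
                throw new UncheckedIOException(e);
            }
        };
        boolean parallel = sources.size() > threshold;
        if (parallel) {
            // Parallel streams use ForkJoinPool.commonPool() by default.
            sources.parallelStream().forEach(rename);
        } else {
            // Small lists stay on the calling thread, as before.
            sources.forEach(rename);
        }
        return parallel;
    }

    public static void main(String[] args) throws IOException {
        Path pending = Files.createTempDirectory("pending");
        Path committed = Files.createTempDirectory("committed");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            files.add(Files.createFile(pending.resolve("part-" + i)));
        }
        boolean parallel = renameAll(files, committed, DEFAULT_PARALLEL_THRESHOLD);
        System.out.println("parallel=" + parallel);
    }
}
```

Note that the common pool is sized for CPU-bound work, so for renames that mostly wait on storage a dedicated executor may be a better fit; the sketch only mirrors the proposal as described above.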
:broken_heart: -1 overall
| Vote | Subsystem | Runtime | Logfile | Comment |
|---|---|---|---|---|
| +0 :ok: | reexec | 0m 20s | Docker mode activated. | |
| _ Prechecks _ | ||||
| +1 :green_heart: | dupname | 0m 0s | No case conflicting files found. | |
| +0 :ok: | codespell | 0m 0s | codespell was not available. | |
| +0 :ok: | detsecrets | 0m 0s | detect-secrets was not available. | |
| +1 :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. | |
| -1 :x: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | |
| _ trunk Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 31m 41s | trunk passed | |
| +1 :green_heart: | compile | 0m 23s | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | compile | 0m 20s | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | |
| +1 :green_heart: | checkstyle | 0m 21s | trunk passed | |
| +1 :green_heart: | mvnsite | 0m 28s | trunk passed | |
| +1 :green_heart: | javadoc | 0m 21s | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javadoc | 0m 18s | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | |
| +1 :green_heart: | spotbugs | 0m 53s | trunk passed | |
| +1 :green_heart: | shadedclient | 19m 33s | branch has no errors when building and testing our client artifacts. | |
| _ Patch Compile Tests _ | ||||
| +1 :green_heart: | mvninstall | 0m 17s | the patch passed | |
| +1 :green_heart: | compile | 0m 18s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javac | 0m 18s | the patch passed | |
| +1 :green_heart: | compile | 0m 18s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | |
| +1 :green_heart: | javac | 0m 18s | the patch passed | |
| +1 :green_heart: | blanks | 0m 0s | The patch has no blanks issues. | |
| -0 :warning: | checkstyle | 0m 15s | /results-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 18 new + 15 unchanged - 0 fixed = 33 total (was 15) |
| +1 :green_heart: | mvnsite | 0m 18s | the patch passed | |
| +1 :green_heart: | javadoc | 0m 13s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | |
| +1 :green_heart: | javadoc | 0m 14s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | |
| +1 :green_heart: | spotbugs | 0m 52s | the patch passed | |
| +1 :green_heart: | shadedclient | 19m 30s | patch has no errors when building and testing our client artifacts. | |
| _ Other Tests _ | ||||
| +1 :green_heart: | unit | 5m 22s | hadoop-mapreduce-client-core in the patch passed. | |
| +1 :green_heart: | asflicense | 0m 23s | The patch does not generate ASF License warnings. | |
| | | 84m 20s | | |
| Subsystem | Report/Notes |
|---|---|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6378/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6378 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 54d489c0a80f 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 0b18b3bedb9269bc7299cff42499354b95d61314 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6378/1/testReport/ |
| Max. process+thread count | 1648 (vs. ulimit of 5500) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6378/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
@Arnaud-Nauwynck just stuck up #6399, which is Rajesh's impl with my reviews in too. I'm not going to merge that into Hadoop either, because it still has so many problems on cloud storage, especially ABFS throttling. Best to embrace the manifest committer and complain if you hit problems.
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you feel like this was a mistake, or you would like to continue working on it, please feel free to re-open it and ask for a committer to remove the stale tag and review again. Thanks all for your contribution.