
HADOOP-19072. S3A: expand optimisations on stores with "fs.s3a.create.performance"

Open virajjasani opened this issue 1 year ago • 26 comments

Jira: HADOOP-19072

virajjasani avatar Feb 09 '24 01:02 virajjasani

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 20s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+0 :ok: markdownlint 0m 0s markdownlint was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 :ok: mvndep 14m 1s Maven dependency ordering for branch
+1 :green_heart: mvninstall 19m 19s trunk passed
+1 :green_heart: compile 8m 18s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 7m 31s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: checkstyle 1m 59s trunk passed
+1 :green_heart: mvnsite 1m 29s trunk passed
+1 :green_heart: javadoc 1m 6s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 1s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
-1 :x: spotbugs 1m 24s /branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings.
+1 :green_heart: shadedclient 19m 55s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 20s Maven dependency ordering for patch
+1 :green_heart: mvninstall 0m 45s the patch passed
+1 :green_heart: compile 7m 53s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 7m 53s the patch passed
+1 :green_heart: compile 7m 40s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: javac 7m 40s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 56s the patch passed
+1 :green_heart: mvnsite 1m 24s the patch passed
+1 :green_heart: javadoc 1m 1s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 1s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: spotbugs 2m 22s the patch passed
+1 :green_heart: shadedclient 19m 56s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 16m 34s hadoop-common in the patch passed.
+1 :green_heart: unit 2m 25s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 39s The patch does not generate ASF License warnings.
144m 55s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/1/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6543
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux d9248b10a158 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 2728303b77240d3e7bfe38f0b33d3f325c34ac37
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/1/testReport/
Max. process+thread count 2153 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Feb 09 '24 04:02 hadoop-yetus

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 21s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 1s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+0 :ok: markdownlint 0m 1s markdownlint was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 :ok: mvndep 14m 29s Maven dependency ordering for branch
+1 :green_heart: mvninstall 19m 53s trunk passed
+1 :green_heart: compile 8m 20s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 7m 38s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: checkstyle 2m 4s trunk passed
+1 :green_heart: mvnsite 1m 28s trunk passed
+1 :green_heart: javadoc 1m 6s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 1s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
-1 :x: spotbugs 1m 26s /branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings.
+1 :green_heart: shadedclient 19m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 21s Maven dependency ordering for patch
+1 :green_heart: mvninstall 0m 47s the patch passed
+1 :green_heart: compile 8m 1s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 8m 1s the patch passed
+1 :green_heart: compile 7m 30s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: javac 7m 30s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 2m 0s the patch passed
+1 :green_heart: mvnsite 1m 22s the patch passed
+1 :green_heart: javadoc 0m 57s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 1s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: spotbugs 2m 21s the patch passed
+1 :green_heart: shadedclient 19m 48s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 16m 41s hadoop-common in the patch passed.
+1 :green_heart: unit 2m 26s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 39s The patch does not generate ASF License warnings.
145m 53s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/2/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6543
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux b01c9384f3da 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3874f72bc9e7a186e03a64cc777f0235b8029f02
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/2/testReport/
Max. process+thread count 3152 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org


hadoop-yetus avatar Feb 09 '24 06:02 hadoop-yetus

Tested against us-west-2:

mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale -Dprefetch

virajjasani avatar Feb 14 '24 07:02 virajjasani

@shameersss1 what do you think here? Actually, maybe under magic paths we skip trying to create dirs at all, at least in the in-memory mode. There are no files to look for after all, so all that happens is that a dir tree is needlessly created, and the HEAD requests I'm proposing wouldn't even find any conflict with files that don't exist.
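A minimal sketch of the idea floated in this comment, not code from the patch: ancestor checks (the HEAD probes) are skipped entirely when the target key sits under a magic path. The class and method names, the in-memory set standing in for the object store, and the assumption that a magic path element starts with `__magic` are all illustrative; the real S3A code paths differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; this is not the S3A implementation. */
final class MagicMkdirSketch {

  /** Assumed prefix of the magic committer's path element. */
  static final String MAGIC_PREFIX = "__magic";

  private MagicMkdirSketch() {
  }

  /** True if any element of the slash-separated key starts with the magic prefix. */
  static boolean isUnderMagicPath(String key) {
    for (String element : key.split("/")) {
      if (element.startsWith(MAGIC_PREFIX)) {
        return true;
      }
    }
    return false;
  }

  /** Ancestor keys of a key, nearest first: "a/b/c" gives ["a/b", "a"]. */
  static List<String> ancestors(String key) {
    List<String> result = new ArrayList<>();
    int slash = key.lastIndexOf('/');
    while (slash > 0) {
      key = key.substring(0, slash);
      result.add(key);
      slash = key.lastIndexOf('/');
    }
    return result;
  }

  /**
   * Simulates the probes made before creating a directory at the given key.
   * Each ancestor lookup stands in for one HEAD request. Under a magic path
   * no probes are issued at all.
   *
   * @return the number of simulated HEAD probes
   * @throws IllegalStateException if an ancestor is an existing file
   */
  static int probeAncestors(String key, Set<String> existingFiles) {
    if (isUnderMagicPath(key)) {
      return 0; // nothing can conflict, so skip the probes
    }
    int probes = 0;
    for (String ancestor : ancestors(key)) {
      probes++;
      if (existingFiles.contains(ancestor)) {
        throw new IllegalStateException("Ancestor is a file: " + ancestor);
      }
    }
    return probes;
  }

  public static void main(String[] args) {
    Set<String> files = new HashSet<>(Collections.singletonList("table/data"));
    System.out.println(probeAncestors("table/__magic/job-1/task-1/part-0000", files));
    System.out.println(probeAncestors("other/dir/part-0000", files));
  }
}
```

The trade-off shows up in `probeAncestors`: outside magic paths every ancestor costs one simulated HEAD, while under a magic path the cost is zero because the check is bypassed rather than made cheaper.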

steveloughran avatar Feb 15 '24 13:02 steveloughran

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 22s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+0 :ok: markdownlint 0m 0s markdownlint was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 :ok: mvndep 14m 4s Maven dependency ordering for branch
+1 :green_heart: mvninstall 21m 16s trunk passed
+1 :green_heart: compile 8m 48s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 7m 59s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: checkstyle 2m 0s trunk passed
+1 :green_heart: mvnsite 1m 24s trunk passed
+1 :green_heart: javadoc 1m 2s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 2s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
-1 :x: spotbugs 1m 26s /branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings.
+1 :green_heart: shadedclient 19m 55s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 20s Maven dependency ordering for patch
+1 :green_heart: mvninstall 0m 49s the patch passed
+1 :green_heart: compile 8m 18s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 8m 18s the patch passed
+1 :green_heart: compile 7m 45s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: javac 7m 45s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 2m 4s /results-checkstyle-root.txt root: The patch generated 2 new + 15 unchanged - 0 fixed = 17 total (was 15)
+1 :green_heart: mvnsite 1m 22s the patch passed
+1 :green_heart: javadoc 0m 58s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 2s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: spotbugs 2m 25s the patch passed
+1 :green_heart: shadedclient 19m 56s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 17m 12s hadoop-common in the patch passed.
+1 :green_heart: unit 2m 16s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 34s The patch does not generate ASF License warnings.
148m 44s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/3/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6543
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux d3062a6e7ed1 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 12e6bff73832fd72a82ad43667aa531b4976d1ab
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/3/testReport/
Max. process+thread count 3151 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org


hadoop-yetus avatar Feb 16 '24 10:02 hadoop-yetus

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 21s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+0 :ok: markdownlint 0m 0s markdownlint was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 :ok: mvndep 13m 59s Maven dependency ordering for branch
+1 :green_heart: mvninstall 20m 18s trunk passed
+1 :green_heart: compile 8m 18s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 7m 32s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: checkstyle 2m 1s trunk passed
+1 :green_heart: mvnsite 1m 24s trunk passed
+1 :green_heart: javadoc 1m 3s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 0m 56s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
-1 :x: spotbugs 1m 24s /branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings.
+1 :green_heart: shadedclient 19m 24s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 20s Maven dependency ordering for patch
+1 :green_heart: mvninstall 0m 43s the patch passed
+1 :green_heart: compile 8m 3s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 8m 3s the patch passed
+1 :green_heart: compile 7m 35s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: javac 7m 35s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 56s the patch passed
+1 :green_heart: mvnsite 1m 19s the patch passed
+1 :green_heart: javadoc 0m 55s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 0m 59s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: spotbugs 2m 19s the patch passed
+1 :green_heart: shadedclient 19m 37s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 16m 50s hadoop-common in the patch passed.
+1 :green_heart: unit 2m 20s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 33s The patch does not generate ASF License warnings.
144m 26s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/4/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6543
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux f3c9795ec06a 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ff8a9d2e9dab36887a266a0eb00d6268c4331a1c
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/4/testReport/
Max. process+thread count 1282 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org


hadoop-yetus avatar Feb 16 '24 10:02 hadoop-yetus

> @shameersss1 what do you think here? Actually, maybe under magic paths we skip trying to create dirs at all, at least in the in-memory mode. There are no files to look for after all, so all that happens is that a dir tree is needlessly created, and the HEAD requests I'm proposing wouldn't even find any conflict with files that don't exist.

@steveloughran In the proposed solution (https://issues.apache.org/jira/browse/HADOOP-19047), even in the in-memory mode, the task attempt will write a `.pendingset` file containing the multipart upload (MPU) metadata inside the magic path. The driver process reads that file, so the directory creation is still necessary.

shameersss1 avatar Feb 19 '24 05:02 shameersss1

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 23s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+0 :ok: markdownlint 0m 0s markdownlint was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 :ok: mvndep 14m 16s Maven dependency ordering for branch
+1 :green_heart: mvninstall 21m 45s trunk passed
+1 :green_heart: compile 9m 46s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: compile 9m 3s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: checkstyle 2m 10s trunk passed
+1 :green_heart: mvnsite 1m 35s trunk passed
+1 :green_heart: javadoc 1m 9s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javadoc 1m 7s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: spotbugs 2m 17s trunk passed
+1 :green_heart: shadedclient 21m 49s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 21s Maven dependency ordering for patch
+1 :green_heart: mvninstall 0m 54s the patch passed
+1 :green_heart: compile 8m 23s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javac 8m 23s the patch passed
+1 :green_heart: compile 7m 47s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: javac 7m 47s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 58s the patch passed
+1 :green_heart: mvnsite 1m 25s the patch passed
+1 :green_heart: javadoc 0m 53s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javadoc 1m 2s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: spotbugs 2m 25s the patch passed
+1 :green_heart: shadedclient 20m 0s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 16m 39s hadoop-common in the patch passed.
+1 :green_heart: unit 2m 14s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 35s The patch does not generate ASF License warnings.
154m 16s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/5/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6543
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux 106dcbd22ec9 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ff8a9d2e9dab36887a266a0eb00d6268c4331a1c
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/5/testReport/
Max. process+thread count 1273 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org


hadoop-yetus avatar Mar 26 '24 14:03 hadoop-yetus

@steveloughran could you please take another look?

virajjasani avatar Mar 28 '24 16:03 virajjasani

Reviewed; I'm just wondering how to make the test as clean as possible.

Going to invite reviews from @shameersss1 @ahmarsuhail @HarshitGupta11 @mukund-thakur as they've been looking around here.

Does anyone expect anything to break from this? I don't: we know code doesn't normally try these tricks, otherwise we'd have had complaints about other optimisations.

steveloughran avatar Mar 28 '24 17:03 steveloughran

> OK, this is getting overcomplex.
>
> Proposed: copy the superclass code but remove the expectation of failures, retaining only setup and validation.

Sounds good; addressed in the latest revision.

virajjasani avatar Apr 02 '24 06:04 virajjasani

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 22s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 1s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+0 :ok: markdownlint 0m 1s markdownlint was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 :ok: mvndep 14m 9s Maven dependency ordering for branch
+1 :green_heart: mvninstall 23m 56s trunk passed
+1 :green_heart: compile 8m 47s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: compile 8m 2s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: checkstyle 2m 2s trunk passed
+1 :green_heart: mvnsite 1m 20s trunk passed
+1 :green_heart: javadoc 1m 3s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javadoc 0m 59s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: spotbugs 2m 14s trunk passed
+1 :green_heart: shadedclient 20m 42s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 20s Maven dependency ordering for patch
+1 :green_heart: mvninstall 0m 50s the patch passed
+1 :green_heart: compile 8m 24s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javac 8m 24s the patch passed
+1 :green_heart: compile 8m 2s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: javac 8m 2s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 1m 55s the patch passed
+1 :green_heart: mvnsite 1m 23s the patch passed
+1 :green_heart: javadoc 0m 56s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javadoc 0m 59s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: spotbugs 2m 24s the patch passed
+1 :green_heart: shadedclient 20m 28s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 16m 36s hadoop-common in the patch passed.
+1 :green_heart: unit 2m 15s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 33s The patch does not generate ASF License warnings.
152m 50s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/6/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6543
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux 2e19c4bcca99 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 8d3012c3e40f3c0575951064ea43fa790debb974
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/6/testReport/
Max. process+thread count 2153 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org


hadoop-yetus avatar Apr 02 '24 08:04 hadoop-yetus

> But if we're doing this for a whole directory, for all applications, I think that is a bit too risky.

I see your point.

Let me run the whole suite with the latest revision.

virajjasani avatar Apr 04 '24 04:04 virajjasani

Tested against us-west-2, looks good for this change.

I did find a separate issue with the scale tests that use the noaa-cors-pds bucket under my local endpoint/region setup. It's a minor issue, not a big deal; I will create a Jira for it later.

virajjasani avatar Apr 04 '24 05:04 virajjasani

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 20s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+0 :ok: markdownlint 0m 0s markdownlint was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 :ok: mvndep 18m 37s Maven dependency ordering for branch
+1 :green_heart: mvninstall 20m 51s trunk passed
+1 :green_heart: compile 9m 41s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: compile 9m 4s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: checkstyle 2m 14s trunk passed
+1 :green_heart: mvnsite 1m 19s trunk passed
+1 :green_heart: javadoc 1m 1s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javadoc 0m 55s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: spotbugs 2m 3s trunk passed
+1 :green_heart: shadedclient 21m 27s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 21s Maven dependency ordering for patch
+1 :green_heart: mvninstall 0m 47s the patch passed
+1 :green_heart: compile 9m 50s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javac 9m 50s the patch passed
+1 :green_heart: compile 8m 54s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: javac 8m 54s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 2m 0s the patch passed
+1 :green_heart: mvnsite 1m 18s the patch passed
+1 :green_heart: javadoc 0m 54s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javadoc 0m 52s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: spotbugs 2m 22s the patch passed
+1 :green_heart: shadedclient 21m 44s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 16m 49s hadoop-common in the patch passed.
+1 :green_heart: unit 2m 12s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 36s The patch does not generate ASF License warnings.
160m 27s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/7/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6543
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux 84529fe21bf3 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / fc306d5e87f9ba64320b9b181f2d4d0f71e678b2
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/7/testReport/
Max. process+thread count 2769 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/7/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org


hadoop-yetus avatar Apr 04 '24 07:04 hadoop-yetus

:confetti_ball: +1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 21s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+0 :ok: markdownlint 0m 0s markdownlint was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 0s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 :ok: mvndep 17m 49s Maven dependency ordering for branch
+1 :green_heart: mvninstall 21m 18s trunk passed
+1 :green_heart: compile 9m 58s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: compile 8m 45s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: checkstyle 2m 24s trunk passed
+1 :green_heart: mvnsite 1m 24s trunk passed
+1 :green_heart: javadoc 0m 58s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javadoc 0m 54s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: spotbugs 2m 4s trunk passed
+1 :green_heart: shadedclient 21m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 22s Maven dependency ordering for patch
+1 :green_heart: mvninstall 0m 48s the patch passed
+1 :green_heart: compile 9m 34s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javac 9m 34s the patch passed
+1 :green_heart: compile 8m 41s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: javac 8m 41s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
+1 :green_heart: checkstyle 2m 11s the patch passed
+1 :green_heart: mvnsite 1m 20s the patch passed
+1 :green_heart: javadoc 0m 52s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 :green_heart: javadoc 0m 55s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 :green_heart: spotbugs 2m 16s the patch passed
+1 :green_heart: shadedclient 21m 51s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 16m 57s hadoop-common in the patch passed.
+1 :green_heart: unit 2m 10s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 32s The patch does not generate ASF License warnings.
160m 31s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/8/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6543
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux 055bbc67034d 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b7e4ede34ba1fb2094ba5363c0ec07cd763e5336
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/8/testReport/
Max. process+thread count 1274 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6543/8/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Apr 04 '24 07:04 hadoop-yetus

Don't think I have forgotten about this -I have just been very distracted.

I'm wondering if we should provide a list of options to "optimise", e.g. "create, mkdir, delete", along with the specific optimisations each one turns on. Yes, it is suspiciously like those option sets we have for databases. Microsoft are looking at what they can do to speed up IO, such as skipping HEAD requests in open(), and making that an option which can be enabled where "things don't break". Having a consistent configuration key and list of optimisations would make a lot of sense.

Thoughts?

steveloughran avatar Apr 09 '24 21:04 steveloughran

I'm wondering if we should provide a list of options to "optimise", e.g. "create, mkdir, delete", along with the specific optimisations each one turns on. Yes, it is suspiciously like those option sets we have for databases.

I believe this would be good optimization tuning overall, and yes, it does remind me of database optimizations. In fact, HBase could use these optimizations for HFile creation, deletion etc. too (as long as it knows when and what it is doing). Since we already have fs.s3a.create.performance, how about we file separate jiras for new modes such as fs.s3a.delete.performance?

Or did you mean that we provide a generic config fs.s3a.optimizations with values including create and/or delete, and skip HEAD requests accordingly?

virajjasani avatar Apr 10 '24 00:04 virajjasani

Don't think I have forgotten about this -I have just been very distracted.

me too, not a problem at all :)

virajjasani avatar Apr 10 '24 00:04 virajjasani

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 :green_heart: dupname 0m 02s No case conflicting files found.
+0 :ok: codespell 0m 02s codespell was not available.
+0 :ok: detsecrets 0m 02s detect-secrets was not available.
+0 :ok: markdownlint 0m 02s markdownlint was not available.
+0 :ok: spotbugs 0m 01s spotbugs executables are not available.
+1 :green_heart: @author 0m 00s The patch does not contain any @author tags.
+1 :green_heart: test4tests 0m 00s The patch appears to include 6 new or modified test files.
_ trunk Compile Tests _
+0 :ok: mvndep 2m 16s Maven dependency ordering for branch
+1 :green_heart: mvninstall 90m 13s trunk passed
+1 :green_heart: compile 39m 39s trunk passed
+1 :green_heart: checkstyle 5m 56s trunk passed
-1 :x: mvnsite 4m 24s /branch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in trunk failed.
+1 :green_heart: javadoc 9m 25s trunk passed
+1 :green_heart: shadedclient 161m 37s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 2m 18s Maven dependency ordering for patch
+1 :green_heart: mvninstall 8m 37s the patch passed
+1 :green_heart: compile 37m 56s the patch passed
+1 :green_heart: javac 37m 56s the patch passed
+1 :green_heart: blanks 0m 00s The patch has no blanks issues.
+1 :green_heart: checkstyle 6m 00s the patch passed
-1 :x: mvnsite 4m 30s /patch-mvnsite-hadoop-common-project_hadoop-common.txt hadoop-common in the patch failed.
+1 :green_heart: javadoc 9m 22s the patch passed
+1 :green_heart: shadedclient 171m 17s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: asflicense 8m 21s The patch does not generate ASF License warnings.
528m 37s
Subsystem Report/Notes
GITHUB PR https://github.com/apache/hadoop/pull/6543
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname MINGW64_NT-10.0-17763 48a1e573d3ea 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / b7e4ede34ba1fb2094ba5363c0ec07cd763e5336
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6543/1/testReport/
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6543/1/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Apr 25 '24 13:04 hadoop-yetus

I'm going to propose we change how the options are done, and do something similar for ABFS. I think we need something like C/C++ optimisers where you pass in a -O list of things to optimise. This lets you turn on everything you can but turn off those which break your code.

This avoids us having to deal with regressions where suddenly something breaks -and the only fix is to turn off all optimisation.

It also gives us the ability to add some very aggressive optimisations, such as disabling probing for and recreating parent directories on delete and rename. Harshit has tried this and some things break. If we make this one of the flags, those deployments with applications which know they are robust can turn it on.

  • we add specific optimisation flags for different behaviours, which we can explicitly turn on and off.
  • we add a single "fs.s3a.performance.options" which takes a list of these and parses it into an S3APerformanceFlags object containing the flags. We can move this parsing out of the S3AFileSystem code into the S3APerformanceFlags class, which assists testing.
  • unknown flags are logged once at info.
  • the performance flags are available from StoreContext.
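The parsing step in the list above could look roughly like the following. This is only a sketch under stated assumptions: `PerfFlag` and `PerfFlagsSketch` are hypothetical names standing in for the proposed S3APerformanceFlags class, the flag names are just the ones mentioned in this thread, and "logged once at info" is approximated with a print to stderr rather than a real logger.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical flag names; only "create" and "mkdir" come from this thread. */
enum PerfFlag { CREATE, MKDIR }

/** Illustrative stand-in for the proposed S3APerformanceFlags class. */
final class PerfFlagsSketch {
  // Unknown names already reported, so each one is reported only once.
  private static final Set<String> WARNED = ConcurrentHashMap.newKeySet();

  private final EnumSet<PerfFlag> enabled;

  private PerfFlagsSketch(EnumSet<PerfFlag> enabled) {
    this.enabled = enabled;
  }

  /** Parse a comma-separated option string such as "create,mkdir". */
  static PerfFlagsSketch parse(String csv) {
    EnumSet<PerfFlag> flags = EnumSet.noneOf(PerfFlag.class);
    if (csv != null) {
      for (String raw : csv.split(",")) {
        String name = raw.trim().toLowerCase(Locale.ROOT);
        if (name.isEmpty()) {
          continue; // tolerate empty entries such as "create,,mkdir"
        }
        try {
          flags.add(PerfFlag.valueOf(name.toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
          // Unknown flag: report it the first time only, then ignore it.
          if (WARNED.add(name)) {
            System.err.println("INFO: ignoring unknown performance flag: " + name);
          }
        }
      }
    }
    return new PerfFlagsSketch(flags);
  }

  boolean enabled(PerfFlag flag) {
    return enabled.contains(flag);
  }
}
```

Keeping the parser as a static method on the flags class means it can be unit tested without instantiating a filesystem, which is the testing benefit the proposal mentions.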

It would still be good to wire this up to hasPathCapability. Proposed: S3APerformanceFlags adds a hasCapability(string) method, and if s3aFS.hasPathCapability is probed for a feature beginning "fs.s3a.performance.options." then the rest of the string is passed in for the check.
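A minimal sketch of that prefix handling, assuming a hypothetical `CapabilityProbeSketch` class in place of the real filesystem and a plain set of lower-case flag names in place of S3APerformanceFlags; none of these names or signatures are taken from the Hadoop codebase:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: routes capability probes with a known prefix to a flag lookup. */
final class CapabilityProbeSketch {
  static final String PREFIX = "fs.s3a.performance.options.";

  private final Set<String> enabledFlags;

  CapabilityProbeSketch(Set<String> enabledFlags) {
    this.enabledFlags = new HashSet<>(enabledFlags);
  }

  /** Stand-in for the proposed hasCapability(string) method. */
  boolean hasCapability(String flag) {
    return enabledFlags.contains(flag.toLowerCase(Locale.ROOT));
  }

  /** Stand-in for the filesystem-level probe. */
  boolean hasPathCapability(String capability) {
    if (capability.startsWith(PREFIX)) {
      // Strip the prefix and hand the remainder to the flag check.
      return hasCapability(capability.substring(PREFIX.length()));
    }
    // A real filesystem would fall through to its other capability checks here.
    return false;
  }
}
```

With only "create" enabled, a probe for "fs.s3a.performance.options.create" returns true and a probe for "fs.s3a.performance.options.mkdir" returns false.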

The create.performance flag now complicates things here. Too bad we have already shipped it. Proposed: S3APerformanceFlags still looks for that flag value, and also accepts "create" as one of the options.

that is

fs.s3a.performance.options=create,mkdirs

would cover both that and this new change.
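One way to read this, sketched below with a plain `Map` standing in for the Hadoop configuration: the effective flag set is whatever the new list names, plus "create" whenever the already-shipped boolean is true. `LegacyFlagMergeSketch` and `effectiveFlags` are hypothetical names, and the exact precedence rules are an assumption rather than something settled in this thread.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: merges the shipped boolean key into the proposed option list. */
final class LegacyFlagMergeSketch {
  static final String LEGACY_KEY = "fs.s3a.create.performance";
  static final String OPTIONS_KEY = "fs.s3a.performance.options";

  /** Returns the lower-cased flag names that are effectively enabled. */
  static Set<String> effectiveFlags(Map<String, String> conf) {
    Set<String> flags = new TreeSet<>();
    for (String raw : conf.getOrDefault(OPTIONS_KEY, "").split(",")) {
      String name = raw.trim().toLowerCase(Locale.ROOT);
      if (!name.isEmpty()) {
        flags.add(name);
      }
    }
    // The legacy boolean is treated as shorthand for the "create" option.
    if (Boolean.parseBoolean(conf.getOrDefault(LEGACY_KEY, "false"))) {
      flags.add("create");
    }
    return flags;
  }
}
```

Under this reading, a deployment that only sets the old key keeps its current behaviour, while one that sets the new list gets the same "create" behaviour without touching the old key.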

I think we can and should lay down a policy here then which is "the semantics of a specific optimisation flag MUST NOT CHANGE" but that new flags MAY be added to tune that behaviour further.

The most radical option would be to copy the Presto connector and declare that every path which doesn't return a file or a non-empty listing is a directory. They can downgrade mkdirs() to a no-op, and delete dir, rm dir etc. always report a parent dir existing, so their code is happy.

They can get away with this because they know the exact semantics of the code -and that it does not break with this change. We lack that luxury across the broad pool of applications using our library. That doesn't mean we can't allow a select few applications to take advantage of optimisations we have written and tested with them.

steveloughran avatar May 01 '24 13:05 steveloughran

The above proposal of providing a list of optimization flags sounds impressive.

Please let me know if this summary looks good:

As part of this Jira:

  • Add fs.s3a.performance.options as a new config, with create and mkdir as the only valid values for now.
  • Create an S3APerformanceFlags class (which can contain a List of Enum values). The Enum can be PerformanceFlag and it should be defined in StoreContext.
  • Mapping the comma-separated String value of fs.s3a.performance.options to an S3APerformanceFlags object can be done as a static utility of the S3APerformanceFlags class. s3afs takes care of creating this object while initializing the file system.
  • Unknown flags are logged once at info in S3APerformanceFlags.
  • Provide a PathCapability for fs.s3a.performance.options.${flag}, where the ${flag} value would be create/mkdir for now. When this is probed, pathCapability should call s3aPerformanceFlagsObject#hasCapability(${flag}).
  • Document the policy for fs.s3a.performance.options to indicate that the semantics of a particular optimization flag must not change, but new optimization options could be provided in future to tune this behavior.

For future Jiras:

  • Add more optimization options for delete, rename operations.

One question: IIUC, we don't need to keep the current PR behavior (mkdir improvements) when fs.s3a.create.performance is enabled, since we are now introducing the new fs.s3a.performance.options, correct? Also, would it be prudent to deprecate fs.s3a.create.performance? We can probably do that in a separate jira too.

virajjasani avatar May 02 '24 07:05 virajjasani

As of today, fs.s3a.create.performance is a mandatory option when creating a file:

      builder
          .create()
          .overwrite(true)
          .must(FS_S3A_CREATE_PERFORMANCE, performanceCreation);

Shall we remove fs.s3a.create.performance as a mandatory option and add fs.s3a.performance.options instead? Could we set the value of this config to a serialized string of the S3APerformanceFlags object, and let CreateFileBuilder deserialize it back to an S3APerformanceFlags object? This way, CreateFileBuilder gets the object back, though this seems too complicated.

Instead, maybe we remove fs.s3a.create.performance as a mandatory option and pass the S3APerformanceFlags object (non-nullable) to the CreateFileBuilder constructor directly?

virajjasani avatar May 02 '24 07:05 virajjasani

I'm doing a quick PR of the design; @HarshitGupta11 and I discussed it.

steveloughran avatar May 02 '24 17:05 steveloughran

I'm doing a quick PR of the design; @HarshitGupta11 and I discussed it.

Got it, i was planning to embed the logic as part of this PR sometime early next week but separate PR sounds more manageable!

virajjasani avatar May 02 '24 21:05 virajjasani

Got it, i was planning to embed the logic as part of this PR sometime early next week but separate PR sounds more manageable!

It's a bit blurred, as there are now options in the code for features we haven't implemented. To be strictest, we would maybe want that base impl to only do create, and your PR to add "mkdir", etc.

steveloughran avatar May 03 '24 14:05 steveloughran

with #6789 in, there's space in fs.s3a.performance.flags for this. I think I even created the enum option for you...

steveloughran avatar Jul 29 '24 15:07 steveloughran

Sounds good, let me get to it

virajjasani avatar Jul 29 '24 17:07 virajjasani

surprised to see no merge conflicts here

virajjasani avatar Jul 29 '24 17:07 virajjasani

i need to make doc changes and re-run the test suite.

virajjasani avatar Jul 29 '24 22:07 virajjasani