
HADOOP-18679. Add API for bulk/paged object deletion

Open steveloughran opened this issue 1 year ago • 5 comments

A more minimal design that is easier to use and implement than #5993

The caller creates a BulkOperation, queries its page size, and then submits batches of paths to delete, each no larger than that page size.

The outcome of each call contains a list of failures.

An S3A implementation is included to show how straightforward it is.

Even with a single-entry page size, this is still more efficient to use, as it doesn't try to recreate a parent directory or probe to see whether the path is a directory: it maps straight to a DELETE call.
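To make the call pattern concrete, here is a minimal sketch of how a caller might page through a list of paths and gather the failures from each call. It is not the API in this patch: the `BulkDelete` interface, its `pageSize()` and `delete()` methods, the `InMemoryStore` class and the use of `String` in place of Hadoop's `Path` are all assumptions made so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the paged delete call pattern; every name here is hypothetical. */
final class PagedDeleteSketch {

  /** Hypothetical stand-in for the bulk operation; String replaces Hadoop's Path. */
  interface BulkDelete {
    /** Largest number of paths accepted in a single call. */
    int pageSize();

    /** Deletes one page of paths and returns those that could not be deleted. */
    List<String> delete(List<String> page);
  }

  /** Submits the paths page by page and gathers the failures of every call. */
  static List<String> deleteAll(BulkDelete op, List<String> paths) {
    int pageSize = op.pageSize();
    if (pageSize < 1) {
      throw new IllegalArgumentException("page size must be at least 1");
    }
    List<String> failures = new ArrayList<>();
    for (int start = 0; start < paths.size(); start += pageSize) {
      int end = Math.min(start + pageSize, paths.size());
      failures.addAll(op.delete(paths.subList(start, end)));
    }
    return failures;
  }

  /** In-memory store, present only so the sketch can be run. */
  static final class InMemoryStore implements BulkDelete {
    final Set<String> objects = new HashSet<>();
    final Set<String> undeletable = new HashSet<>();
    final int pageSize;
    int calls;

    InMemoryStore(int pageSize) {
      this.pageSize = pageSize;
    }

    @Override
    public int pageSize() {
      return pageSize;
    }

    @Override
    public List<String> delete(List<String> page) {
      if (page.size() > pageSize) {
        throw new IllegalArgumentException("page exceeds page size");
      }
      calls++;
      List<String> failed = new ArrayList<>();
      for (String path : page) {
        if (undeletable.contains(path)) {
          failed.add(path);
        } else {
          objects.remove(path);
        }
      }
      return failed;
    }
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore(2);
    List<String> paths = Arrays.asList("a", "b", "c", "d", "e");
    store.objects.addAll(paths);
    store.undeletable.add("c");
    List<String> failures = deleteAll(store, paths);
    System.out.println("calls=" + store.calls + " failures=" + failures);
  }
}
```

With a page size of one, the same loop degenerates to one call per path, which is the single-entry case described above.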

How was this patch tested?

If the design looks good, I'll write some contract tests as well as a filesystem API specification.

For code changes:

  • [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • [ ] If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

steveloughran avatar Jan 24 '24 17:01 steveloughran

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 49s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 0s codespell was not available.
+0 :ok: detsecrets 0m 0s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
-1 :x: test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+0 :ok: mvndep 14m 21s Maven dependency ordering for branch
-1 :x: mvninstall 7m 7s /branch-mvninstall-root.txt root in trunk failed.
-1 :x: compile 9m 3s /branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt root in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 :x: compile 8m 32s /branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt root in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
+1 :green_heart: checkstyle 4m 38s trunk passed
+1 :green_heart: mvnsite 2m 18s trunk passed
+1 :green_heart: javadoc 1m 23s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 3s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: spotbugs 4m 14s trunk passed
-1 :x: shadedclient 11m 29s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 28s Maven dependency ordering for patch
+1 :green_heart: mvninstall 1m 47s the patch passed
-1 :x: compile 12m 33s /patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 :x: javac 12m 33s /patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.
-1 :x: compile 12m 23s /patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
-1 :x: javac 12m 23s /patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08.
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 6m 18s /results-checkstyle-root.txt root: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3)
+1 :green_heart: mvnsite 2m 56s the patch passed
-1 :x: javadoc 1m 18s /results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
+1 :green_heart: javadoc 1m 9s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
-1 :x: spotbugs 1m 34s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 :x: shadedclient 3m 51s patch has errors when building and testing our client artifacts.
_ Other Tests _
-1 :x: unit 17m 58s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
-1 :x: unit 1m 2s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
-1 :x: asflicense 0m 41s /results-asflicense.txt The patch generated 1 ASF License warnings.
135m 40s
Reason Tests
Failed junit tests hadoop.ipc.TestRPC
hadoop.util.TestDataChecksum
hadoop.fs.s3a.commit.staging.TestDirectoryCommitterScale
hadoop.fs.s3a.TestS3ADeleteOnExit
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6494
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 9de7a95ad1b3 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 1774c5bae62c8f41e3d23ee26cbee95fdae94844
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/testReport/
Max. process+thread count 302 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Jan 24 '24 19:01 hadoop-yetus

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 57s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
-1 :x: test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+0 :ok: mvndep 14m 10s Maven dependency ordering for branch
+1 :green_heart: mvninstall 35m 50s trunk passed
+1 :green_heart: compile 18m 13s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 16m 31s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: checkstyle 4m 37s trunk passed
+1 :green_heart: mvnsite 2m 30s trunk passed
+1 :green_heart: javadoc 1m 47s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: spotbugs 3m 43s trunk passed
+1 :green_heart: shadedclient 39m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 36s Maven dependency ordering for patch
+1 :green_heart: mvninstall 2m 8s the patch passed
+1 :green_heart: compile 18m 41s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 18m 41s the patch passed
+1 :green_heart: compile 17m 6s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: javac 17m 6s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 4m 31s /results-checkstyle-root.txt root: The patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3)
+1 :green_heart: mvnsite 2m 28s the patch passed
-1 :x: javadoc 1m 9s /results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+1 :green_heart: javadoc 1m 33s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: spotbugs 4m 5s the patch passed
+1 :green_heart: shadedclient 38m 38s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 19m 5s hadoop-common in the patch passed.
+1 :green_heart: unit 3m 7s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 57s The patch does not generate ASF License warnings.
260m 38s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/2/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6494
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux f605ff408523 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 5afb6598adc1d81d8dcbbbaaecd2fa28d558e36f
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/2/testReport/
Max. process+thread count 3137 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Jan 26 '24 16:01 hadoop-yetus

+ add FileUtils methods to assist deletion here: `FileUtils.bulkDeletePageSize(path) -> int` and `FileUtils.bulkDelete(path, List) -> List<Path>`; each will create a bulk delete object, execute the operation/probe and then close it.

Why so?

It makes reflection binding straightforward: no new types, just two methods.
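As a rough illustration of why two static methods with no new types are easy to bind to, the sketch below looks up a pair of helpers by name and invokes them through `java.lang.reflect.Method`. The `HypotheticalFileUtils` class is a stand-in written for this example only, `String` replaces Hadoop's `Path`, and the wrapper class and its method names are assumptions rather than anything in the patch.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of binding to two static helpers by reflection; all names are hypothetical. */
final class ReflectionBindingSketch {

  /** Hypothetical stand-in for the proposed helpers; String replaces Hadoop's Path. */
  public static final class HypotheticalFileUtils {
    public static int bulkDeletePageSize(String base) {
      return 1;
    }

    /** Pretends that any path ending in ".locked" cannot be deleted. */
    public static List<String> bulkDelete(String base, List<String> paths) {
      List<String> failed = new ArrayList<>();
      for (String path : paths) {
        if (path.endsWith(".locked")) {
          failed.add(path);
        }
      }
      return failed;
    }
  }

  private final Method pageSizeMethod;
  private final Method bulkDeleteMethod;

  private ReflectionBindingSketch(Method pageSizeMethod, Method bulkDeleteMethod) {
    this.pageSizeMethod = pageSizeMethod;
    this.bulkDeleteMethod = bulkDeleteMethod;
  }

  /** Looks both methods up by name; only JDK types appear in the signatures. */
  static ReflectionBindingSketch bind(String className) {
    try {
      Class<?> utils = Class.forName(className);
      return new ReflectionBindingSketch(
          utils.getMethod("bulkDeletePageSize", String.class),
          utils.getMethod("bulkDelete", String.class, List.class));
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("helpers not available: " + className, e);
    }
  }

  int pageSize(String base) {
    try {
      return (Integer) pageSizeMethod.invoke(null, base);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  @SuppressWarnings("unchecked")
  List<String> bulkDelete(String base, List<String> paths) {
    try {
      return (List<String>) bulkDeleteMethod.invoke(null, base, paths);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    ReflectionBindingSketch binding = bind(HypotheticalFileUtils.class.getName());
    String base = "s3a://bucket/dir";
    System.out.println("page size: " + binding.pageSize(base));
    System.out.println("failures: "
        + binding.bulkDelete(base, Arrays.asList(base + "/a.locked")));
  }
}
```

Because the caller only names the class as a string, it can compile against a release that lacks the helpers and fail cleanly at bind time.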

steveloughran avatar Feb 09 '24 12:02 steveloughran

:broken_heart: -1 overall

Vote Subsystem Runtime Logfile Comment
+0 :ok: reexec 0m 51s Docker mode activated.
_ Prechecks _
+1 :green_heart: dupname 0m 0s No case conflicting files found.
+0 :ok: codespell 0m 1s codespell was not available.
+0 :ok: detsecrets 0m 1s detect-secrets was not available.
+1 :green_heart: @author 0m 0s The patch does not contain any @author tags.
-1 :x: test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+0 :ok: mvndep 14m 8s Maven dependency ordering for branch
+1 :green_heart: mvninstall 36m 26s trunk passed
+1 :green_heart: compile 20m 5s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: compile 16m 37s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: checkstyle 4m 42s trunk passed
+1 :green_heart: mvnsite 2m 31s trunk passed
+1 :green_heart: javadoc 1m 47s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
-1 :x: spotbugs 2m 33s /branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html hadoop-common-project/hadoop-common in trunk has 1 extant spotbugs warnings.
+1 :green_heart: shadedclient 38m 23s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 :ok: mvndep 0m 31s Maven dependency ordering for patch
+1 :green_heart: mvninstall 1m 26s the patch passed
+1 :green_heart: compile 17m 29s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 :green_heart: javac 17m 29s the patch passed
+1 :green_heart: compile 16m 29s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: javac 16m 29s the patch passed
+1 :green_heart: blanks 0m 0s The patch has no blanks issues.
-0 :warning: checkstyle 4m 32s /results-checkstyle-root.txt root: The patch generated 1 new + 39 unchanged - 0 fixed = 40 total (was 39)
+1 :green_heart: mvnsite 2m 30s the patch passed
-1 :x: javadoc 1m 7s /results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt hadoop-common-project_hadoop-common-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)
+1 :green_heart: javadoc 1m 30s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 :green_heart: spotbugs 4m 4s the patch passed
+1 :green_heart: shadedclient 38m 19s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 :green_heart: unit 19m 7s hadoop-common in the patch passed.
+1 :green_heart: unit 3m 9s hadoop-aws in the patch passed.
+1 :green_heart: asflicense 0m 57s The patch does not generate ASF License warnings.
259m 23s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/artifact/out/Dockerfile
GITHUB PR https://github.com/apache/hadoop/pull/6494
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux d1aa5776a4a0 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0823d3fdaccf95dbf6777a8953e0a70d25f9c14e
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/testReport/
Max. process+thread count 3137 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6494/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus avatar Feb 09 '24 18:02 hadoop-yetus

FYI, I want to pull the rate limiter API of #6596 in here too; we'd have a rate limiter in the S3A store which, if enabled, would limit the number of deletes that can be issued against a bucket. Ideally it'd be set to 3000 on S3 Standard and off for S3 Express and third-party stores, so as to reduce the load this call can generate.
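For readers unfamiliar with the idea, a rate limiter of this kind could look roughly like the sketch below. It is not the API from #6596: the class name, the `acquire` signature and the injectable clock are assumptions, and it treats the 3000 figure as a per-second rate, which the comment above does not state. A rate of zero or less models the "off" setting.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/** Sketch of a pacing rate limiter; names and behaviour are assumptions. */
final class DeleteRateLimiterSketch {
  private static final long NANOS_PER_SECOND = 1_000_000_000L;

  private final int permitsPerSecond;   // zero or less means the limiter is off
  private final LongSupplier nanoClock; // injectable so the sketch can be tested
  private long nextFreeNanos;

  DeleteRateLimiterSketch(int permitsPerSecond, LongSupplier nanoClock) {
    this.permitsPerSecond = permitsPerSecond;
    this.nanoClock = nanoClock;
    this.nextFreeNanos = nanoClock.getAsLong();
  }

  /**
   * Reserves capacity for the given number of deletes.
   * Does not sleep; the caller is expected to wait for the returned time.
   *
   * @return how long the caller should wait before issuing the request
   */
  synchronized Duration acquire(int permits) {
    if (permitsPerSecond <= 0) {
      return Duration.ZERO;
    }
    long now = nanoClock.getAsLong();
    // Compare by subtraction, as System.nanoTime() values may be negative.
    long start = (nextFreeNanos - now > 0) ? nextFreeNanos : now;
    nextFreeNanos = start + permits * NANOS_PER_SECOND / permitsPerSecond;
    return Duration.ofNanos(start - now);
  }

  public static void main(String[] args) throws InterruptedException {
    DeleteRateLimiterSketch limiter =
        new DeleteRateLimiterSketch(3000, System::nanoTime);
    for (int page = 0; page < 3; page++) {
      Duration wait = limiter.acquire(1000);
      Thread.sleep(wait.toMillis());
      System.out.println("page " + page + " waited " + wait.toMillis() + " ms");
    }
  }
}
```

Charging one permit per object in a page, rather than one per request, is a design choice in this sketch; it means a full page consumes proportionally more of the budget than a single delete.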

steveloughran avatar Mar 28 '24 16:03 steveloughran

In #6686 I'm creating a new utils class for reflection access, nothing else. I'm also proposing that all tests of the API use reflection, to be really confident that it works and that no accidental changes break reflection.

steveloughran avatar Mar 28 '24 17:03 steveloughran