HADOOP-18679. Add API for bulk/paged object deletion

Open steveloughran opened this issue 1 year ago • 5 comments

Initial pass at writing an API for bulk deletes, targeting S3 and any store with paged delete support.

Minimal design: a RemoteIterator provides the list of paths to delete. After each page is deleted, a progress report is delivered to the application code, giving it an update on the files deleted and a way to abort an ongoing delete, for example after a failure.

Aspects of implementation to make clear in markdown spec

  • no guarantee that the page size is > 1, or that it stays constant through the entire operation.
  • after a progress callback requests an abort, more operations may still continue (should we include an "aborting" flag?)
  • no guarantee of execution order; implementations may shuffle paths before posting.
  • no expectation that parent directories will exist after the operation completes; if an object store needs to explicitly look for and create directory markers, that step will be omitted.
  • the background option is a hint: rather than being prioritised over other write operations, the delete may add an interval between pages or use a different page size.
  • callback: spell out which thread the callback is invoked from and whether it blocks further work.
  • any exception raised by the iterator is an unrecoverable failure; paths not yet submitted may or may not be submitted before the failure is reported.
  • no timeouts on iterator next()/hasNext().
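
A minimal sketch of the paging contract listed above, under stated assumptions. The names `PathSource`, `PageDeleter`, `Progress` and `deleteAll` are hypothetical and not taken from the patch; `PathSource` stands in for Hadoop's `RemoteIterator` so that the example compiles without Hadoop on the classpath. The sketch runs synchronously on the caller's thread and stops as soon as the callback asks for an abort, which is only one of the behaviours the notes above would permit.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; none of these names come from the patch. */
final class PagedDeleteSketch {

  /** Stand-in for a remote iterator: both methods may raise IOException. */
  interface PathSource {
    boolean hasNext() throws IOException;
    String next() throws IOException;
  }

  /** Deletes one page of paths in a single request. */
  interface PageDeleter {
    void deletePage(List<String> page) throws IOException;
  }

  /** Invoked after each page; return false to request an abort. */
  interface Progress {
    boolean pageDeleted(List<String> page, long totalDeleted);
  }

  /**
   * Drain the source in pages of at most maxPageSize and delete each page.
   * An exception from the source propagates unchanged: it is unrecoverable.
   * @return number of paths deleted before completion or abort.
   */
  static long deleteAll(PathSource source, int maxPageSize,
      PageDeleter deleter, Progress progress) throws IOException {
    if (maxPageSize < 1) {
      throw new IllegalArgumentException("page size must be >= 1");
    }
    long total = 0;
    List<String> page = new ArrayList<>(maxPageSize);
    while (source.hasNext()) {
      page.add(source.next());
      if (page.size() == maxPageSize) {
        deleter.deletePage(page);
        total += page.size();
        if (!progress.pageDeleted(page, total)) {
          return total;              // the callback asked to abort
        }
        page = new ArrayList<>(maxPageSize);
      }
    }
    if (!page.isEmpty()) {           // final, possibly short, page
      deleter.deletePage(page);
      total += page.size();
      progress.pageDeleted(page, total);
    }
    return total;
  }

  /** Adapt an in-memory list to the PathSource interface. */
  static PathSource fromList(List<String> paths) {
    final Iterator<String> it = paths.iterator();
    return new PathSource() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public String next() { return it.next(); }
    };
  }

  public static void main(String[] args) throws IOException {
    List<String> paths = Arrays.asList("a", "b", "c", "d", "e");
    long deleted = deleteAll(fromList(paths), 2,
        page -> System.out.println("DELETE " + page),
        (page, total) -> total < 4);   // abort once 4 paths are gone
    System.out.println("deleted " + deleted);
  }
}
```

In this sketch the last path is never submitted because the callback returns false after the second page. A real implementation could legitimately submit more pages before honouring the abort.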

How was this patch tested?

No tests yet; working on API first.

For code changes:

  • [x] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • [ ] If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

steveloughran, Aug 26 '23 17:08

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 :ok: | reexec | 0m 29s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 32s | | trunk passed |
| +1 :green_heart: | compile | 10m 44s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 9m 34s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 51s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 11s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 58s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 41s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 25s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 39s | | the patch passed |
| +1 :green_heart: | compile | 9m 49s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 9m 49s | | the patch passed |
| +1 :green_heart: | compile | 9m 33s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 9m 33s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 49s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 8s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 54s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 42s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 45s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 19s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 16m 49s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. |
| | | 148m 24s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5993 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux f297046e6241 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f25a930b8893f2c9f358fd839e4fb943d250c726 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/1/testReport/ |
| Max. process+thread count | 1253 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

hadoop-yetus, Aug 26 '23 19:08

Writing up the spec made me decide we should have a .opt to indicate when a bulk delete is a "background" operation, which may be executed at a rate that interferes less with live queries, e.g. smaller pages, rate-limited buildup of pages, a different throttle retry policy.
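
One way the "background" hint could translate into concrete settings is sketched below. The class `DeleteTuningSketch`, its fields and every number in it are illustrative assumptions, not values or names from the patch; the only point is that a single boolean hint can select smaller pages, a pause between pages and a more patient retry policy.

```java
/** Hypothetical tuning knobs; the values are illustrative, not from the patch. */
final class DeleteTuningSketch {
  final int pageSize;                  // paths per delete request
  final long pauseBetweenPagesMillis;  // rate limiting between pages
  final int throttleRetries;           // retries when the store throttles

  DeleteTuningSketch(int pageSize, long pauseBetweenPagesMillis, int throttleRetries) {
    this.pageSize = pageSize;
    this.pauseBetweenPagesMillis = pauseBetweenPagesMillis;
    this.throttleRetries = throttleRetries;
  }

  /**
   * Choose settings from the background hint.
   * @param background true if the delete should yield to live queries.
   * @param storeMaxPageSize largest page the store accepts.
   */
  static DeleteTuningSketch forOperation(boolean background, int storeMaxPageSize) {
    if (!background) {
      // foreground: full pages, no pause, few retries
      return new DeleteTuningSketch(storeMaxPageSize, 0L, 3);
    }
    // background: a quarter-size page (never below 1), a pause, more retries
    int smaller = Math.max(1, storeMaxPageSize / 4);
    return new DeleteTuningSketch(smaller, 500L, 10);
  }

  public static void main(String[] args) {
    DeleteTuningSketch t = forOperation(true, 1000);
    System.out.println(t.pageSize + " paths/page, pause "
        + t.pauseBetweenPagesMillis + " ms, retries " + t.throttleRetries);
  }
}
```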

steveloughran, Aug 29 '23 10:08

@ahmarsuhail

  • the caller provides a remote iterator, such as the ones we use for listing, or another source/transformation (see RemoteIterators)
  • the build() call returns some result
  • the implementation kicks off a worker thread to process the iterator, reading its values in until there are enough to kick off a DELETE request (a page, or maybe a parallel set in a thread pool)
  • after each page/set of deletes, it invokes the supplied callback with the results
  • then it continues, unless told to stop
  • it finishes only when the iterator has nothing left or the iterator raises an exception
  • or maybe on reaching some limit on failures
  • including maybe those considered unrecoverable
steveloughran, Oct 05 '23 09:10

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 :ok: | reexec | 0m 21s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 13m 47s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 19m 28s | | trunk passed |
| +1 :green_heart: | compile | 8m 19s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 7m 27s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 2m 4s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 24s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 4s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 0s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 9s | | trunk passed |
| +1 :green_heart: | shadedclient | 19m 50s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 20s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 0m 50s | | the patch passed |
| +1 :green_heart: | compile | 7m 52s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 7m 52s | | the patch passed |
| +1 :green_heart: | compile | 7m 27s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 7m 27s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 59s | /results-checkstyle-root.txt | root: The patch generated 5 new + 3 unchanged - 0 fixed = 8 total (was 3) |
| +1 :green_heart: | mvnsite | 1m 18s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 2s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 57s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 20s | | the patch passed |
| +1 :green_heart: | shadedclient | 19m 45s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 16m 22s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 2m 10s | | hadoop-aws in the patch passed. |
| +1 :green_heart: | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 143m 28s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5993 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux e75e83010c54 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d69fac0192c14889f0b3aa62bdb76e1d196eec8c |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/3/testReport/ |
| Max. process+thread count | 2432 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/3/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

hadoop-yetus, Jan 01 '24 20:01

:broken_heart: -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 :ok: | reexec | 0m 20s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 17s | | Maven dependency ordering for branch |
| -1 :x: | mvninstall | 4m 17s | /branch-mvninstall-root.txt | root in trunk failed. |
| -1 :x: | compile | 3m 53s | /branch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt | root in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. |
| -1 :x: | compile | 3m 36s | /branch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt | root in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. |
| +1 :green_heart: | checkstyle | 1m 54s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 12s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 50s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 49s | | trunk passed |
| -1 :x: | shadedclient | 4m 52s | | branch has errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 20s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 0m 45s | | the patch passed |
| -1 :x: | compile | 3m 48s | /patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt | root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. |
| -1 :x: | javac | 3m 48s | /patch-compile-root-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt | root in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. |
| -1 :x: | compile | 3m 36s | /patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt | root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. |
| -1 :x: | javac | 3m 36s | /patch-compile-root-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt | root in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 48s | /results-checkstyle-root.txt | root: The patch generated 5 new + 3 unchanged - 0 fixed = 8 total (was 3) |
| +1 :green_heart: | mvnsite | 1m 6s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 39s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 41s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 1s | | the patch passed |
| -1 :x: | shadedclient | 4m 57s | | patch has errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 16m 13s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 2m 24s | | hadoop-aws in the patch passed. |
| +1 :green_heart: | asflicense | 0m 24s | | The patch does not generate ASF License warnings. |
| | | 79m 47s | | |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5993 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux de98fcc15d5f 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / c7b4e99c212f8a3e60da456560c629be94326f76 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/testReport/ |
| Max. process+thread count | 3149 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5993/4/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

hadoop-yetus, Jan 24 '24 18:01