neon icon indicating copy to clipboard operation
neon copied to clipboard

Start using remote_storage in S3 scrubber for PurgeGarbage

Open arpad-m opened this issue 1 year ago • 4 comments

Starts using the remote_storage crate in the S3 scrubber for the PurgeGarbage subcommand.

The remote_storage crate is generic over various backends and thus using it gives us the ability to run the scrubber against all supported backends.

Start with the PurgeGarbage subcommand as it doesn't use stream_tenants.

Part of #7547.

arpad-m avatar Jun 01 '24 00:06 arpad-m

Marking this as a draft as it is untested.

arpad-m avatar Jun 01 '24 00:06 arpad-m

3169 tests run: 3047 passed, 1 failed, 121 skipped (full report)


Failures on Postgres 15

  • test_storage_controller_heartbeats[failure2]: release
# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_storage_controller_heartbeats[release-pg15-failure2]"

Code coverage* (full report)

  • functions: 32.6% (6999 of 21466 functions)
  • lines: 50.0% (55267 of 110537 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
a1807a099095aa7dc5e3825b6d4f6c5565b3617b at 2024-07-20T13:45:21.249Z :recycle:

github-actions[bot] avatar Jun 01 '24 00:06 github-actions[bot]

Start with the PurgeGarbage subcommand as it doesn't use stream_tenants.

If you could implement a streaming listing in GenericRemoteStorage to enable this, it would also be useful in places like tenant deletion. I'm not sure how hard it is to do that in a trait, as the version I have in the scrubber (stream_listing) uses -> impl Stream<Item = anyhow::Result<ObjectIdentifier>> + 'a , so the actual stream type will be different for different trait impls.

jcsp avatar Jun 05 '24 16:06 jcsp

the actual stream type will be different for different trait impls

impl trait in trait has become stable together with async fn in trait, so it's available to users. For GenericRemoteStorage, it will likely have to return an enum though because it can only return one type.

I agree that adding a streaming listing interface is the best way to implement stream_tenants. It will have to have retries inside.

I would like to do it in a follow-up to this PR.

arpad-m avatar Jun 06 '24 13:06 arpad-m