Leverage batch delete in vacuum for GCS

Open Dandandan opened this issue 4 years ago • 2 comments

Description

GCS supports batching multiple delete operations (up to 100) into a single call, which can speed up vacuum. We should add a new delete_objs implementation of StorageBackend for GCS.
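To illustrate the idea, here is a minimal sketch of the chunking logic only, not the actual StorageBackend trait or a real GCS client. The `delete_objs` signature below, the `send_batch` closure, and the `MAX_BATCH_SIZE` constant are assumptions made for illustration; the only fact taken from this issue is the limit of 100 deletes per batch call.

```rust
/// Upper bound on deletes per batch call, taken from the limit quoted in this issue.
const MAX_BATCH_SIZE: usize = 100;

/// Hypothetical batched delete: groups `paths` into chunks of at most
/// `MAX_BATCH_SIZE` and hands each chunk to `send_batch`, which stands in
/// for whatever would issue the real batch request.
///
/// Returns the number of batch requests issued, or the first error.
fn delete_objs<F, E>(paths: &[String], mut send_batch: F) -> Result<usize, E>
where
    F: FnMut(&[String]) -> Result<(), E>,
{
    let mut requests = 0;
    for batch in paths.chunks(MAX_BATCH_SIZE) {
        // Stop at the first failed batch; earlier batches are already sent.
        send_batch(batch)?;
        requests += 1;
    }
    Ok(requests)
}
```

With this shape, vacuuming 250 files would take 3 requests instead of 250. A real implementation would also need to decide how to report partial failures inside a batch, which this sketch does not attempt.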

Use Case

Related Issue(s) https://github.com/delta-io/delta-rs/issues/394

Dandandan avatar Aug 22 '21 04:08 Dandandan

Thanks @Dandandan for filing this. FYI @blogle, in case you run into vacuum performance issues with your PoC.

houqp avatar Aug 22 '21 05:08 houqp

I haven't had the need to do any work around vacuum just yet, and to be honest I have been thinking I would just invoke vacuum on some schedule from Spark when needed. As we further scope out our needs and ensure those features are built out, I wouldn't mind going through and adding this to the storage backends, assuming nobody does so in the interim.

blogle avatar Aug 23 '21 21:08 blogle

Now that we use object_store, we should implement this upstream: https://github.com/apache/arrow-rs/issues/2615

wjones127 avatar Sep 21 '22 01:09 wjones127

Closing this since we have this support now, and we're improving it further with the adoption of delete_stream.

rtyler avatar Oct 25 '23 16:10 rtyler