delta-rs
Leverage batch delete in vacuum for GCS
Description
GCS supports batching multiple delete operations (up to 100) in a single call. This can speed up vacuum.
We should add a new delete_objs implementation of StorageBackend for GCS.
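A minimal sketch of the batching idea, written in Rust since delta-rs is a Rust project. Several names here are assumptions. The StorageBackend trait is a simplified, hypothetical stand-in, not the actual delta-rs trait signature. MockGcsBackend and send_batch_delete are also hypothetical: the mock only counts round trips in memory. The batch limit of 100 comes from the description. A real implementation would send GCS batch HTTP requests and check each sub-response for partial failures.

```rust
use std::collections::HashSet;

/// Per-request cap on delete operations, taken from the issue text ("up to 100").
const GCS_MAX_BATCH_SIZE: usize = 100;

/// Hypothetical, simplified storage trait (NOT the real delta-rs StorageBackend).
trait StorageBackend {
    fn delete_obj(&mut self, path: &str) -> Result<(), String>;

    /// Default behaviour: one round trip per object.
    fn delete_objs(&mut self, paths: &[String]) -> Result<(), String> {
        for p in paths {
            self.delete_obj(p)?;
        }
        Ok(())
    }
}

/// Hypothetical in-memory stand-in for a GCS backend that counts requests.
struct MockGcsBackend {
    objects: HashSet<String>,
    requests: usize,
}

impl MockGcsBackend {
    /// Stand-in for one GCS batch request carrying `batch.len()` deletes.
    /// A real client would build a multipart batch body and inspect each sub-response.
    fn send_batch_delete(&mut self, batch: &[String]) -> Result<(), String> {
        assert!(batch.len() <= GCS_MAX_BATCH_SIZE, "batch exceeds GCS limit");
        self.requests += 1;
        for p in batch {
            // Missing objects are ignored here; vacuum may race with other cleanup.
            self.objects.remove(p);
        }
        Ok(())
    }
}

impl StorageBackend for MockGcsBackend {
    fn delete_obj(&mut self, path: &str) -> Result<(), String> {
        self.send_batch_delete(&[path.to_string()])
    }

    /// Override: group deletes into chunks of at most 100, one request per chunk.
    fn delete_objs(&mut self, paths: &[String]) -> Result<(), String> {
        for batch in paths.chunks(GCS_MAX_BATCH_SIZE) {
            self.send_batch_delete(batch)?;
        }
        Ok(())
    }
}

fn main() {
    let paths: Vec<String> = (0..250).map(|i| format!("part-{i:05}.parquet")).collect();
    let mut gcs = MockGcsBackend { objects: paths.iter().cloned().collect(), requests: 0 };
    gcs.delete_objs(&paths).unwrap();
    // 250 files split into chunks of 100, 100 and 50, so 3 requests instead of 250.
    println!("{} objects deleted in {} requests", paths.len(), gcs.requests);
}
```

Overriding the trait's default method keeps callers such as vacuum unchanged. Only the GCS backend decides how to group the deletes into requests.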
Use Case
Related Issue(s)
https://github.com/delta-io/delta-rs/issues/394
Thanks @Dandandan for filing this. FYI @blogle, in case you run into vacuum performance issues with your PoC.
I haven't needed to do any work around vacuum yet, and to be honest I was planning to just invoke vacuum from Spark on a schedule when needed. As we further scope out our needs and make sure those features are built out, I wouldn't mind going through and adding this to the storage backends, assuming nobody does so in the interim.
Now that we use object_store, we should implement this upstream: https://github.com/apache/arrow-rs/issues/2615
Closing this since we have this support now, and we're improving it further with the adoption of delete_stream.