delta-rs
Implement vacuum command
delta-rs should have a "vacuum table" utility analogous to the one provided by the open source Spark Delta Lake implementation. This utility cleans up old data files that are no longer referenced by the delta log (e.g. files rewritten by merge statements or by the optimize command).
See the VacuumCommand in the open source implementation for reference.
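As a rough illustration of what such a utility has to decide, here is a minimal Python sketch that picks deletion candidates from a listing of the table directory. It is not delta-rs or Spark code. ListedFile, vacuum_candidates and DEFAULT_RETENTION_MS are hypothetical names. The 7-day default mirrors Spark Delta Lake's default retention. The sketch also simplifies things by judging every file on its last-modified time, whereas a real implementation would also consult tombstone deletion timestamps.

```python
from dataclasses import dataclass

# Hypothetical record for one listed file (not a delta-rs type).
@dataclass(frozen=True)
class ListedFile:
    path: str         # path relative to the table root, e.g. "part=1/a.parquet"
    modified_ms: int  # last-modified time in epoch milliseconds

# Assumption: 7 days, matching Spark Delta Lake's default retention.
DEFAULT_RETENTION_MS = 7 * 24 * 60 * 60 * 1000

def vacuum_candidates(listed, active_paths, now_ms, retention_ms=DEFAULT_RETENTION_MS):
    """Return the paths a vacuum would delete, sorted for stable output."""
    cutoff = now_ms - retention_ms
    active = set(active_paths)
    doomed = []
    for f in listed:
        # Never touch the transaction log itself.
        if f.path.startswith("_delta_log/"):
            continue
        # Files referenced by the current snapshot must survive.
        if f.path in active:
            continue
        # Recent files may still be needed by concurrent writers or by
        # readers of older versions, so keep them inside the window.
        if f.modified_ms >= cutoff:
            continue
        doomed.append(f.path)
    return sorted(doomed)

if __name__ == "__main__":
    DAY = 24 * 60 * 60 * 1000
    listed = [
        ListedFile("a.parquet", 1 * DAY),   # still active
        ListedFile("b.parquet", 1 * DAY),   # unreferenced and old
        ListedFile("c.parquet", 29 * DAY),  # unreferenced but recent
        ListedFile("_delta_log/00000000000000000000.json", 0),
    ]
    print(vacuum_candidates(listed, {"a.parquet"}, now_ms=30 * DAY))  # ['b.parquet']
```

The retention window is the key safety choice here. Shrinking it to zero would delete c.parquet too, which is why Spark guards against very short retention periods.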
@fvaleye I think this is actually done right? I'm not clear what work we have left to do
Yes, it is already implemented! Hmm, we need to improve the test suite: https://github.com/delta-io/delta-rs/issues/227
How about fixing the documentation?
It's probably worth updating it from this
to this?
@rtyler @fvaleye It looks like there are still two serious issues with the vacuum implementation:
- vacuum lists all files in the dataset using StorageBackend.list_objs. The problem is that this function returns all files (including those in subdirectories) on the s3 backend and the gcs backend (although I'm not sure about gcs), while on the file and azure backends it lists only first-level files (without recursing into subdirectories).
- vacuum ignores files that are not referenced by the delta log at all (that is, files not included in the DeltaTableState.files() and DeltaTableState.all_tombstones() lists).
Resolved by #669.