delta-rs
Unit and integration tests for the vacuum command
Our preliminary support for vacuum would benefit from more unit and integration tests to ensure that it deletes only the right files whenever vacuum is invoked.
Referencing @mrk-its's comment from #97 so it's not lost.
@rtyler @fvaleye It looks like there are still two serious issues with the vacuum implementation:
* vacuum lists all files in the dataset using `StorageBackend.list_objs`. The problem is that this function returns all files (including those in subdirectories) on the `s3` backend and the `gcs` backend (although I'm not sure about gcs). On the `file` and `azure` backends, it lists only first-level files (without recursing into subdirectories).
* vacuum completely ignores files that are not referenced by the delta log (that is, files that appear in neither the `DeltaTableState.files()` list nor the `DeltaTableState.all_tombstones()` list).
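One way to make both issues testable is to separate the "which files should go" decision from the storage backend, so a unit test can feed it a fixed listing. The following is only a minimal sketch under stated assumptions, not the delta-rs implementation: `ListedFile`, `TableSnapshot`, and `files_to_vacuum` are hypothetical names, the listing is assumed to be already recursive (paths relative to the table root), and the rule that untracked files are removed once their modification time is older than the retention cutoff is an assumption about the desired behaviour.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical: one entry of a recursive listing, path relative to the table root.
pub struct ListedFile {
    pub path: String,
    pub modified_ms: i64,
}

/// Hypothetical, simplified stand-in for the table state vacuum consults.
pub struct TableSnapshot {
    /// Paths still part of the table (cf. `DeltaTableState.files()`).
    pub active: HashSet<String>,
    /// Tombstoned path -> deletion timestamp in ms (cf. `all_tombstones()`).
    pub tombstones: HashMap<String, i64>,
}

/// Returns the sorted paths that a vacuum run should delete.
/// Assumption: untracked files are judged by their modification time.
pub fn files_to_vacuum(
    listed: &[ListedFile],
    snapshot: &TableSnapshot,
    now_ms: i64,
    retention_ms: i64,
) -> Vec<String> {
    let cutoff = now_ms - retention_ms;
    let mut doomed: Vec<String> = listed
        .iter()
        // Never touch the transaction log itself.
        .filter(|f| !f.path.starts_with("_delta_log/"))
        // Never touch files that are still part of the table.
        .filter(|f| !snapshot.active.contains(&f.path))
        .filter(|f| match snapshot.tombstones.get(&f.path) {
            // Tombstoned: delete only once the retention period has passed.
            Some(&deleted_ms) => deleted_ms < cutoff,
            // Untracked by the log: fall back to the modification time.
            None => f.modified_ms < cutoff,
        })
        .map(|f| f.path.clone())
        .collect();
    doomed.sort();
    doomed
}
```

A test built on such a function could list one live file, one expired and one recent tombstone inside a partition directory such as `year=2021/`, plus an old and a new untracked file, and assert that only the expired tombstone and the old untracked file are returned. Running the same fixture against each storage backend's listing would then expose the recursive versus first-level difference described above.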