delta-rs

Unit and integration tests for the vacuum command

Open • rtyler opened this issue 4 years ago • 1 comment

Our preliminary support for vacuum would benefit from more unit and integration tests to ensure that it deletes only the right files whenever vacuum is invoked.
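As a sketch of the kind of unit test this calls for, assume the deletion decision is factored into a pure function. The hypothetical `vacuum_candidates` below is not a delta-rs API: it takes the paths referenced by the current table state, a map of tombstoned paths to their deletion timestamps in milliseconds (Delta `remove` actions carry such a timestamp), and a retention window. Tests can then pin down the two properties that matter most: active files are never returned, and tombstones are returned only once they fall outside the retention window.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper (not part of delta-rs): decide which paths vacuum may delete.
/// `active`     - paths referenced by the current table state; never deleted.
/// `tombstones` - removed paths mapped to their deletion timestamp (ms).
/// A tombstoned path is eligible only if it was deleted before `now_ms - retention_ms`.
fn vacuum_candidates(
    active: &HashSet<String>,
    tombstones: &HashMap<String, i64>,
    now_ms: i64,
    retention_ms: i64,
) -> Vec<String> {
    let cutoff = now_ms - retention_ms;
    let mut out: Vec<String> = tombstones
        .iter()
        // Keep expired tombstones, but never anything that is active again.
        .filter(|&(path, &deleted_at)| deleted_at < cutoff && !active.contains(path))
        .map(|(path, _)| path.clone())
        .collect();
    out.sort(); // deterministic order keeps assertions simple
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn never_deletes_active_files() {
        let active: HashSet<String> = vec!["a.parquet".to_string()].into_iter().collect();
        let mut tombstones = HashMap::new();
        // "a.parquet" was removed long ago and later re-added, so it must survive.
        tombstones.insert("a.parquet".to_string(), 0);
        assert!(vacuum_candidates(&active, &tombstones, 10_000, 0).is_empty());
    }

    #[test]
    fn respects_retention_window() {
        let active = HashSet::new();
        let mut tombstones = HashMap::new();
        tombstones.insert("old.parquet".to_string(), 1_000);
        tombstones.insert("recent.parquet".to_string(), 9_500);
        // cutoff = 10_000 - 7_000 = 3_000, so only "old.parquet" qualifies.
        assert_eq!(vacuum_candidates(&active, &tombstones, 10_000, 7_000), vec!["old.parquet"]);
    }
}
```

Keeping the decision pure means these tests need no storage backend at all; integration tests can then focus on whether the listing fed into such a function is complete.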

rtyler, Apr 29 '21 22:04

Referencing @mrk-its's comment from #97 so it isn't lost.

@rtyler @fvaleye It looks like there are still two serious issues with the vacuum implementation:

* vacuum lists all files in the dataset using `StorageBackend.list_objs`. The problem is that this function returns all files (including those in subdirectories) on the `s3` backend and the `gcs` backend (although I'm not sure about `gcs`). On the `file` and `azure` backends, this function lists only first-level files (without recursing into subdirectories).

* vacuum ignores files that are not referenced by the delta log at all (that is, files that appear in neither the `DeltaTableState.files()` list nor the `DeltaTableState.all_tombstones()` list).
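Both points could be exercised by tests against a local table directory. The following is a minimal sketch that assumes a plain filesystem layout. The hypothetical `list_recursive` always walks into partition subdirectories, so every backend would be expected to behave the way the comment reports for `s3`. The hypothetical `untracked_files` picks out files that the log references neither as active nor as tombstoned. Neither function exists in delta-rs. Skipping paths whose top-level entry starts with `_` or `.` (which protects `_delta_log/`) is an assumption borrowed from the usual hidden-file convention.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: list every file under `root`, recursing into
/// subdirectories (e.g. partition dirs like `year=2021/`), and return paths
/// relative to `root` joined with `/`, matching how log entries name files.
fn list_recursive(root: &Path) -> io::Result<Vec<String>> {
    let mut out = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                stack.push(path); // descend later instead of stopping at the first level
            } else {
                let rel = path.strip_prefix(root).expect("entry is under root");
                let parts: Vec<String> = rel
                    .components()
                    .map(|c| c.as_os_str().to_string_lossy().into_owned())
                    .collect();
                out.push(parts.join("/"));
            }
        }
    }
    out.sort();
    Ok(out)
}

/// Hypothetical helper: files on storage that are neither active nor tombstoned.
/// Assumption: paths whose top-level entry starts with `_` or `.` are never touched.
fn untracked_files(
    listed: &[String],
    active: &HashSet<String>,
    tombstoned: &HashSet<String>,
) -> Vec<String> {
    listed
        .iter()
        .filter(|p| !p.starts_with('_') && !p.starts_with('.'))
        .filter(|p| !active.contains(*p) && !tombstoned.contains(*p))
        .cloned()
        .collect()
}
```

Under these assumptions, a table with `part-0.parquet` and `year=2021/part-1.parquet` active and a stray `year=2021/orphan.parquet` on disk would report only the orphan as untracked. A real vacuum would presumably still apply the retention window to untracked files (for example, by modification time) so that files from an in-progress write are not deleted.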

danmx, Sep 05 '22 06:09