
feat: allow compacting files into new format version

Open chebbyChefNEQ opened this issue 1 year ago • 2 comments

A bootleg parallel migration tool built on the compaction task execution facility.

In the next PR, I'll add a force_rewrite option to rewrite files even when a file already matches the desired number of rows or size.

chebbyChefNEQ avatar Aug 18 '24 19:08 chebbyChefNEQ

Codecov Report

Attention: Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 79.15%. Comparing base (7284521) to head (430bd81).

Files                                Patch %   Lines
rust/lance/src/dataset/optimize.rs   25.00%    1 Missing and 2 partials
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2749      +/-   ##
==========================================
+ Coverage   79.13%   79.15%   +0.01%     
==========================================
  Files         227      227              
  Lines       67398    67402       +4     
  Branches    67398    67402       +4     
==========================================
+ Hits        53338    53353      +15     
+ Misses      10956    10948       -8     
+ Partials     3104     3101       -3     
Flag Coverage Δ
unittests 79.15% <25.00%> (+0.01%) :arrow_up:

Flags with carried forward coverage won't be shown.

codecov-commenter avatar Aug 18 '24 20:08 codecov-commenter

We want to avoid datasets that are split between two different versions (e.g. some files in v1 and some in v2). I think this approach can cause that situation if only some files need to be compacted. That could be dangerous (although I think the write would fail in this situation).

I see. Let me make this option implicitly rewrite the whole dataset then.

chebbyChefNEQ avatar Aug 19 '24 17:08 chebbyChefNEQ
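
The rule agreed on above (when compaction targets a different format version, rewrite every file so the dataset never ends up split between v1 and v2) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `Fragment`, `plan_rewrite`, and the `target_rows` / `target_version` parameters are hypothetical names, and the "undersized" check is a deliberately simplified stand-in for the real compaction planner in rust/lance/src/dataset/optimize.rs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fragment:
    # Hypothetical stand-in for a dataset fragment; not Lance's real type.
    fragment_id: int
    num_rows: int
    format_version: str  # e.g. "v1" or "v2"


def plan_rewrite(
    fragments: List[Fragment],
    target_rows: int,
    target_version: Optional[str] = None,
) -> List[int]:
    """Return the ids of the fragments that should be rewritten.

    Assumption: a fragment is a normal compaction candidate when it holds
    fewer rows than target_rows. Real planners apply more conditions.
    """
    versions = {f.format_version for f in fragments}
    migrating = target_version is not None and versions != {target_version}
    if migrating:
        # Rewriting only the undersized fragments would leave the dataset
        # split across two format versions, so select every fragment.
        return [f.fragment_id for f in fragments]
    # No version change requested: only undersized fragments are rewritten.
    return [f.fragment_id for f in fragments if f.num_rows < target_rows]
```

With one undersized fragment and two full ones, all in v1, a plain compaction under this sketch selects only the undersized fragment, while passing `target_version="v2"` selects all three. That is the "implicitly rewrite the whole dataset" behavior described in the reply above.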

Thank you for your contribution. This PR has been inactive for a while, so we're closing it to free up bandwidth. Feel free to reopen it if you still find it useful.

github-actions[bot] avatar Nov 16 '25 02:11 github-actions[bot]