
Incremental compaction of lance files

Open · westonpace opened this issue 1 year ago · 0 comments

Large lance files are often created by compacting many smaller lance files, e.g. one thousand 1 GB lance files compacted into a single 1 TB file. Creating files this large is very difficult when working with cloud storage: the work cannot be distributed across multiple workers, individual tasks are slow (and thus prone to preemption), and cloud storage has been returning frequent errors when finalizing multipart uploads for such large files.

A more sophisticated approach would be to first scan the metadata of the original files, determine roughly which pages will be created, and then split the writing of the file into multiple tasks (followed by a final commit task).
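A minimal, hypothetical sketch of that plan/split/commit pipeline in plain Python. None of these names (`InputPage`, `plan_pages`, `split_tasks`, `run_task`, `commit`) are Lance APIs. The page sizes are invented, `run_task` only pretends to write, and the sketch assumes the output file's metadata can record an absolute offset for every page, so the commit step can fix up positions after the workers report their actual sizes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class InputPage:
    # Hypothetical page summary read from an input file's metadata (no data loaded).
    column: int
    num_rows: int
    size_bytes: int


@dataclass
class PlannedPage:
    column: int
    inputs: List[InputPage]

    @property
    def est_bytes(self) -> int:
        return sum(p.size_bytes for p in self.inputs)


def plan_pages(inputs: List[InputPage], target_page_bytes: int) -> List[PlannedPage]:
    """Step 1: using metadata only, greedily merge each column's input pages into output pages."""
    planned: List[PlannedPage] = []
    for col in sorted({p.column for p in inputs}):
        group: List[InputPage] = []
        for page in (p for p in inputs if p.column == col):
            if group and sum(g.size_bytes for g in group) + page.size_bytes > target_page_bytes:
                planned.append(PlannedPage(col, group))
                group = []
            group.append(page)
        if group:
            planned.append(PlannedPage(col, group))
    return planned


def split_tasks(planned: List[PlannedPage], target_task_bytes: int) -> List[List[PlannedPage]]:
    """Step 2: cut the ordered page plan into contiguous tasks of roughly equal estimated size."""
    tasks: List[List[PlannedPage]] = [[]]
    task_bytes = 0
    for page in planned:
        if tasks[-1] and task_bytes + page.est_bytes > target_task_bytes:
            tasks.append([])
            task_bytes = 0
        tasks[-1].append(page)
        task_bytes += page.est_bytes
    return [t for t in tasks if t]


def run_task(task: List[PlannedPage]) -> List[Tuple[int, int]]:
    """Step 3 (one worker per task): write this task's pages as its own part and
    report (column, actual_bytes). Placeholder: a real worker would read and
    re-encode the input pages; here the "actual" size is just the estimate."""
    return [(page.column, page.est_bytes) for page in task]


def commit(task_results: List[List[Tuple[int, int]]]) -> Dict[int, List[Tuple[int, int]]]:
    """Step 4 (final commit): concatenate the parts in task order and build a
    page table mapping each column to its pages' (absolute_offset, size)."""
    page_table: Dict[int, List[Tuple[int, int]]] = {}
    offset = 0
    for result in task_results:
        for column, size in result:
            page_table.setdefault(column, []).append((offset, size))
            offset += size
    return page_table


# Usage: four input files, each with one 24 MiB page for columns 0 and 1.
MB = 1 << 20
inputs = [InputPage(col, 1_000, 24 * MB) for _ in range(4) for col in (0, 1)]
planned = plan_pages(inputs, target_page_bytes=64 * MB)    # pairs of inputs -> 48 MiB pages
tasks = split_tasks(planned, target_task_bytes=100 * MB)   # two tasks of two pages each
results = [run_task(t) for t in tasks]                      # would run on separate workers
table = commit(results)
```

Because the commit step derives offsets from the sizes the workers actually report, the up-front plan only needs to be approximately right, which fits the "roughly" in the proposal. A real implementation would also have to upload each task's output (for example as one part of a multipart upload) and write the file's footer during the commit.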

westonpace, Sep 03 '24 15:09