TileDB Selective / distributed consolidation

Our current algorithm operates on all array fragments to choose which subset to consolidate in the next step. It would be useful to be able to (i) tell the algorithm which subset of fragments to focus on and (ii) have it return a partition of those fragments into subsets that multiple machines can consolidate safely and independently in parallel.
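
To make the request concrete, here is a minimal sketch in plain Python of what (i) and (ii) could look like. It is not TileDB's algorithm and uses no TileDB API: the `Fragment` record, the `plan_consolidation` function, and the `max_group_size` budget are all hypothetical names for illustration. It also assumes a simplified safety rule, namely that a group is safe when its fragments are adjacent in timestamp order, so no excluded fragment falls between them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Fragment:
    name: str       # hypothetical fragment identifier
    timestamp: int  # write timestamp, assumed unique per fragment
    size: int       # fragment size in bytes


def plan_consolidation(
    fragments: Iterable[Fragment],
    focus: Set[str],
    max_group_size: int,
) -> List[List[str]]:
    """Return disjoint groups of fragment names to consolidate.

    Assumption (not TileDB's actual rule): a group is safe if its
    members are adjacent in timestamp order. Fragments outside
    `focus` are never consolidated and always end the current group.
    """
    groups: List[List[Fragment]] = []
    current: List[Fragment] = []
    current_size = 0

    for frag in sorted(fragments, key=lambda f: f.timestamp):
        in_focus = frag.name in focus
        fits = current_size + frag.size <= max_group_size

        if in_focus and fits:
            # Extend the current run of adjacent fragments.
            current.append(frag)
            current_size += frag.size
            continue

        # The run ends here; keep it only if there is something to merge.
        if len(current) > 1:
            groups.append(current)

        # Start a new run with this fragment if it is in focus.
        if in_focus:
            current, current_size = [frag], frag.size
        else:
            current, current_size = [], 0

    if len(current) > 1:
        groups.append(current)

    return [[f.name for f in group] for group in groups]
```

Because the returned groups are disjoint, each one could in principle be handed to a different machine. Fragments left out of `focus` stay untouched, and single-fragment runs are dropped because there is nothing to merge.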

Jan 02 '19 20:01 stavrospapadopoulos

Hi, is there any progress on this feature?

I believe that it would be very useful to be able to specify fragments for consolidation in order to preserve specific views of a dataset, whilst still consolidating the majority of data for performance.

A case that I am working on currently is where there is potential for small segments of overlapping data. Here, we could continue to consolidate the majority of fragments from non-overlapping data writes, but keep specific fragments written with overlapping data.

Then, using the time travel feature, we could look at any of the fragments containing overlapping data.

Thanks for the awesome work!

Jan 29 '20 23:01 outdoorpet

@outdoorpet thanks for revisiting this. Yes, we are planning to start work on this in mid-February. We'd certainly appreciate some feedback when we start submitting PRs, in case we are missing something in terms of usability or functionality.

Jan 30 '20 19:01 stavrospapadopoulos

We now have the ability to consolidate a specific list of fragments, as well as the consolidation plan feature. We are also planning to improve consolidation as a whole in 2024.

Dec 13 '23 16:12 KiterLuc