
Implement Query Partitioning

Open Shelnutt2 opened this issue 7 years ago • 1 comments

For high-level applications such as Presto or Spark, a query partitioner that can break a query into optimally sized subarrays would be beneficial. Ideally, the partition function would take one or more subarrays and the desired number of partitions as input, and return a list of new subarrays to base the queries on.
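To make the requested interface concrete, here is a minimal Python sketch of such a partition function. It is not TileDB's API: the names `partition_subarrays`, `cell_count`, and `split_in_half` are hypothetical, a subarray is assumed to be a tuple of inclusive integer `(lo, hi)` ranges (one per dimension), and the strategy (repeatedly halving the largest subarray along its longest dimension) is just one simple choice that ignores how the data is actually distributed.

```python
import heapq
from typing import List, Tuple

Range = Tuple[int, int]        # inclusive (lo, hi) bounds on one dimension
Subarray = Tuple[Range, ...]   # one Range per dimension


def cell_count(sub: Subarray) -> int:
    """Number of cells covered by a subarray (dense count, ignores sparsity)."""
    n = 1
    for lo, hi in sub:
        n *= hi - lo + 1
    return n


def split_in_half(sub: Subarray) -> Tuple[Subarray, Subarray]:
    """Split a subarray into two halves along its longest dimension."""
    dim = max(range(len(sub)), key=lambda d: sub[d][1] - sub[d][0])
    lo, hi = sub[dim]
    mid = lo + (hi - lo) // 2
    left = sub[:dim] + ((lo, mid),) + sub[dim + 1:]
    right = sub[:dim] + ((mid + 1, hi),) + sub[dim + 1:]
    return left, right


def partition_subarrays(subarrays: List[Subarray], num_partitions: int) -> List[Subarray]:
    """Return up to num_partitions subarrays covering the same cells as the input."""
    # Max-heap on cell count (negated, since heapq is a min-heap).
    heap = [(-cell_count(s), s) for s in subarrays]
    heapq.heapify(heap)
    while len(heap) < num_partitions:
        neg_count, largest = heapq.heappop(heap)
        if -neg_count <= 1:
            # A single cell cannot be split further; stop early.
            heapq.heappush(heap, (neg_count, largest))
            break
        for part in split_in_half(largest):
            heapq.heappush(heap, (-cell_count(part), part))
    return sorted(s for _, s in heap)
```

For example, `partition_subarrays([((0, 99), (0, 9))], 4)` splits the 100 x 10 region into four slabs of 25 rows each. If the input already contains at least `num_partitions` subarrays, it is returned unchanged (sorted).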

It is important to implement heuristics so that partitions can be balanced for sparse arrays. Currently, Presto and Spark each implement their own naive partitioning, which can result in unbalanced reads on a sparse array.
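As an illustration of what a balancing heuristic could look like, the Python sketch below picks split points at quantiles of the non-empty coordinates along one dimension, so each partition holds roughly the same number of cells. This is an assumption-laden example, not what TileDB or the connectors do: `balanced_ranges` is a hypothetical name, and it assumes a sample (or full list) of integer coordinates is cheaply available.

```python
from typing import List, Sequence, Tuple


def balanced_ranges(coords: Sequence[int], lo: int, hi: int,
                    num_partitions: int) -> List[Tuple[int, int]]:
    """Split inclusive [lo, hi] into contiguous ranges that each hold a
    near-equal share of the given non-empty coordinates."""
    pts = sorted(c for c in coords if lo <= c <= hi)
    ranges = []
    start = lo
    for k in range(1, num_partitions):
        if not pts:
            break  # no data to balance on; fall back to one range
        # Place the boundary just before the k-th quantile coordinate.
        end = pts[k * len(pts) // num_partitions] - 1
        if end >= start:  # skip boundaries that would create an empty range
            ranges.append((start, end))
            start = end + 1
    ranges.append((start, hi))
    return ranges
```

With coordinates `[1, 2, 3, 4, 5, 6, 7, 8, 500, 900]` on the domain `[0, 999]`, an even split into `(0, 499)` and `(500, 999)` would read 8 cells in one partition and 2 in the other, whereas `balanced_ranges(coords, 0, 999, 2)` yields `(0, 5)` and `(6, 999)` with 5 cells each. Fewer ranges than requested may be returned when the data is too clustered (or duplicated) to split further.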

Shelnutt2, Dec 13 '18 21:12

An experimental partitioner was added in #1197. See also #1225. A decision still needs to be made on whether to expose an API for this.

tdenniston, Apr 25 '19 13:04