TileDB
Implement Query Partitioning
For high-level applications such as Presto or Spark, a query partitioner that can break a query into optimally sized subarrays would be beneficial. Ideally, the partition function takes one or more subarrays and the desired number of partitions as input, and returns a list of new subarrays to query.
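To make the requested shape concrete, here is a minimal sketch of such a function. It is not TileDB's API: the names `partition_subarrays`, `cell_count`, and `split_longest_dim` are hypothetical, and it assumes integer dimensions with inclusive `(low, high)` ranges. It repeatedly halves the largest subarray along its longest dimension, which balances the coordinate space covered but not the number of non-empty cells.

```python
from typing import List, Tuple

Range = Tuple[int, int]        # inclusive (low, high) on one dimension
Subarray = Tuple[Range, ...]   # one range per dimension


def cell_count(sub: Subarray) -> int:
    """Number of integer coordinates covered by a subarray."""
    count = 1
    for lo, hi in sub:
        count *= hi - lo + 1
    return count


def split_longest_dim(sub: Subarray) -> Tuple[Subarray, Subarray]:
    """Halve a subarray along its longest dimension."""
    dim = max(range(len(sub)), key=lambda d: sub[d][1] - sub[d][0])
    lo, hi = sub[dim]
    mid = lo + (hi - lo) // 2
    left = sub[:dim] + ((lo, mid),) + sub[dim + 1:]
    right = sub[:dim] + ((mid + 1, hi),) + sub[dim + 1:]
    return left, right


def partition_subarrays(subarrays: List[Subarray], n: int) -> List[Subarray]:
    """Hypothetical partitioner: split until there are at least n subarrays."""
    parts = list(subarrays)
    if not parts:
        return []
    while len(parts) < n:
        # Always split the subarray that covers the most coordinates.
        idx = max(range(len(parts)), key=lambda i: cell_count(parts[i]))
        if cell_count(parts[idx]) == 1:
            break  # nothing left that can be split further
        left, right = split_longest_dim(parts.pop(idx))
        parts.extend([left, right])
    return parts
```

For example, `partition_subarrays([((0, 99), (0, 9))], 4)` yields four subarrays of 250 coordinates each, all split along the first dimension.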
For sparse arrays, it is important to implement heuristics that keep the partitions balanced. Currently, Presto and Spark each implement their own naive partitioning, which can result in unbalanced reads on a sparse array.
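One possible heuristic, sketched here for a single dimension, is to cut at quantiles of known non-empty coordinates rather than at midpoints of the domain. This is an illustration only: `balanced_ranges` is a hypothetical name, and the sketch assumes a sample of non-empty coordinates is already available, without saying where it comes from.

```python
from typing import Iterable, List, Tuple


def balanced_ranges(coords: Iterable[int], lo: int, hi: int, n: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [lo, hi] into at most n contiguous ranges
    that each hold roughly the same number of sampled coordinates.

    `coords` is an assumed sample of non-empty cell coordinates.
    """
    pts = sorted(c for c in coords if lo <= c <= hi)
    ranges = []
    start = lo
    for k in range(1, n):
        i = (k * len(pts)) // n      # number of samples that belong left of cut k
        if i == 0:
            continue                 # too few samples to place this cut
        cut = pts[i - 1]
        if cut < start or cut >= hi:
            continue                 # cut would create an empty or invalid range
        ranges.append((start, cut))
        start = cut + 1
    ranges.append((start, hi))       # last range absorbs the remainder
    return ranges
```

With twelve cells clustered at coordinates 0 to 11 and two outliers at 500 and 900 in the domain `[0, 999]`, a midpoint split puts 12 cells in one partition and 2 in the other, while `balanced_ranges` with `n=2` cuts after coordinate 6 and puts 7 cells in each.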
An experimental partitioner was added in #1197. See also #1225. A decision still needs to be made on whether to provide an API for this.