Sub-array views?

Open: Hoeze opened this issue 4 years ago • 3 comments

Hi, is there some way to get a subarray view of a TileDB store?

My use case would be the following:

import tiledb as tdb
A = tdb.open("path")
print(A.domain)
# Domain(
#     Dim(name='chrom', domain=(None, None), tile=1, dtype=np.bytes_),
#     Dim(name='start', domain=(0, 18446744073709551614), tile=100000, dtype='uint64'),
#     Dim(name='gene_start', domain=(0, 18446744073709551614), tile=10000000, dtype='uint64'),
# )
sub = A.method_that_returns_subarray(chrom="chr13", start=slice(0, 1000000), gene_start=slice(0, 1000000))

# now get unique dimension labels in the subarray:
chrom_idx = sub.unique_dim_values("chrom")
start_idx = sub.unique_dim_values("start")
gene_idx = sub.unique_dim_values("gene_start")

Hoeze, Apr 17 '21 13:04

Hi @Hoeze, if I understand correctly, you want an object that can be indexed (e.g. with multi_index) within only the specified range(s), and on which you can call other tiledb.Array methods like unique_dim_values? Would you expect the whole array to be read into memory when this object is created, or only when it is indexed? The situation here differs from NumPy views, because NumPy arrays are already in memory.

I'm trying to understand the goal/use-case here, in order to prioritize.

ihnorton, Apr 20 '21 01:04

I would like to avoid loading the array into memory. My main goal is to get existing coordinates inside a range and load the data in this range on-demand with dask.

Hoeze, Apr 23 '21 08:04

> get existing coordinates inside a range and load the data in this range on-demand with dask.

If I understand correctly, you can do the first part (get only the coordinates) with A.query(attrs=[]).multi_index[<your ranges>], which returns only the coordinates of cells matching the index ranges and excludes all attribute data. You will then have to partition those coordinates and do a full read for each partition on each node.
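A rough sketch of that two-step approach, not tested against a real array. The dimension names come from the domain posted above; the helper names (read_coords, partition_bounds, read_partition) and the uri argument are made up for illustration and are not part of TileDB-Py. It assumes the array is sparse and that attrs=[] returns coordinates only, as described above.

```python
def read_coords(uri, chrom, start, gene_start):
    """Return only the coordinates inside the given ranges (no attribute data).

    `start` and `gene_start` are slice objects; multi_index treats both
    endpoints of a slice as inclusive.
    """
    import tiledb  # lazy import so partition_bounds works without TileDB installed

    with tiledb.open(uri) as A:
        # Result is a dict-like object keyed by dimension name.
        return A.query(attrs=[]).multi_index[chrom, start, gene_start]


def partition_bounds(values, n_parts):
    """Split the distinct values into at most n_parts contiguous groups and
    return an inclusive (low, high) pair for each group."""
    uniq = sorted(set(int(v) for v in values))  # unique coordinate values
    if not uniq:
        return []
    n_parts = max(1, min(n_parts, len(uniq)))
    size = -(-len(uniq) // n_parts)  # ceiling division
    chunks = (uniq[i:i + size] for i in range(0, len(uniq), size))
    return [(chunk[0], chunk[-1]) for chunk in chunks]


def read_partition(uri, chrom, start, gene_start):
    """Full read (coordinates and all attributes) for one partition."""
    import tiledb

    with tiledb.open(uri) as A:
        return A.multi_index[chrom, start, gene_start]


# Hypothetical usage:
# coords = read_coords("path", "chr13", slice(0, 1000000), slice(0, 1000000))
# for lo, hi in partition_bounds(coords["start"], n_parts=8):
#     data = read_partition("path", "chr13", slice(lo, hi), slice(0, 1000000))
```

Each read_partition call opens the array itself, so it could be wrapped in dask.delayed and run on a separate worker. Partitioning on the distinct "start" values is only one possible choice; it also yields the unique coordinate values per dimension as a by-product.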

ihnorton, Jun 11 '21 02:06