
No compression of timestamps (t.tdb) after consolidation

Open peterculviner opened this issue 5 months ago • 1 comments

I noticed that timestamps are not compressed by default after consolidation, at least for sparse arrays written through the Python API. This wouldn't be an issue on its own, but I also can't find a way to set a compression filter on these values (unlike user-defined attributes, dimensions, coordinates, and offsets). Am I missing a global default compression argument? I can't find any reference to one.

The uncompressed timestamps add noticeably to on-disk size for large sparse arrays.

For example:

import tiledb
import numpy as np
from itertools import product

array_path = 'test_array'

dim1 = tiledb.Dim(
    name="d1",
    domain=(0, 100),
    dtype=np.uint64,)
dim2 = tiledb.Dim(
    name="d2",
    domain=(0, 100),
    dtype=np.uint64,)

domain = tiledb.Domain(
    dim1, dim2)

attributes = [  # define attributes
    tiledb.Attr(
        name='attr1', dtype=np.dtype('uint64'), fill=0)]
schema = tiledb.ArraySchema(  # generate a schema
    domain=domain, attrs=attributes, sparse=True, allows_duplicates=True,
    coords_filters= [tiledb.filter.ZstdFilter(9)])
tiledb.Array.create(array_path, schema)
d1, d2 = np.asarray(list(product(range(0,100), range(0,100)))).T

array = tiledb.open(array_path, 'w')
# write 1
array[d1, d2] = {'attr1': np.full(10000, 1)}
# write 2
array[d1, d2] = {'attr1': np.full(10000, 2)}
array.close()

tiledb.consolidate(array_path)

Compare the file sizes of a0.tdb and t.tdb in the consolidated fragment: they match, suggesting that both are stored as uncompressed uint64.
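For anyone reproducing this, here is a minimal sketch of one way to do that size comparison from Python instead of by hand. The helper name `fragment_file_sizes` is hypothetical (not part of TileDB-Py), and the sketch assumes only that the consolidated fragment is a directory somewhere under the array path containing `t.tdb` next to `a0.tdb`; it searches recursively rather than relying on a specific directory layout.

```python
from pathlib import Path


def fragment_file_sizes(array_path, names=("a0.tdb", "t.tdb")):
    """Return {fragment_dir: {file_name: size_in_bytes}} for every
    directory under array_path that contains a t.tdb file.

    Hypothetical helper for illustration; it only inspects the
    filesystem and does not call into TileDB.
    """
    sizes = {}
    # Only fragments that carry a timestamps file are of interest here.
    for ts_file in Path(array_path).rglob("t.tdb"):
        frag_dir = ts_file.parent
        sizes[str(frag_dir)] = {
            name: (frag_dir / name).stat().st_size
            for name in names
            if (frag_dir / name).is_file()
        }
    return sizes
```

Calling `fragment_file_sizes('test_array')` after the consolidation step should return one entry per fragment that has a timestamps file. If the two reported sizes are equal, that is consistent with both files holding unfiltered 8-byte values per cell.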

peterculviner avatar Aug 07 '25 19:08 peterculviner

Alternatively, if it's possible to avoid creating a t.tdb file when it isn't necessary (e.g. if I set all timestamps to 1 while writing, so they all match and the file would hold redundant information), that would also be a great workaround. I don't know if that's more feasible than applying compression to timestamp data.

peterculviner avatar Aug 08 '25 14:08 peterculviner