TileDB
Array is unreadable with string dimension containing only empty strings
When a string dimension contains only empty strings, auto size = s1.size() + s2.size(); evaluates to zero. This results in an exception when fetching the array fragments or the non-empty domain, which unfortunately makes the array unreadable.
Traceback (most recent call last):
  File "empty_strings.py", line 27, in <module>
    tiledb.array_fragments(array_name)
  File ".../lib/python3.6/site-packages/tiledb/highlevel.py", line 123, in array_fragments
    return tiledb.FragmentInfoList(uri, ctx)
  File ".../lib/python3.6/site-packages/tiledb/fragment.py", line 28, in __init__
    self.nonempty_domain = fi.get_non_empty_domain(schema)
tiledb.libtiledb.TileDBError: [TileDB::FragmentInfo] Error: Cannot get non-empty domain var size; Dimension is fixed sized
https://github.com/TileDB-Inc/TileDB/blob/5658e01d24a70cca9bd60872980ec2703693e945/tiledb/sm/misc/types.h#L116-L126
https://github.com/TileDB-Inc/TileDB/blob/5658e01d24a70cca9bd60872980ec2703693e945/tiledb/sm/misc/types.h#L227-L230
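The failure mode described above can be modeled with a small sketch. This is not the TileDB source: StrRange and looks_var_sized are hypothetical names, and the sketch assumes that whether a range is treated as var-sized is inferred from the number of bytes stored, so a range built from two empty strings is indistinguishable from an unset or fixed-size range.

```python
class StrRange:
    """Hypothetical model of a string range; not TileDB's actual class."""

    def __init__(self):
        self.data = b""
        self.start_size = 0

    def set_str_range(self, s1: bytes, s2: bytes) -> None:
        # Mirrors the quoted expression: size = s1.size() + s2.size()
        size = len(s1) + len(s2)
        if size == 0:
            # Assumed behavior: a zero-length range is stored as "empty"
            self.data = b""
            self.start_size = 0
            return
        self.data = s1 + s2
        self.start_size = len(s1)

    def looks_var_sized(self) -> bool:
        # Assumed heuristic: var-sized is inferred from the stored bytes
        return len(self.data) != 0


r = StrRange()
r.set_str_range(b"", b"")
empty_is_var_sized = r.looks_var_sized()      # False under this assumption

r.set_str_range(b"a", b"b")
nonempty_is_var_sized = r.looks_var_sized()   # True
```

Under that assumption, a caller asking for the var-sized non-empty domain would reject the range as fixed-size, which is consistent with the error message in the traceback.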
Example
import shutil

import numpy as np
import tiledb

array_name = 'empty_strings'

# Sparse schema with an int32 dimension and a var-sized string dimension
s = tiledb.ArraySchema(
    domain=tiledb.Domain(
        tiledb.Dim('a', dtype=np.int32, domain=(-10, 10)),
        tiledb.Dim('b', dtype=np.bytes_, domain=(None, None)),
    ),
    attrs=[tiledb.Attr('x', dtype=np.int32)],
    sparse=True,
)

shutil.rmtree(array_name, ignore_errors=True)
tiledb.SparseArray.create(array_name, schema=s)

# Write three cells whose string coordinates are all empty
with tiledb.SparseArray(array_name, mode='w') as A:
    A[[1, 2, 3], ['', '', '']] = [1, 2, 3]

# Raises the TileDBError shown in the traceback above
tiledb.array_fragments(array_name)

with tiledb.SparseArray(array_name) as A:
    # Also fails, but with a less obvious error message
    A.nonempty_domain()
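Until a fix is available, one possible guard is to check a batch of string coordinates before writing it. The sketch below is only a suggestion, not part of the TileDB API: has_nonempty_string is a hypothetical helper, and it assumes (as the title suggests) that the problem only arises when every string coordinate in a write is empty.

```python
def has_nonempty_string(coords):
    """Return True if at least one coordinate is a non-empty str/bytes.

    Hypothetical helper; assumes a write is only problematic when all
    of its string coordinates are empty.
    """
    return any(len(c) > 0 for c in coords)


b_coords = ['', '', '']
if not has_nonempty_string(b_coords):
    # Skip or defer the write instead of creating an unreadable fragment
    safe_to_write = False
else:
    safe_to_write = True
```

A caller could run this check on the coordinates for dimension 'b' and skip or postpone the write when it returns False.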
Thanks Nick, this is a known issue with empty-string coordinates, and we are working on a fix.