Array is unreadable with string dimension containing only empty strings

Open • gatesn opened this issue 4 years ago • 1 comment

When a string dimension contains only empty strings, auto size = s1.size() + s2.size(); evaluates to zero. This raises an exception when fetching the array_fragments or the non-empty domain, which unfortunately makes the array unreadable.

Traceback (most recent call last):
  File "empty_strings.py", line 27, in <module>
    tiledb.array_fragments(array_name)
  File ".../lib/python3.6/site-packages/tiledb/highlevel.py", line 123, in array_fragments
    return tiledb.FragmentInfoList(uri, ctx)
  File ".../lib/python3.6/site-packages/tiledb/fragment.py", line 28, in __init__
    self.nonempty_domain = fi.get_non_empty_domain(schema)
tiledb.libtiledb.TileDBError: [TileDB::FragmentInfo] Error: Cannot get non-empty domain var size; Dimension is fixed sized

https://github.com/TileDB-Inc/TileDB/blob/5658e01d24a70cca9bd60872980ec2703693e945/tiledb/sm/misc/types.h#L116-L126

https://github.com/TileDB-Inc/TileDB/blob/5658e01d24a70cca9bd60872980ec2703693e945/tiledb/sm/misc/types.h#L227-L230
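
The zero-size case can be sketched in plain Python. This is an illustration only, not the library's code: it assumes s1 and s2 are the lower and upper bounds of the string dimension's range, and range_size is a hypothetical name introduced here.

```python
def range_size(s1: bytes, s2: bytes) -> int:
    """Combined byte length of a string range's two bounds (hypothetical helper)."""
    return len(s1) + len(s2)


# Coordinates written to dimension 'b' in the example: all empty strings.
coords = [b"", b"", b""]

# Assumption: the range bounds are the smallest and largest coordinates.
lower, upper = min(coords), max(coords)

# Both bounds are empty, so the combined size is zero.
size = range_size(lower, upper)
print(size)  # 0

# For comparison, a range with one non-empty bound has a positive size.
print(range_size(b"", b"abc"))  # 3
```

With both bounds empty the range carries no bytes at all, which is the condition described above.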

Example
import shutil

import numpy as np
import tiledb
from tiledb import ArraySchema, Attr, Dim, Domain, SparseArray

array_name = 'empty_strings'

s = ArraySchema(
    domain=Domain(
        Dim('a', dtype=np.int32, domain=(-10, 10)),
        Dim('b', dtype=np.bytes_, domain=(None, None)),
    ),
    attrs=[Attr('x', dtype=np.int32)],
    sparse=True,
)

shutil.rmtree(array_name, ignore_errors=True)
SparseArray.create(array_name, schema=s)

with SparseArray(array_name, mode='w') as A:
    A[[1, 2, 3], ['', '', '']] = [1, 2, 3]

tiledb.array_fragments(array_name)

with SparseArray(array_name) as A:
    # Also fails, but with less obvious error message
    A.nonempty_domain()

gatesn, May 21 '21 15:05

Thanks Nick, this is a known issue with empty-string coordinates that we are fixing.

stavrospapadopoulos, May 21 '21 20:05