TileDB-Py
FAILED `test_incomplete_dense_varlen`: empty strings at beginning of buffer give incorrect result
See run for more details: https://github.com/TileDB-Inc/TileDB-Py/actions/runs/3171748807
______________ IncompleteTest.test_incomplete_dense_varlen[False] ______________
self = <tiledb.tests.test_libtiledb.IncompleteTest object at 0x7f56bcd42b80>
non_overlapping_ranges = False
    @pytest.mark.parametrize("non_overlapping_ranges", [True, False])
    def test_incomplete_dense_varlen(self, non_overlapping_ranges):
        ncells = 10
        path = self.path("incomplete_dense_varlen")
        str_data = [rand_utf8(random.randint(0, n)) for n in range(ncells)]
        data = np.array(str_data, dtype=np.unicode_)
        # basic write
        dom = tiledb.Domain(tiledb.Dim(domain=(1, len(data)), tile=len(data)))
        att = tiledb.Attr(dtype=np.unicode_, var=True)
        schema = tiledb.ArraySchema(dom, (att,))
        tiledb.DenseArray.create(path, schema)
        with tiledb.DenseArray(path, mode="w") as T:
            T[:] = data
        with tiledb.DenseArray(path, mode="r") as T:
>           assert_array_equal(data, T[:])
E AssertionError:
E Arrays are not equal
E
E Mismatched elements: 1 / 10 (10%)
E x: array(['', '⚀', '', '떭', '', '', '', '', '', 'ꋽ'], dtype='<U1')
E y: array(['', '⚀', '', '떭', '', '', '', '', '', 'ꋽ\x00'], dtype=object)
tiledb/tests/test_libtiledb.py:4226: AssertionError
=========================== short test summary info ============================
FAILED tiledb/tests/test_libtiledb.py::IncompleteTest::test_incomplete_dense_varlen[False]
The only difference in values is the trailing null character (`\x00`) on the last element of the array read back. The dtypes also differ (`<U1` vs. `object`).
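To check that trailing nulls really are the only difference in values, the two arrays can be compared again after stripping them. The following is a minimal sketch in plain Python, not part of the TileDB-Py test suite. The helper names `trailing_nulls` and `diff_report` are hypothetical, and the input values are copied from the assertion output above.

```python
def trailing_nulls(s: str) -> int:
    """Count the NUL characters at the end of s."""
    return len(s) - len(s.rstrip("\x00"))


def diff_report(expected, actual):
    """Compare two sequences of strings.

    Returns a list of (index, trailing_null_count) for each mismatched
    element, and a flag telling whether the sequences become equal once
    trailing NULs are stripped from `actual`.
    """
    mismatches = [
        (i, trailing_nulls(a))
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]
    equal_after_strip = list(expected) == [a.rstrip("\x00") for a in actual]
    return mismatches, equal_after_strip


# Values copied from the failing assertion (x is the written data, y the read-back).
expected = ["", "⚀", "", "떭", "", "", "", "", "", "ꋽ"]
actual = ["", "⚀", "", "떭", "", "", "", "", "", "ꋽ\x00"]

mismatches, equal_after_strip = diff_report(expected, actual)
print(mismatches)         # [(9, 1)]
print(equal_after_strip)  # True
```

The same helper can be applied to `T[:].tolist()` from a failing run to see which elements carry extra nulls and how many.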
There seems to be an issue when the buffer begins with an empty string. I will continue to investigate.
Also note that more empty strings at the beginning of the data produce more trailing null characters, as in this run:
        with tiledb.DenseArray(path, mode="r") as T:
>           assert_array_equal(data, T[:])
E AssertionError:
E Arrays are not equal
E
E Mismatched elements: 1 / 10 (10%)
E x: array(['', '', '', '떭', '', '', '', '', '', 'ꋽ'], dtype='<U1')
E y: array(['', '', '', '떭', '', '', '', '', '', 'ꋽ\x00\x00\x00\x00'],
E dtype=object)
Reproducible with both dense and sparse arrays.
Issue is unrelated to incomplete queries.