FAILED `test_incomplete_dense_varlen`: empty chars at beginning of buffer giving incorrect result

Open github-actions[bot] opened this issue 3 years ago • 3 comments

See run for more details: https://github.com/TileDB-Inc/TileDB-Py/actions/runs/3171748807

github-actions[bot] • Oct 03 '22 06:10

______________ IncompleteTest.test_incomplete_dense_varlen[False] ______________

self = <tiledb.tests.test_libtiledb.IncompleteTest object at 0x7f56bcd42b80>
non_overlapping_ranges = False

    @pytest.mark.parametrize("non_overlapping_ranges", [True, False])
    def test_incomplete_dense_varlen(self, non_overlapping_ranges):
        ncells = 10
        path = self.path("incomplete_dense_varlen")
        str_data = [rand_utf8(random.randint(0, n)) for n in range(ncells)]
        data = np.array(str_data, dtype=np.unicode_)
    
        # basic write
        dom = tiledb.Domain(tiledb.Dim(domain=(1, len(data)), tile=len(data)))
        att = tiledb.Attr(dtype=np.unicode_, var=True)
    
        schema = tiledb.ArraySchema(dom, (att,))
    
        tiledb.DenseArray.create(path, schema)
        with tiledb.DenseArray(path, mode="w") as T:
            T[:] = data
    
        with tiledb.DenseArray(path, mode="r") as T:
>           assert_array_equal(data, T[:])
E           AssertionError: 
E           Arrays are not equal
E           
E           Mismatched elements: 1 / 10 (10%)
E            x: array(['', '⚀', '', '떭', '', '', '', '', '', 'ꋽ'], dtype='<U1')
E            y: array(['', '⚀', '', '떭', '', '', '', '', '', 'ꋽ\x00'], dtype=object)

tiledb/tests/test_libtiledb.py:4226: AssertionError
=========================== short test summary info ============================
FAILED tiledb/tests/test_libtiledb.py::IncompleteTest::test_incomplete_dense_varlen[False]

nguyenv • Oct 03 '22 12:10

The difference is the trailing null character (`\x00`) on the last element, plus the mismatched dtypes (`<U1` vs. `object`) of course.
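To make that difference concrete, here is a minimal pure-Python sketch using the values as printed in the assertion output above. The helper name `trailing_nulls` is made up for illustration and is not part of TileDB-Py or the test suite.

```python
def trailing_nulls(value: str) -> int:
    """Count the '\x00' characters at the end of a string."""
    return len(value) - len(value.rstrip("\x00"))


# Values copied from the assertion output (x = written data, y = read back).
expected = ["", "⚀", "", "떭", "", "", "", "", "", "ꋽ"]
actual = ["", "⚀", "", "떭", "", "", "", "", "", "ꋽ\x00"]

# Indices where the two lists differ.
mismatches = [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]

# Stripping trailing nulls from the read-back values removes the difference.
stripped = [a.rstrip("\x00") for a in actual]

print(mismatches, trailing_nulls(actual[-1]), stripped == expected)
```

Only the last element differs, and only by the trailing null. Stripping is shown purely as a way to see the difference, not as a proposed fix.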

nguyenv • Oct 03 '22 12:10

There seems to be an issue when the buffer begins with an empty string. Will continue to investigate.

Also worth noting: the number of empty strings at the beginning of the data results in an equal number of trailing null chars on the last element.

        with tiledb.DenseArray(path, mode="r") as T:
>           assert_array_equal(data, T[:])
E           AssertionError:
E           Arrays are not equal
E
E           Mismatched elements: 1 / 10 (10%)
E            x: array(['', '', '', '떭', '', '', '', '', '', 'ꋽ'], dtype='<U1')
E            y: array(['', '', '', '떭', '', '', '', '', '', 'ꋽ\x00\x00\x00\x00'],
E                 dtype=object)
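A small check along these lines could be used to test that observation against other inputs. This is only a sketch: the helper names and the sample values are hypothetical (the sample data is made up, not taken from the run above), and it assumes the extra nulls always land on the last element.

```python
from itertools import takewhile


def leading_empty_count(values):
    """Number of consecutive empty strings at the start of the sequence."""
    return sum(1 for _ in takewhile(lambda v: v == "", values))


def trailing_null_count(value):
    """Number of '\x00' characters at the end of a single string."""
    return len(value) - len(value.rstrip("\x00"))


def matches_observation(written, read_back):
    """True if leading empties in `written` equal trailing nulls on the
    last element of `read_back` (assumes both are non-empty sequences)."""
    return leading_empty_count(written) == trailing_null_count(read_back[-1])


# Made-up illustrative data: two leading empties, two trailing nulls.
written = ["", "", "ab", "c"]
read_back = ["", "", "ab", "c\x00\x00"]

print(leading_empty_count(written), trailing_null_count(read_back[-1]))
print(matches_observation(written, read_back))
```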

Reproducible with both dense and sparse arrays.

Issue is unrelated to incomplete queries.

nguyenv • Oct 03 '22 15:10