`tiledb.main.array_to_buffer` Correct Data Buffer With Empty Strings

Open • nguyenv opened this issue 3 years ago • 1 comment

Previously, the following NumPy array would yield the data and offset buffers below.

>>> tiledb.main.array_to_buffer(np.array(["", "", "", "a", "", "", "", "", "", "b"], dtype=np.unicode_), True, False)
(array([97, 98,  0,  0,  0,  0,  0,  0,  0,  0], dtype=uint8), array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=uint64))

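The data buffer needs to be resized so that it does not contain the trailing NULLs. Otherwise, when the data is read back, the last element appears padded with trailing NULLs: each element's length is taken from the gap between its offset and the next element's offset, and the last element's length from the gap between its offset and the end of the data buffer, so every unused byte at the end of the buffer is attributed to that last element.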
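To make the failure mode concrete, here is a minimal pure-Python sketch of offset-based variable-length string encoding. The helpers `encode` and `decode` are hypothetical illustrations, not TileDB-Py functions and not the actual `array_to_buffer` implementation; the `pad_to` parameter is an assumption used only to mimic a data buffer that was allocated larger than the bytes actually written.

```python
# Hypothetical helpers illustrating offset-based variable-length encoding.
# These are NOT part of TileDB-Py; they only model the data/offsets layout.

def encode(strings, pad_to=None):
    """Pack strings into a UTF-8 data buffer plus per-element start offsets.

    If pad_to is given (hypothetical parameter), the data buffer is
    zero-padded to that many bytes, mimicking an oversized buffer
    that still contains trailing NULLs.
    """
    data = bytearray()
    offsets = []
    for s in strings:
        offsets.append(len(data))  # start position of this element
        data += s.encode("utf-8")
    if pad_to is not None and pad_to > len(data):
        data += b"\x00" * (pad_to - len(data))
    return bytes(data), offsets


def decode(data, offsets):
    """Recover strings: element i spans offsets[i] up to offsets[i + 1];
    the last element runs to the end of the data buffer."""
    bounds = offsets[1:] + [len(data)]
    return [data[start:end].decode("utf-8") for start, end in zip(offsets, bounds)]


values = ["", "", "", "a", "", "", "", "", "", "b"]

# Trimmed buffer: only the real character bytes, so it round-trips cleanly.
data, offsets = encode(values)
print(list(data), offsets)

# Buffer padded to 10 bytes: the last element absorbs the 8 trailing NULLs.
padded, padded_offsets = encode(values, pad_to=len(values))
print(repr(decode(padded, padded_offsets)[-1]))
```

The offsets are identical in both cases; only the length of the data buffer differs, which is why the fix is to trim the data buffer rather than to change how offsets are computed.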

With this PR, the same call now yields a data buffer containing only the character bytes:

>>> tiledb.main.array_to_buffer(np.array(["", "", "", "a", "", "", "", "", "", "b"], dtype=np.unicode_), True, False)
(array([97, 98], dtype=uint8), array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=uint64))

nguyenv, Oct 04 '22 18:10