TileDB-Py
`tiledb.main.array_to_buffer` Correct Data Buffer With Empty Strings
Previously, the following NumPy array would yield the data and offset buffers below.
>>> tiledb.main.array_to_buffer(np.array(["", "", "", "a", "", "", "", "", "", "b"], dtype=np.unicode_), True, False)
(array([97, 98, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8), array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=uint64))
The data buffer needs to be resized so that it does not contain the trailing NULLs. Otherwise, when we read the data back, the last element comes back padded with trailing NULLs.
This PR now correctly yields the following data buffer.
>>> tiledb.main.array_to_buffer(np.array(["", "", "", "a", "", "", "", "", "", "b"], dtype=np.unicode_), True, False)
(array([97, 98], dtype=uint8), array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=uint64))
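To illustrate why the trailing NULLs matter, here is a minimal pure-NumPy sketch of the data/offset layout. It does not call `tiledb.main.array_to_buffer`; the helpers `pack_var_length_strings` and `unpack_var_length_strings` are hypothetical names invented for this example, and the sketch assumes UTF-8 encoding and that the last element extends to the end of the data buffer when read back.

```python
import numpy as np


def pack_var_length_strings(values):
    """Hypothetical helper: pack strings into a uint8 data buffer and start offsets."""
    encoded = [v.encode("utf-8") for v in values]
    offsets = np.zeros(len(encoded), dtype=np.uint64)
    pos = 0
    for i, chunk in enumerate(encoded):
        offsets[i] = pos  # each element starts where the previous one ended
        pos += len(chunk)
    data = np.array(list(b"".join(encoded)), dtype=np.uint8)
    return data, offsets


def unpack_var_length_strings(data, offsets):
    """Hypothetical helper: slice the data buffer using the start offsets.

    Assumption: the last element runs to the end of the data buffer,
    so any extra bytes there become part of that element.
    """
    starts = [int(o) for o in offsets]
    ends = starts[1:] + [len(data)]
    raw = data.tobytes()
    return [raw[s:e] for s, e in zip(starts, ends)]


values = ["", "", "", "a", "", "", "", "", "", "b"]
data, offsets = pack_var_length_strings(values)

# Simulate the old behavior: a data buffer sized to the element count (10)
# rather than the byte count (2), leaving eight trailing NULLs.
padded = np.concatenate([data, np.zeros(8, dtype=np.uint8)])

good = unpack_var_length_strings(data, offsets)
bad = unpack_var_length_strings(padded, offsets)

print(good[-1])  # last element from the correctly sized buffer
print(bad[-1])   # last element picks up the trailing NULLs
```

With the correctly sized buffer the last element is just `b"b"`; with the padded buffer the same offsets make the last element absorb the eight NULL bytes.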
This pull request has been linked to Shortcut Story #22123: tiledb.main.array_to_buffer With Empty Strings Yielding Different Data and Offset Buffers In Old vs. New Version.