TileDB-Py
DATETIME_HR datatype not understood
Hit when passing use_arrow=True to query a sparse array with a datetime64[h] dimension.
tiledb/multirange_indexing.py in _run_query(self, query, preload_metadata)
215 )
216 with timing("py.buffer_conversion_time"):
--> 217 table = self.pyquery._buffers_to_pa_table()
218 return table if query.return_arrow else table.to_pandas()
219
TileDBError: TileDB-Arrow: tiledb datatype not understood ('DATETIME_HR', cell_val_num: 1)
Looks like the constant is different to what's being checked?
https://github.com/TileDB-Inc/TileDB-Py/blob/a793c9b2d360fcc31a5fcde6b8a72bc04640695d/tiledb/core.cc#L183-L186
Versions: TileDB 2.4.0, TileDB-Py 0.10.1, PyArrow 5.0.0
Hi @gatesn, Arrow doesn't support hour resolution. Its timestamp and duration types only have second, millisecond, microsecond and nanosecond units:
import pyarrow as pa, numpy as np
a = np.array([1,2,3], dtype="m8[h]")
pa.array(a)
...
ArrowNotImplementedError: Unsupported timedelta64 time unit
I believe the Arrow<>pandas integration currently stores hour-resolution values as nanoseconds and converts them back on read based on stored metadata. We can support this with a similar approach; we'll aim to slot this into development this month.