
DATETIME_HR datatype not understood

Open gatesn opened this issue 4 years ago • 1 comment

I hit this when passing use_arrow=True to a query on a sparse array with a datetime64[h] dimension.

tiledb/multirange_indexing.py in _run_query(self, query, preload_metadata)
    215                 )
    216             with timing("py.buffer_conversion_time"):
--> 217                 table = self.pyquery._buffers_to_pa_table()
    218                 return table if query.return_arrow else table.to_pandas()
    219 

TileDBError: TileDB-Arrow: tiledb datatype not understood ('DATETIME_HR', cell_val_num: 1)

Looks like the constant is different to what's being checked?

https://github.com/TileDB-Inc/TileDB-Py/blob/a793c9b2d360fcc31a5fcde6b8a72bc04640695d/tiledb/core.cc#L183-L186

Versions: TileDB 2.4.0, TileDB-Py 0.10.1, PyArrow 5.0.0

gatesn, Oct 01 '21 15:10

Hi @gatesn, Arrow doesn't support hour resolution:

import numpy as np
import pyarrow as pa
# "m8[h]" is NumPy shorthand for timedelta64[h]
a = np.array([1, 2, 3], dtype="m8[h]")
pa.array(a)
...
ArrowNotImplementedError: Unsupported timedelta64 time unit

I believe the Arrow/pandas integration currently stores hours as nanoseconds and converts them back on read based on metadata. We can support this with a similar approach; we'll aim to slot it in for development this month.

ihnorton, Oct 01 '21 16:10