
Support Pandas ExtensionTypes / PyArrow StructTypes

Open · Hoeze opened this issue 4 years ago · 2 comments

Hi, would it be possible to support Pandas ExtensionTypes like IntervalArrays?

Right now, it is not possible to use datatable to read any PyArrow table that contains StructTypes. I would also like to be able to implement my own extension types for advanced features like range joins.

Pandas ExtensionTypes make it possible to implement all of this for DataFrames. A similar concept for datatable would be nice :)

Hoeze, May 02 '21 22:05

Pandas IntervalArray is just a class containing 2 columns: .left and .right. So it should be easy to convert into a datatable Frame with 2 columns. Admittedly, a better solution would be to implement the "struct" type in datatable, in which case IntervalArray could be converted into a struct column.
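A minimal sketch of the two-column conversion, using only pandas. The helper name `interval_array_to_columns` and its `prefix` parameter are hypothetical, not part of pandas or datatable. It returns a dict mapping column names to lists, which is one of the input shapes datatable's `Frame` constructor accepts. The interval's closedness (`"left"`, `"right"`, `"both"` or `"neither"`) cannot be stored in two plain columns, so the sketch returns it separately.

```python
import pandas as pd


def interval_array_to_columns(arr, prefix="interval"):
    """Split a pandas IntervalArray into two plain columns (hypothetical helper).

    Returns (columns, closed): `columns` maps "<prefix>_left" and
    "<prefix>_right" to Python lists of endpoints, and `closed` is the
    IntervalArray's closedness, which plain columns cannot represent.
    """
    columns = {
        f"{prefix}_left": arr.left.tolist(),    # .left is an Index of left endpoints
        f"{prefix}_right": arr.right.tolist(),  # .right is an Index of right endpoints
    }
    return columns, arr.closed


intervals = pd.arrays.IntervalArray.from_tuples([(0, 5), (3, 8), (10, 12)])
cols, closed = interval_array_to_columns(intervals)
print(cols)    # {'interval_left': [0, 3, 10], 'interval_right': [5, 8, 12]}
print(closed)  # right
```

With datatable installed, `datatable.Frame(cols)` would then give the two-column Frame. Dropping `closed` is exactly the information loss that a native struct type would avoid.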

General pandas ExtensionTypes are supposed to implement the __arrow_array__() method, so we can first convert such a column into an Arrow array and then convert that Arrow array into a datatable column.

st-pasha, May 03 '21 17:05

Struct-type support could also enable direct interop with PySpark :)

Hoeze, May 03 '21 17:05