datatable
Support Pandas ExtensionTypes / PyArrow StructTypes
Hi, would it be possible to support Pandas ExtensionTypes like IntervalArrays?
Right now, datatable cannot read any PyArrow table that contains StructTypes. I would also like to be able to implement my own extension types for advanced features like range joins.
Pandas ExtensionTypes make all of this possible for DataFrames. A similar concept for datatable would be nice :)
Pandas IntervalArray is just a class containing 2 columns: .left and .right. So it should be easy to convert into a datatable Frame with 2 columns. Admittedly, a better solution would be to implement the "struct" type in datatable, in which case IntervalArray could be converted into a struct column.
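As a rough sketch of the two-column workaround, something like the following could work today. The helper name `interval_to_columns` and the `_left` / `_right` column suffixes are made up for illustration and are not part of datatable or pandas. The sketch only assumes that `dt.Frame` accepts a dict of Python lists, and the datatable import is guarded so the snippet also runs without it.

```python
import pandas as pd


def interval_to_columns(intervals, name):
    """Split an IntervalArray into two plain columns (hypothetical helper).

    Note: the `closed` attribute of the intervals is not preserved here.
    """
    return {
        f"{name}_left": intervals.left.tolist(),
        f"{name}_right": intervals.right.tolist(),
    }


arr = pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 5)])
cols = interval_to_columns(arr, "span")

try:
    import datatable as dt

    # Assumption: Frame is built from a dict of plain Python lists.
    frame = dt.Frame(cols)
except ImportError:
    frame = None  # datatable not installed; `cols` still holds the data
```

This loses the information that the two columns belong together, which is exactly what a native struct type would keep.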
For general ExtensionTypes in pandas, they are supposed to implement the __arrow_array__() method, and thus we can first transform a column into an Arrow array, and then back into datatable.
Struct-type support could also enable direct interop with PySpark :)