h3ron

Build on top of arrow Extension datatype

Open nmandery opened this issue 2 years ago • 5 comments

After some discussion with @kylebarron on the georust discord we came to the conclusion that this crate could be implemented on top of the arrow Extension datatype. Support in arrow2 appears to be finished, support in polars is still to be implemented.

nmandery avatar Oct 19 '22 18:10 nmandery

Great seeing you guys @nmandery @kylebarron working on the Rust geo ecosystem

allixender avatar Jan 28 '23 14:01 allixender

An H3Array could use an implementation similar to what I do in geoarrow, which is to make a wrapper array like my PointArray.

Since h3 cells can be represented as raw uint64s, you could define an h3 array as

pub struct H3Array(PrimitiveArray<u64>);

Then the From implementation could convert from a PrimitiveArray or from an extension array.
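To make the wrapper idea concrete, here is a minimal sketch using only the standard library. `U64Column` is a hypothetical stand-in for arrow2's `PrimitiveArray<u64>` (a nullable column of `u64` values), and the `len` and `get` helpers are illustrative names, not part of any existing crate. A real implementation would wrap the arrow2 array directly.

```rust
/// Hypothetical stand-in for arrow2's `PrimitiveArray<u64>`:
/// a nullable column of raw u64 values.
#[derive(Debug, Clone, PartialEq)]
pub struct U64Column {
    pub values: Vec<Option<u64>>,
}

/// Newtype wrapper that marks the column as holding H3 indexes.
#[derive(Debug, Clone, PartialEq)]
pub struct H3Array(U64Column);

/// Wrap a plain u64 column without copying the values.
impl From<U64Column> for H3Array {
    fn from(column: U64Column) -> Self {
        H3Array(column)
    }
}

/// Unwrap back to the plain column, e.g. to hand it to other arrow code.
impl From<H3Array> for U64Column {
    fn from(array: H3Array) -> Self {
        array.0
    }
}

impl H3Array {
    /// Number of slots, including nulls.
    pub fn len(&self) -> usize {
        self.0.values.len()
    }

    /// Raw index at `position`; `None` for a null slot or an out-of-range position.
    pub fn get(&self, position: usize) -> Option<u64> {
        self.0.values.get(position).copied().flatten()
    }
}
```

Both conversions only move the underlying buffer, so wrapping and unwrapping do not copy the values.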

geoarrow is also relevant because your polyfill implementation could return a PolygonArray and stay in arrow memory. Maybe an arrow-efficient implementation of polyfill would first see how many pentagons exist in the polyfill output before actually running the polyfill (is that possible?) and then you'd only have to make one allocation in theory.

kylebarron avatar Jan 28 '23 18:01 kylebarron

I got to look at the more primitive arrow2 types like the ones you are using here. It looks quite straightforward.

In the end there will probably be different H3CellArray, H3DirectedEdgeArray, ... structs to represent the different types of H3 indexes with type safety. This should also help to avoid repeated validations of the contents.
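A rough sketch of how such typed arrays could validate once at construction, using only the standard library. The `try_new` constructors, the `InvalidIndex` error, and the `Vec<Option<u64>>` storage are assumptions made for illustration. The check covers only the 4-bit mode field of the H3 index (bits 59 to 62, where 1 is a cell and 2 is a directed edge), so it is deliberately much weaker than full index validation.

```rust
// Position and width of the mode field inside an H3 index.
const MODE_SHIFT: u32 = 59;
const MODE_MASK: u64 = 0b1111;
const MODE_CELL: u64 = 1;
const MODE_DIRECTED_EDGE: u64 = 2;

/// Hypothetical error type: the first value whose mode did not match.
#[derive(Debug, PartialEq)]
pub struct InvalidIndex {
    pub position: usize,
    pub raw: u64,
}

/// Check that every non-null value carries the expected index mode.
/// This is only a partial check, not a full H3 validity test.
fn check_mode(values: &[Option<u64>], expected: u64) -> Result<(), InvalidIndex> {
    for (position, value) in values.iter().enumerate() {
        if let Some(raw) = *value {
            if (raw >> MODE_SHIFT) & MODE_MASK != expected {
                return Err(InvalidIndex { position, raw });
            }
        }
    }
    Ok(())
}

/// Array whose non-null values were checked to be cell indexes.
#[derive(Debug)]
pub struct H3CellArray(Vec<Option<u64>>);

/// Array whose non-null values were checked to be directed edge indexes.
#[derive(Debug)]
pub struct H3DirectedEdgeArray(Vec<Option<u64>>);

impl H3CellArray {
    /// Validate once here; later code can rely on the type instead of re-checking.
    pub fn try_new(values: Vec<Option<u64>>) -> Result<Self, InvalidIndex> {
        check_mode(&values, MODE_CELL)?;
        Ok(Self(values))
    }
}

impl H3DirectedEdgeArray {
    pub fn try_new(values: Vec<Option<u64>>) -> Result<Self, InvalidIndex> {
        check_mode(&values, MODE_DIRECTED_EDGE)?;
        Ok(Self(values))
    }
}
```

Because the inner vector is private, the only way to obtain an `H3CellArray` in this sketch is through `try_new`, which is what lets downstream functions skip re-validation.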

Combining that with geoarrow for everything geometry-related is the way I want to go. The only thing currently missing for this is time ;)

nmandery avatar Jan 29 '23 10:01 nmandery

> This should also help to avoid repeated validations of the contents.

That and repeated downcasting were the main reasons I stored e.g. a PolygonArray as its constituent parts instead of directly as an arrow2::ListArray, because then you'd have to downcast on every row to access a single polygon.
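As an illustration of the "constituent parts" layout, here is a simplified standard-library sketch. The field names and the `rings` accessor are assumptions for this example and may differ from geoarrow's actual definitions. It also uses `usize` offsets where arrow uses `i32` or `i64`. The point is that the coordinate and offset buffers are already concretely typed, so reading one polygon is plain slicing with no per-row downcast.

```rust
/// Simplified polygon column stored as typed buffers plus offsets.
pub struct PolygonArray {
    /// All vertices of all rings, back to back.
    pub coords: Vec<[f64; 2]>,
    /// Ring `r` covers `coords[ring_offsets[r]..ring_offsets[r + 1]]`.
    pub ring_offsets: Vec<usize>,
    /// Polygon `p` owns rings `geom_offsets[p]..geom_offsets[p + 1]`.
    pub geom_offsets: Vec<usize>,
}

impl PolygonArray {
    /// Number of polygons in the column.
    pub fn len(&self) -> usize {
        self.geom_offsets.len().saturating_sub(1)
    }

    /// Borrow the rings of one polygon as coordinate slices.
    /// Panics if `index` is out of range or the offsets are inconsistent.
    pub fn rings(&self, index: usize) -> Vec<&[[f64; 2]]> {
        let first = self.geom_offsets[index];
        let last = self.geom_offsets[index + 1];
        (first..last)
            .map(|ring| &self.coords[self.ring_offsets[ring]..self.ring_offsets[ring + 1]])
            .collect()
    }
}
```

With a generic nested list array, each of these lookups would first have to recover the concrete child array type before it could slice.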

kylebarron avatar Jan 29 '23 16:01 kylebarron

In the meantime I started working on this arrow integration in https://github.com/nmandery/h3arrow. It lives in a new repository because it is now based on h3o instead of the H3 C library.

nmandery avatar Feb 13 '23 08:02 nmandery