WIP: array classes
Just putting up some scratch work for discussion re https://github.com/nmandery/h3ronpy/issues/55
Change list
- Export classes like
CellArray,DirectedEdgeArray, etc (names can be changed) that represent arrow arrays of h3 arrays.- This means that in theory we don't need to require pyarrow or arro3 or polars, but rather the user can use whatever python Arrow package they want.
- Implement the PyCapsule Interface directly on these arrays, so you can pass a
CellArrayintoarro3.core.Arrayorpyarrow.Arrayorpolars.Series.
@nmandery can you allow CI on this PR, or, preferably, just turn off CI approval in the settings?
It should now work without manual approval.
Ok great now the rust side compiles, and we just have to fix up the python side
The dependency updates were:
- CI was failing because it was trying to install the wrong pip wheel. I figured better to do dependency updates than change how CI is laid out.
- To restore abi3 wheels, I had to update to pyo3-arrow 0.5.1, which required numpy 0.22, which required pyo3 0.22. That then required ndarray 0.16 which required a bump of geo-interface and rasterh3
🙃
@nmandery I'll need some feedback in terms of how open you are to breaking changes. In particular, you have separate APIs for h3ronpy.polars, h3ronpy.pandas, and h3ronpy.arrow. This is really complex and hard to work through.
Now that Polars supports the PyCapsule Interface, there's simple zero-copy conversion between the two, so you can pass any Arrow array-like or chunked array-like output into the polars.Series constructor:
In my opinion this means that we should return Arrow output and let the user decide which library they want to use it with. We should document how easy it is to use in this way with polars/pyarrow/pandas etc, but only provide a single API for ease of maintenance.
Feels free to break whatever you like. The whole concept of subpackages for the different dataframe libraries was born to provide a better integration. I suppose the PyCapsule-interface makes a lot of this unnecessary, while achieving a far better integration. Additionally I feel these different subpackages are not too easy to maintain. The only thing I would like to keep is the polars extensions to the Expr and Series namespaces.
I also would prefer sticking with arrow output. +1
The only thing I would like to keep is the polars extensions to the
ExprandSeriesnamespaces.
👍 I definitely agree there
CI failed on commit 5132e01 (#64) with:
error[E0275]: overflow evaluating the requirement `<impl GeometryTrait<T = f64> as geo_traits::geometry::GeometryTrait>::GeometryCollection<'_>: geo_traits::geometry_collection::GeometryCollectionTrait`
|
= help: consider increasing the recursion limit by adding a `#![recursion_limit = "256"]` attribute to your crate (`geoarrow`)
note: required for `GeometryCollectionIterator<'_, f64, <... as GeometryCollectionTrait>::ItemType<'_>, ...>` to implement `Iterator`
--> /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geoarrow-0.4.0-beta.1/src/geo_traits/iterator.rs:39:15
|
37 | ItemType: 'a + $item_trait<T = T>,
| ----- unsatisfied trait bound introduced here
38 | G: $self_trait<T = T, ItemType<'a> = ItemType>,
39 | > Iterator for $struct_name<'a, T, ItemType, G>
| ^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
109 | / impl_iterator!(
110 | | GeometryCollectionIterator,
111 | | GeometryCollectionTrait,
112 | | GeometryTrait,
113 | | geometry_unchecked
114 | | );
| |_- in this macro invocation
= note: required for `GeometryCollectionIterator<'_, f64, <... as GeometryCollectionTrait>::ItemType<'_>, ...>` to implement `IntoIterator`
= note: the full name for the type has been written to '/home/runner/work/h3ronpy/h3ronpy/target/x86_64-unknown-linux-gnu/release/deps/geoarrow-b79[580](https://github.com/nmandery/h3ronpy/actions/runs/11711779349/job/32621138289?pr=64#step:5:586)3b66daebd2.long-type-15103887714521775568.txt'
= note: consider using `--verbose` to print the full type name to the console
= note: the full name for the type has been written to '/home/runner/work/h3ronpy/h3ronpy/target/x86_64-unknown-linux-gnu/release/deps/geoarrow-b795803b66daebd2.long-type-15103887714521775568.txt'
= note: consider using `--verbose` to print the full type name to the console
= note: this error originates in the macro `impl_iterator` (in Nightly builds, run with -Z macro-backtrace for more info)
For more information about this error, try `rustc --explain E0275`.
which is a super annoying compiler bug (https://github.com/geoarrow/geoarrow-rs/issues/716).
Updating to the latest git of geoarrow-rs fixed compilation on my computer, even though I don't know exactly what changed to fix that bug. 🤷
@nmandery is it ok with you if the raster -specific code still has a pyarrow dependency for now? It would be nice to come back to that in a follow up so that we can unblock this and minimize the size of the PR
How do you want to handle these polars and pandas tests? Should we keep them and update them to use the current arrow-based APIs?
Lets keep the pyarrow-dependency in the raster-part for now. It will be easier to port this later on once the main work of this PR is done.
The pandas and polars-specific tests can be removed. I only distributed the tests over the different dataframe libraries to have some test-coverage for these APIs as well. I would like to keep the polars extension tests, though.
At least locally the tests are passing now. Hopefully CI will be green as well
Great. I will look through your changes
Thanks for this - I will try next to get around to fix the docs etc so we can release this. I suppose you are waiting to integrate this into lonboard.
Actually @zacharydez is working on a project that I think is using h3ronpy in AWS Lambda, and this would benefit them because pyarrow is too big to put in Lambda (along with other geospatial dependencies).
I would like to integrate this into Lonboard in some way as well, but that's not urgent.