
Discussions towards better stability of core pieces of geoarrow-rs

Open kylebarron opened this issue 8 months ago • 2 comments

From https://github.com/geoarrow/geoarrow-rs/issues/1015#issuecomment-2776800398

I think there are a few problems with geoarrow-rs:

  • There's a wide variety of code at different stages of production-readiness.
  • Partly due to me trying to do too much, struggling a bit with how best to model GeoArrow in general, and also learning Rust through this whole process, there's a whole lot of code that is decidedly not production-ready.
  • Because it's all one Rust crate, geoarrow, there are no clear lines between what is (closer to) production-ready and what is not.

I think a way to break through this impasse is to select relatively small, well-defined subsets of GeoArrow functionality and break them into subcrates. For one, this forces more thought about public APIs, because across crates you can't access any pub(crate) items. It lets us more clearly document which subsets we expect to be more stable and tested. And external users like yourself can start to build on only those pieces without even bringing in the dependencies for the full geoarrow crate.

On a spectrum from more stable to less stable:

  • Core types conforming to the spec, like what is now in geoarrow-schema
  • "primitive" Array layouts like Point/LineString etc
  • "complex" Array layouts like Geometry and GeometryCollection
  • Array builders
  • Conversions between GeoArrow memory and geo, WKB, and WKT
  • Reading/writing Parquet
  • Reading/writing FlatGeobuf
  • Chunked arrays (should maybe remove)
  • Table concept (should probably remove)
  • Conversions between GeoArrow memory and geos
  • Geometry operations using geo
  • Casting
  • Geometry operations using geos
  • Reading/writing other geo formats
  • Reading/writing to PostGIS

Is there a well-defined subset of this project that you think you would use if it were more stable? Is there a piece that you're interested in that we could work on together to make stable?

Originally posted by @kylebarron in https://github.com/geoarrow/geoarrow-rs/issues/1015#issuecomment-2776800398

cc @paleolimbot

kylebarron • Apr 03 '25 20:04

First, the whole geoarrow crate (and the ecosystem adoption it has largely driven) is awesome, and any of my gripes should be taken with a grain of salt. The least useful thing I'll say is that all of these should eventually be enabled!

I think the absolutely essential bits are geoarrow-schema (mostly done!), iterating over arrays via geo-traits (pretty sure this is somewhere), and building from buffers plus validation (substantially easier than an arbitrary builder, I think).
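To make the "build by buffer + validate" idea concrete, here is a rough sketch in plain Rust. Everything in it is hypothetical and for illustration only: `LineStringArray`, `ValidationError`, and `try_new` are made-up names, not the geoarrow-rs API. It also assumes interleaved XY coordinates and offsets that start at zero, which is a simplification.

```rust
/// Hypothetical stand-in for a GeoArrow-style LineString array (not the
/// geoarrow-rs type): interleaved XY coordinates plus geometry offsets.
#[derive(Debug)]
pub struct LineStringArray {
    coords: Vec<f64>,  // [x0, y0, x1, y1, ...]
    offsets: Vec<i32>, // geometry i spans coordinates offsets[i]..offsets[i + 1]
}

/// Hypothetical error type listing the checks performed by `try_new`.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    OddCoordLength(usize),
    EmptyOffsets,
    FirstOffsetNotZero(i32),
    OffsetsNotMonotonic { index: usize },
    LastOffsetMismatch { last: i32, num_coords: usize },
}

impl LineStringArray {
    /// Take ownership of already-filled buffers and validate them once,
    /// instead of pushing geometries one at a time through a builder.
    pub fn try_new(coords: Vec<f64>, offsets: Vec<i32>) -> Result<Self, ValidationError> {
        if coords.len() % 2 != 0 {
            return Err(ValidationError::OddCoordLength(coords.len()));
        }
        let num_coords = coords.len() / 2;

        // Assumption for this sketch: offsets always start at zero.
        let first = *offsets.first().ok_or(ValidationError::EmptyOffsets)?;
        if first != 0 {
            return Err(ValidationError::FirstOffsetNotZero(first));
        }

        // Offsets must never decrease.
        for (i, w) in offsets.windows(2).enumerate() {
            if w[1] < w[0] {
                return Err(ValidationError::OffsetsNotMonotonic { index: i + 1 });
            }
        }

        // The final offset must account for every coordinate.
        let last = offsets[offsets.len() - 1];
        if last as usize != num_coords {
            return Err(ValidationError::LastOffsetMismatch { last, num_coords });
        }

        Ok(Self { coords, offsets })
    }

    /// Number of geometries in the array.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Iterate over each linestring as a slice of interleaved XY values.
    pub fn iter(&self) -> impl Iterator<Item = &[f64]> + '_ {
        self.offsets
            .windows(2)
            .map(move |w| &self.coords[2 * w[0] as usize..2 * w[1] as usize])
    }
}
```

The point of this shape is that validation happens once at construction, so everything downstream (including iteration) can trust the buffers. A real version would presumably expose iteration through geo-traits rather than raw slices.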

I'm definitely happy to contribute some of these pieces although I'm not exactly sure of the timeline. I'm always happy to review, though!

paleolimbot • Apr 03 '25 20:04

One thing that may be worth considering is building the pieces up (e.g., geoarrow-schema, geoarrow-array, etc.) without refactoring geoarrow as you go. That would allow breaking changes if they're needed to scale back the scope and perhaps be a bit more fun (but maybe it's not bad to refactor as we go!)

paleolimbot • Apr 03 '25 20:04

As described in https://github.com/geoarrow/geoarrow-rs/pull/1097, the old geoarrow crate is being refactored into a monorepo of smaller crates.

I think this issue can be closed now.

kylebarron • May 14 '25 19:05