geo icon indicating copy to clipboard operation
geo copied to clipboard

Merge geo-types and geo crate

Open urschrei opened this issue 1 year ago • 6 comments

Splitting the types and algorithms into separate crates made sense at one point, but is now an ongoing source of annoyance, bad DX, technical debt, and frustrating papercuts for crate users and geo's developers because implementing methods from geo on geo-types structs in third-party crates such as rstar is not possible due to circular dependencies. This has led to the proliferation of things like secret modules and the complete impossibility of implementing full spatial indexing on e.g. Polygons: https://github.com/georust/geo/pull/984.

I've thought about it, and I've concluded that we have no evidence that there are users of geo-types who don't also use geo, and that the proliferating need for spatial indexing in geo (for prepared geometries, unary unions to name but a few) is important to the crate and ecosystem's ongoing growth.

I propose putting a plan and schedule in place to merge the crates as a priority.

urschrei avatar May 19 '24 08:05 urschrei

I've thought about this too. It might be a good idea, but it would be a pretty big change, so I wanted to flesh out the pros and cons a bit more.

The axiom is that it's useful to provide a way for geospatial crates like https://github.com/tmontaigu/shapefile-rs/blob/master/Cargo.toml and https://github.com/georust/geojson to interop with eachother and the algorithms in geo.

So I'm assuming we still see value in providing interop, but it would now live in geo instead of geo_types.

Cons against merging geo/geo-types

I see two primary benefits to maintaining the current separation of geo-types and geo (cons against merging).

Con 1. currently minimal deps are inherited by third party crates

Keeping geo-types minimal encourages crate developers to integrate with it, without greatly increasing their own dependency chain (and thus their user's build times and binary sizes).

Mitigation proposal: Put all the algorithms behind a new --features="algorithms" and have --no-default-features build only the types. Encourage third party crate integrators to use --no-default-features when possible.

Con 2. currently semver is relatively stable

Keeping geo-types relatively sem-ver stable keeps crate authors from having to spend too much time on the upgrade treadmill. If we merged the two, geo-types would break as often as geo.

For context, the last breaking release of geo-types was Jan 2021. We've had 11 breaking release of geo in that timeframe. That metric is a bit exaggerated. I think because we have geo-types separated, we are a bit fearless about breaking geo semver. We could be a bit more conservative if we saw value in it, but I think the point stands. (Personally, this is my biggest concern with merging the two.)

I've concluded that we have no evidence that there are users of geo-types who don't also use geo,

There are definitely plenty of libraries that use geo-types without using geo (geozero, geojson, wkt, shapefile-rs), but I think what you mean is, at the end of the day, users of those libraries are all (or mostly all) using geo somewhere else in their codebase, so there's no net reduction in dependencies for the end user. Is that right?

I'd wager that's true for most users. I'm not sure how to measure the significance of that remainder though.

Pros of merging geo/geo-types

  1. third party integration's that rely on algorithms get easier. This means we can build better features like rstar integration that just works for folks.
  2. I guess release mechanics would be simpler, but honestly this isn't a big burden for me as a maintainer. And I don't think it affects end users much.

Alternatives Considered

new type wrappers in geo

Keep the separate crates, and add new type wrappers for geo_type::geometries into geo. Then we could easily implement the traits (e.g. rstar) on the new types in geo with access to the algorithms. The problem though is we now have geo::Point and geo_types::Point which are not actually the same thing and will likely lead to hard to understand compilation errors.

"secret" modules

We can continue to move shared algorithm code needed for third party trait integration from geo into geo_types::private_utils, but it starts to sabotage the purported benefits of the separation.

michaelkirk avatar May 20 '24 19:05 michaelkirk

Wild idea that would maintain the current low number of breaking releases for crates that only need the types for interop and don't care about the algorithms:

What if we:

  1. moved the geo_types geometry definitions into geo as the issue proposes.
  2. Put algorithms behind a new algorithms feature (enabled by default), such that cargo build geo --no-default-features builds only the types (no algorithms).
  3. Made the geo_types crate a simple re-export of those types. It's kind of a funny inversion - whereas currently geo gets its geometries from geo_types, now geo_types would get its geometries from geo.

Then I think we could continue to freely break the semver for geo, and even update geo-types's geo dependency to the latest geo release, but we'd only need to actually break geo-types semver if we actually change the geometry format, since that's the only thing it re-exports.

I haven't thought about it too hard... but it seems like it'd address my concern about keeping a relatively stable interop format while allowing third party integrations to take advantage of algorithms.

michaelkirk avatar May 20 '24 20:05 michaelkirk

Mitigation proposal: Put all the algorithms behind a new --features="algorithms" and have --no-default-features build only the types. Encourage third party crate integrators to use --no-default-features when possible.

This is a great idea!

For context, the last breaking release of geo-types was Jan 2021. We've had 11 breaking release of geo in that timeframe. That metric is a bit exaggerated. I think because we have geo-types separated, we are a bit fearless about breaking geo semver. We could be a bit more conservative if we saw value in it, but I think the point stands. (Personally, this is my biggest concern with merging the two.)

These are semver-breaking, but in practice only require a small amount of work (if any) to upgrade (I realise that's more than "none", but I think it's important to note)

There are definitely plenty of libraries that use geo-types without using geo (geozero, geojson, wkt, shapefile-rs), but I think what you mean is, at the end of the day, users of those libraries are all (or mostly all) using geo somewhere else in their codebase, so there's no net reduction in dependencies for the end user. Is that right?

Yep, I meant that there are very few libraries that only use -types.

Keep the separate crates, and add new type wrappers for geo_type::geometries into geo. Then we could easily implement the traits (e.g. rstar) on the new types in geo with access to the algorithms. The problem though is we now have geo::Point and geo_types::Point which are not actually the same thing and will likely lead to hard to understand comilation errors.

I think this would be a big source of annoyance for crate consumers – in my / our experience this kind of incompatibility trips people up often, and the compiler error messages don't help you to fix it. Of course it's easy for us to point people in the right direction, but that means they have to tell us, someone has to look at the code etc.

We can continue to move shared algorithm code needed for third party trait integration from geo into geo_types::private_utils, but it starts to sabotage the purported benefits of the separation.

One of the reasons I opened this issue is that I don't think we can, particularly in the case of the various euclidean distance measures. There's way too much scaffolding code that makes it work; I think we've hit the limit of what can be practically duplicated, and the duplication we have now is already too extensive in my opinion – you have to know the geo codebase well to even know that it exists, which is a barrier to people who want to help with any features that are implemented on geo-types geometries but require geo code.

urschrei avatar May 20 '24 20:05 urschrei

Made the geo_types crate a simple re-export of those types. It's kind of a funny inversion - whereas currently geo gets its geometries from geo_types, now geo_types would get its geometries from geo.

This would be amazing if we could make it work!

urschrei avatar May 20 '24 20:05 urschrei

These are semver-breaking, but in practice only require a small amount of work (if any) to upgrade (I realise that's more than "none", but I think it's important to note)

Agreed, the changes required are typically trival. I think for actively maintained libraries, it's a rounding error.

My bigger concern is for the long tail of less actively developed libraries that we'll be fracturing off from the ecosystem every time we do a geo-types release. Multiple times a year is a lot!

michaelkirk avatar May 20 '24 20:05 michaelkirk

Made the geo_types crate a simple re-export of those types. It's kind of a funny inversion - whereas currently geo gets its geometries from geo_types, now geo_types would get its geometries from geo.

This would be amazing if we could make it work!

I made a start at this here: https://github.com/georust/geo/compare/mkirk/merge-geo-types-3?expand=1

I've done the fundamental reorganization as described, and I can get geo, geo-types, and a forked wkt (which depends on this new geo-types) to build.

I've hit some difficulty, so I'm going to write up my understanding of where things lie.

The error happens when building tests:

error[E0308]: mismatched types
   --> geo/src/algorithm/concave_hull.rs:376:47
    |
376 |         let norway_concave_hull: LineString = geo_test_fixtures::norway_concave_hull::<f64>();
    |                                  ----------   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `geometry::geometry::line_string::LineString`, found `geo::geometry::geometry::line_string::LineString`
    |                                  |
    |                                  expected due to this
    |
note: the crate `geo` is compiled multiple times, possibly with different configurations
   --> geo/src/geometry/geometry/line_string.rs:134:1
    |
134 | pub struct LineString<T: CoordNum = f64>(pub Vec<Coord<T>>);
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this is the expected type `geometry::geometry::line_string::LineString`
    |
   ::: /Users/mkirk/src/georust/geo/geo/src/geometry/geometry/line_string.rs:134:1
    |
134 | pub struct LineString<T: CoordNum = f64>(pub Vec<Coord<T>>);
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this is the found type `geo::geometry::geometry::line_string::LineString`
    |

The "expected LineString, but found LineString" error is similar to what you'd see when you have two simultaneous versions of a crate included by conflicting versioning requirements. Though the types have the same name and structure, because they come from different compilation targets, they are distinct.

In our case, with cargo tree I can verify that all builds of geo are from the same local version:

$ cargo tree | grep "geo " 
geo v0.30.0 (/Users/mkirk/src/georust/geo/geo)
│   │   └── geo v0.30.0 (/Users/mkirk/src/georust/geo/geo) (*)
│   ├── geo v0.30.0 (/Users/mkirk/src/georust/geo/geo) (*)
├── geo v0.30.0 (/Users/mkirk/src/georust/geo/geo) (*)

So, same versions — our case is a different problem.

My understanding of the error is we have something like a circular dependency. Typically a "circular dependency error", would be explicit, and look more like this:

error: cyclic package dependency: package geo v0.30.0 (/Users/mkirk/src/georust/geo/geo) depends on itself.

Unlike one of those cyclic package dependency errors, because the thing that depends on geo in our case case is a dev dependency (used by unit tests), geo is actually being built in two different configurations — which is allowed.

The first build of geo is as the library under test which includes the library code and the unit tests. The second build of geo is as a normal (non-test) library, since it's required by dependencies (via geo-types).

In my branch, an example of the second build case is:

geo.devDependencies includes wkt wkt depends on geo-types geo-types depends on geo

Similarly geo_test_fixtures, and jts_test_runner are devDependencies that depend on geo and produce similar errors.

This might be something we can figure out, and personally, I'm pretty flexible when it comes to doing "weird stuff" internally if needed to get our build working. However, I'm less flexible when it comes to requiring downstream users to do weird acrobatics.

I'm not sure yet what's possible, I wanted to copy down the current state of things both as a note to myself and in case anyone else has ideas.

michaelkirk avatar Jun 30 '25 20:06 michaelkirk