geoarrow-rs
Explode and overline
Starting with the high-level ask, we have a function that does the following:
import geopandas as gpd
from shapely.geometry import LineString
sl = gpd.GeoDataFrame(
    {
        "id": ["a", "b"],
        "value": [1, 2]
    },
    geometry = [
        LineString([(0, 0), (1, 1), (2, 2)]),
        LineString([(0, 1), (1, 1), (2, 2)])
    ]
)
# plot with values:
sl.plot(column = "value")
# The output should be along the lines of:
sl_overline = gpd.GeoDataFrame(
    {
        "value": [1, 2, 3]
    },
    geometry = [
        LineString([(0, 0), (1, 1)]),
        LineString([(0, 1), (1, 1)]),
        LineString([(1, 1), (2, 2)])
    ]
)
sl_overline.plot(column = "value")
That may seem simple but there are several steps:
- [x] exploding the geometries, as implemented in the native QGIS algorithm and requested in a GeoPandas feature request: https://github.com/geopandas/geopandas/issues/2476
- [ ] re-ordering linestrings to prevent duplicate lines with different coordinate order
- [ ] aggregating the values (I suggest that this is done as a separate step, possibly optionally, to give users control over aggregating functions and variables, ideally with the power of polars)
- [ ] merging the exploded linestrings into longer linestrings that are as long as possible without the aggregated values changing
- [ ] returning joined-up geometries
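To make the first three steps concrete, here is a minimal pure-Python sketch using plain coordinate tuples rather than any geometry library. The function names (`explode`, `normalise`, `overline_segments`) are made up for illustration and the aggregation is hard-coded to a sum, which is an assumption; the real thing would presumably let the user choose the aggregation.

```python
from collections import defaultdict

Coord = tuple[float, float]
Segment = tuple[Coord, Coord]


def explode(line: list[Coord]) -> list[Segment]:
    """Split one linestring into its consecutive two-point segments."""
    return list(zip(line[:-1], line[1:]))


def normalise(segment: Segment) -> Segment:
    """Order the two endpoints so that A->B and B->A compare equal."""
    a, b = segment
    return (a, b) if a <= b else (b, a)


def overline_segments(
    lines: list[list[Coord]], values: list[float]
) -> dict[Segment, float]:
    """Sum `values` over every distinct segment, ignoring direction."""
    totals: dict[Segment, float] = defaultdict(float)
    for line, value in zip(lines, values):
        for segment in explode(line):
            totals[normalise(segment)] += value
    return dict(totals)


# Same coordinates as the GeoDataFrame example above
lines = [
    [(0, 0), (1, 1), (2, 2)],
    [(0, 1), (1, 1), (2, 2)],
]
totals = overline_segments(lines, [1, 2])
```

With the example data, the shared segment from (1, 1) to (2, 2) ends up with the sum of both input values, while the other two segments keep their own value.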
References
- There is a long-standing and well-used R implementation, a breakdown of which can be found here (source of the Python reproducible example above): https://github.com/Robinlovelace/overline-tests
- There is a paused attempt at a pure Rust implementation: https://github.com/acteng/overline/tree/master
- Here's an example of the outputs, detailed route networks with attribute values (in this case representing cycling potential, zoom in to see): https://dev.cruse.bike/ or https://npt.scot
Update on thinking: I don't think geoarrow-rs needs to do the whole thing: there are many options in the summarise -> aggregation step that are worth exposing to the user. Just getting the exploded+ordered linestrings alongside their attributes would be enough I think.
The step that merges the linestrings back together is another part that would benefit massively from being done here.
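For reference, a naive version of that merge step could look roughly like the sketch below. It assumes the input is a dict of direction-normalised segments with their aggregated values (a hypothetical intermediate format, not something geoarrow-rs produces), and it greedily chains segments that share an endpoint and a value. It does not treat junctions specially, so where three or more same-valued segments meet the choice of which two to join is arbitrary, and it is quadratic in the number of segments.

```python
from collections import defaultdict


def join(a, b):
    """Return a and b joined at a shared endpoint, or None if they do not touch."""
    if a[-1] == b[0]:
        return a + b[1:]
    if a[-1] == b[-1]:
        return a + b[-2::-1]  # append b reversed, skipping the shared point
    if a[0] == b[-1]:
        return b + a[1:]
    if a[0] == b[0]:
        return b[::-1] + a[1:]
    return None


def merge_segments(totals):
    """Greedily chain touching segments that carry the same value."""
    by_value = defaultdict(list)
    for segment, value in totals.items():
        by_value[value].append(list(segment))

    merged = []
    for value, lines in by_value.items():
        while lines:
            current = lines.pop()
            grew = True
            while grew:
                grew = False
                for i, other in enumerate(lines):
                    joined = join(current, other)
                    if joined is not None:
                        current = joined
                        del lines[i]
                        grew = True
                        break
            merged.append((current, value))
    return merged


# Aggregated segments for the example at the top of this issue
totals = {
    ((0, 0), (1, 1)): 3,
    ((1, 1), (2, 2)): 3,
    ((0, 1), (1, 1)): 2,
}
merged = merge_segments(totals)
```

Here the two segments with value 3 are joined into one three-point linestring, and the segment with value 2 is left on its own.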
> I don't think geoarrow-rs needs to do the whole thing
Yeah, this was going to be a point of discussion. The goal for now is to implement operations that are as general as possible. So explode makes sense because it's very general. In terms of "explode segments" I'm not sure what the most general API is. We could have an explode_segments method on LineStringArray and ChunkedLineStringArray that returns a LineStringArray of length-2 lines as well as indices to pass into a take operation. So something roughly like
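# Sketch only: explode_segments() is a proposed method, not an existing API
table = GeoTable(...)
geometry = table.geometry
assert isinstance(geometry, ChunkedLineStringArray)
# Length-2 linestrings, plus the source row index of each segment
exploded_geometry, indices = geometry.explode_segments()
# Repeat the attribute rows with a take, then reattach the exploded geometry
exploded_table = table.remove_geometry()[indices].add_column(exploded_geometry)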
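To illustrate the "segments plus take indices" idea without depending on any geoarrow API, here is a plain-Python stand-in. `explode_with_indices` and `take` are hypothetical helpers written for this example, and the table is just a dict of column lists; none of this is how geoarrow-rs would actually store the data.

```python
def explode_with_indices(lines):
    """Return two-point segments and, for each one, the index of its source row."""
    segments = []
    indices = []
    for row, line in enumerate(lines):
        for start, end in zip(line[:-1], line[1:]):
            segments.append((start, end))
            indices.append(row)
    return segments, indices


def take(column, indices):
    """Gather values from `column` at the given row indices."""
    return [column[i] for i in indices]


# Attribute columns and geometries from the example at the top of this issue
columns = {"id": ["a", "b"], "value": [1, 2]}
lines = [
    [(0, 0), (1, 1), (2, 2)],
    [(0, 1), (1, 1), (2, 2)],
]

segments, indices = explode_with_indices(lines)
exploded = {name: take(column, indices) for name, column in columns.items()}
exploded["geometry"] = segments
```

The point of returning indices rather than the expanded attributes is that the geometry operation stays general: the caller decides which columns to repeat and how to aggregate them afterwards.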
The other note is that the goal with geoarrow is to make sharing geometries across libraries zero-cost, because the geometry format is ABI-stable. So one possibility is to implement some core operations in the geoarrow.rust.core Python package but, for operations with a narrower use case, have another Python package like geoarrow.road_networks. Each package can then share data totally transparently and at zero cost.
General :+1: to the sentiment here without commenting on details..
Heads-up @wangzhao0217 who may be interested in taking a look and may have questions. Would love to take this forward and provide input to move forward on some of these tasks.
Awesome! Let me know if I can help provide pointers or anything