[Enh]: About spatial data support

Open curtis18 opened this issue 4 months ago • 3 comments

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

Is it possible to support a unified geometry dtype and API that works across GeoPandas, GeoPolars, DuckDB Spatial, and GeoSpark? Thank you.

Please describe the purpose of the new feature or describe the problem to solve.

This would make it possible to convert between geospatial objects, reconcile inconsistent coordinate reference systems, perform spatial joins, etc.

Suggest a solution if possible.

No response

If you have tried alternatives, please describe them below.

No response

Additional information that may help us understand your needs.

No response

curtis18, Aug 30 '25 12:08

hey @curtis18

i was actually thinking about how cool something like this would be yesterday 😄

i think it would need to be a narwhals plugin or something like that, which we'd take care of keeping in sync

I like the idea, but won't have bandwidth to explore this until the end of the year (at least)

if anyone wants to put a proposal forwards, happy to discuss

MarcoGorelli, Aug 30 '25 12:08

This is something I've also been thinking about in the back of my mind over the last year or so, and I think it would be really quite valuable. From my (biased) perspective as someone who is not a GIS domain expert but has maintained GeoPandas for several years now, there have historically been quite limited options for tabular GIS from Python; GeoPandas or writing raw SQL against Postgres / SpatiaLite are the two that come to mind. That's changed in the last little while; there's now:

  • duckdb-spatial
  • geopolars - has been awaiting Arrow extension metadata support in polars, which was recently unblocked
  • polars-st - similar in spirit to geopolars, but using EWKB to embed extension metadata to avoid being blocked
  • sedona-db

which all, to an extent, do (or will) provide compute engines capable of executing geospatial queries. However, existing libraries (e.g. pysal, or any of the closed-source code I maintain in my work) are written specifically against a single existing library, typically GeoPandas.

I just want to solicit some thoughts on what form a useful contribution might look like in this space. I've been doing a little proof-of-concept testing directly as a branch of narwhals, taking geopandas and duckdb-spatial as test cases and adding a geo namespace, and am fairly confident that could work.
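To make the idea of a geo namespace concrete, here is a toy sketch of the general pattern: one backend-agnostic expression, translated differently per backend. Everything here (`col`, `GeoNamespace`, `GeoExpr`, `to_duckdb_sql`, `evaluate_attribute_style`) is hypothetical and is neither narwhals' API nor the proof-of-concept branch mentioned above. The only external facts assumed are that duckdb-spatial exposes `ST_Area` / `ST_Centroid` and that a GeoPandas `GeoSeries` exposes `area` / `centroid` as attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoExpr:
    """A recorded spatial operation on a named geometry column."""

    column: str
    op: str  # e.g. "area" or "centroid"


class GeoNamespace:
    """Hypothetical `.geo` accessor that only records what was asked for."""

    def __init__(self, column: str) -> None:
        self._column = column

    def area(self) -> GeoExpr:
        return GeoExpr(self._column, "area")

    def centroid(self) -> GeoExpr:
        return GeoExpr(self._column, "centroid")


class Col:
    """Hypothetical column reference carrying the `.geo` namespace."""

    def __init__(self, name: str) -> None:
        self.name = name

    @property
    def geo(self) -> GeoNamespace:
        return GeoNamespace(self.name)


def col(name: str) -> Col:
    return Col(name)


# SQL-style backend: render the operation as a duckdb-spatial function call.
_SQL_FUNCTIONS = {"area": "ST_Area", "centroid": "ST_Centroid"}


def to_duckdb_sql(expr: GeoExpr) -> str:
    return f"{_SQL_FUNCTIONS[expr.op]}({expr.column})"


# Attribute-style backend: look the operation up on the column object,
# which is how a GeoPandas GeoSeries exposes `area` and `centroid`.
def evaluate_attribute_style(expr: GeoExpr, frame):
    return getattr(frame[expr.column], expr.op)


if __name__ == "__main__":
    print(to_duckdb_sql(col("geom").geo.area()))
```

The point of the indirection is that downstream code would only ever write `col("geom").geo.area()`; which engine runs it is decided at translation time.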

But I wasn't sure if this was worth fleshing out much past the point of feasibility; I suspect it might be worth thinking about the plugin relationship sooner rather than later. I wanted to check if there are any examples of plugins that work in this way, i.e. providing an "extended compliant" definition / Expr with additional methods / namespaces, rather than the case described in the docs around supporting ingesting other dataframe varieties into narwhals (though this could be needed too in certain geospatial cases). If not, there's perhaps a bit to figure out about how to do this nicely.

Although it seems like it could be simpler to implement geospatial support directly in narwhals, that seems a bit problematic for a few reasons (which maybe I'm wrong about!):

  • it seems to run contrary to trying to keep this library small and lightweight
  • potential lack of API stability. I know narwhals is making perfect backwards compatibility promises, but that doesn't seem wise in the geospatial setting when there is no single polars analogue to derive expressions from. (Maybe geopolars and polars-st end up with the same API, or one ends up vastly more popular, but until that happens there's ambiguity.)

m-richards, Nov 23 '25 10:11

thanks for your thoughts!

yeah i think this should be a separate package, but one that we could still host under the narwhals-dev org

MarcoGorelli, Nov 23 '25 11:11