feat(api): API for converting binary columns to geometry columns

Open cpcloud opened this issue 1 year ago • 5 comments

Is your feature request related to a problem?

https://ibis-project.zulipchat.com/#narrow/stream/405265-tech-support/topic/spatial.20column.20woes

This works using .sql, but that is only a workaround; we should have an API for this.

Describe the solution you'd like

A way to do either of the following:

  • t.geometry.cast("geometry")
  • t.geometry.to_geometry() (API TBD)

Or both. Perhaps the latter can be used when casting.
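For context on what either spelling would have to do: the workaround later in this thread goes through st_geomfromwkb, so the binary column is assumed here to hold WKB (well-known binary). A minimal stdlib-only sketch of what parsing one such value involves, limited to 2D points; decode_wkb_point is a hypothetical name for illustration and not part of ibis or DuckDB:

```python
import struct


def decode_wkb_point(wkb: bytes) -> tuple[float, float]:
    """Decode a 2D WKB Point into (x, y). Illustration only."""
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    order = "<" if wkb[0] == 1 else ">"
    # Bytes 1-4 hold a uint32 geometry type code; 1 means Point.
    (geom_type,) = struct.unpack_from(order + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"expected a WKB Point (type 1), got type {geom_type}")
    # A 2D point carries two float64 coordinates after the header.
    x, y = struct.unpack_from(order + "dd", wkb, 5)
    return x, y


# Hand-built little-endian WKB for POINT (1 2).
point_wkb = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
print(decode_wkb_point(point_wkb))
```

A real cast would hand this parsing to the backend (st_geomfromwkb in DuckDB's spatial extension) rather than do it in Python.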

What version of ibis are you running?

main

What backend(s) are you using, if any?

DuckDB

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

cpcloud avatar Mar 27 '24 21:03 cpcloud

Just a note that, for now, the only way to hack around this is raw_sql:

import ibis
from ibis import _
from shapely.geometry import box
bounds = box(-2493045.0, 176655.0, 2343105.0, 3310995.0)

parquet = "https://data.source.coop/cboettig/pad-us-3/pad-us3-combined.parquet"
con = ibis.duckdb.connect()
con.load_extension("spatial")

# little ick:
con.raw_sql(f"CREATE VIEW pad AS SELECT *, st_geomfromwkb(geometry) as geom from read_parquet('{parquet}')")
pad = con.table("pad")

But now this is fast:

%%time
# testing
(
    pad
    .filter(_.geom.within(bounds))
    .group_by([_.Mang_Type])
    .aggregate(n=_.count())
    .to_pandas()
)

Wall time: 14.8 s

Compare this to reading the identical data as FlatGeobuf in DuckDB:

%%time
fgb = "https://data.source.coop/cboettig/pad-us-3/pad-us3-combined.fgb"

(
    con.read_geo(fgb)
    .filter(_.geom.within(bounds))
    .group_by([_.Mang_Type])
    .aggregate(n=_.count())
    .to_pandas()
)

Wall time: 4min 10s
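Until a cast exists, the raw_sql workaround above could be wrapped in a small helper so the view definition is not hand-written each time. This is only a sketch: wkb_view_sql and its parameter names are hypothetical, and it assumes the view and column names are trusted identifiers (they are interpolated without quoting or validation).

```python
def wkb_view_sql(
    view: str,
    parquet_url: str,
    wkb_col: str = "geometry",
    geom_col: str = "geom",
) -> str:
    """Build the CREATE VIEW statement used in the raw_sql workaround."""
    # Double any single quotes so the URL stays a valid SQL string literal.
    safe_url = parquet_url.replace("'", "''")
    return (
        f"CREATE VIEW {view} AS "
        f"SELECT *, st_geomfromwkb({wkb_col}) AS {geom_col} "
        f"FROM read_parquet('{safe_url}')"
    )


sql = wkb_view_sql(
    "pad",
    "https://data.source.coop/cboettig/pad-us-3/pad-us3-combined.parquet",
)
print(sql)
```

The resulting string would be passed to con.raw_sql(sql), followed by con.table("pad") as before.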

cboettig avatar Mar 27 '24 22:03 cboettig

> t.geometry.to_geometry() (API TBD)

Small comment - I think we should reserve to_* methods for things like to_parquet/to_csv/... that create output artifacts and run eagerly.

jcrist avatar Mar 29 '24 17:03 jcrist

Ah good point. I wonder if as_geo() would be preferable? Or perhaps as_geometry()?

cpcloud avatar Mar 29 '24 18:03 cpcloud

I'm not sure it makes sense to elevate this to a top-level method on Binary columns. In our common API we have as_table/as_scalar, which handle shape but not type conversions. The geometry API itself does have a few as_* methods, so there is some precedent there. Personally, I find the .cast("geometry") syntax to be the one that feels the most "right". If you want a method, I have a slight preference for as_geometry, unless we make the dtype parser also accept geo so that cast("geo") works.
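To make the cast("geo") idea concrete, here is a rough sketch of what accepting a short alias could look like: normalize the name before it reaches the real parser. DTYPE_ALIASES and normalize_dtype_name are hypothetical names, and this is not how ibis's dtype parser is actually structured.

```python
# Hypothetical alias table; ibis's real dtype parser is not shown here.
DTYPE_ALIASES = {"geo": "geometry"}


def normalize_dtype_name(name: str) -> str:
    """Map a short alias to its canonical dtype name; pass others through."""
    key = name.strip().lower()
    return DTYPE_ALIASES.get(key, key)


print(normalize_dtype_name("geo"))
```

With something like this in place, cast("geo") and cast("geometry") would resolve to the same dtype, which is what would make as_geo a consistent method name.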

jcrist avatar Mar 29 '24 18:03 jcrist

+1 for doing this with only cast for now

cpcloud avatar Mar 29 '24 19:03 cpcloud

:bangbang: :rocket: :tada: :magic_wand:

cboettig avatar Apr 30 '24 17:04 cboettig