ibis
feat(api): API for converting binary columns to geometry columns
Is your feature request related to a problem?
https://ibis-project.zulipchat.com/#narrow/stream/405265-tech-support/topic/spatial.20column.20woes
This works using .sql, but that's only a workaround; we should have an API for this.
Describe the solution you'd like
A way to either:
- t.geometry.cast("geometry")
- t.geometry.to_geometry() (API TBD)
Or both. Perhaps the latter can be used when casting.
What version of ibis are you running?
main
What backend(s) are you using, if any?
DuckDB
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
 
Just a note that, for now, the only way to hack around this is raw_sql:
import ibis
from ibis import _
from shapely.geometry import box
bounds = box(-2493045.0, 176655.0, 2343105.0, 3310995.0)
parquet = "https://data.source.coop/cboettig/pad-us-3/pad-us3-combined.parquet"
con = ibis.duckdb.connect()
con.load_extension("spatial")
# little ick:
con.raw_sql(f"CREATE VIEW pad AS SELECT *, st_geomfromwkb(geometry) as geom from read_parquet('{parquet}')")
pad = con.table("pad")
But now this is fast:
%%time
# testing
(
    pad.filter(_.geom.within(bounds))
    .group_by([_.Mang_Type])
    .aggregate(n=_.count())
    .to_pandas()
)
Wall time: 14.8 s
Compare this to opening the identical data as FlatGeobuf in DuckDB:
%%time
fgb = "https://data.source.coop/cboettig/pad-us-3/pad-us3-combined.fgb"
(
    con.read_geo(fgb)
    .filter(_.geom.within(bounds))
    .group_by([_.Mang_Type])
    .aggregate(n=_.count())
    .to_pandas()
)
Wall time: 4min 10s
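Until there is a proper API, the view-creation step in the raw_sql workaround can be factored into a small helper so the SQL string is not hand-written each time. This is only a sketch: `geom_view_sql` and its parameter names are hypothetical and not part of ibis, and it assumes the DuckDB spatial extension is loaded so that `ST_GeomFromWKB` is available.

```python
def geom_view_sql(
    view: str,
    parquet_url: str,
    wkb_col: str = "geometry",
    geom_col: str = "geom",
) -> str:
    """Build a CREATE VIEW statement that exposes a WKB column as GEOMETRY.

    Hypothetical helper (not an ibis API). It only builds a string; running
    it is left to the caller, e.g. via con.raw_sql(...).
    """

    def quote_ident(name: str) -> str:
        # Double any embedded double quotes, then wrap as a SQL identifier.
        return '"' + name.replace('"', '""') + '"'

    def quote_literal(value: str) -> str:
        # Double any embedded single quotes, then wrap as a SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"CREATE OR REPLACE VIEW {quote_ident(view)} AS "
        f"SELECT *, ST_GeomFromWKB({quote_ident(wkb_col)}) AS {quote_ident(geom_col)} "
        f"FROM read_parquet({quote_literal(parquet_url)})"
    )
```

With the connection from the snippet above, usage would look like `con.raw_sql(geom_view_sql("pad", parquet))` followed by `pad = con.table("pad")`. It is still the same workaround, just with the quoting handled in one place.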
> t.geometry.to_geometry() (API TBD)
Small comment - I think we should reserve to_* methods for things like to_parquet/to_csv/... that create output artifacts and run eagerly.
Ah good point. I wonder if as_geo() would be preferable? Or perhaps as_geometry()?
I'm not sure if it makes sense to elevate this to a top-level method on Binary columns? In our common API we have as_table/as_scalar, which handle shape but not type conversions. The geometry API itself does have a few as_* methods, so there is some precedent there. Personally I find the .cast("geometry") syntax to be the one that feels the most "right". If you want a method, I have a slight preference for as_geometry unless we make the dtype parser also accept geo for cast("geo").
+1 for doing this with only cast for now
:bangbang: :rocket: :tada: :magic_wand: