Sedona parameterized queries
Parameterized queries were recently added to Spark and allow for some really clean syntax when shifting from the Python API to the SQL API. We should consider adding this interface to Sedona.
Status quo
Create a geometry object for Minnesota:
minnesota = 'POLYGON((-96.4517 43.5008,-91.2195 43.5017,-91.3101 43.8226,…))’
Create a DataFrame:
df = sedona.read.format("shapefile").option("charset", "UTF-8").load(user_uri + extract_dir)
Create a temporary view:
df.createOrReplaceTempView('gauges')
Run a query with a f-string:
query = sedona.sql(f'''
select geometry, Status, Waterbody, Observed
from gauges
where st_intersects(geometry, ST_GeomFromWKT('{minnesota}'))
''')
How to make this more concise with a parameterized query
This could be nicer with a parameterized query:
sedona.sql("""
select geometry, Status, Waterbody, Observed
from {df}
where st_intersects(geometry, ST_GeomFromWKT('{minnesota}'))
""", df=df, minnesota=minnesota)
The parameterized query saves you from creating the temporary view and using f-strings.
It would be great if the parameters were also sanitized to prevent SQL injection vulnerabilities.
Another parameterized query example
This can be even cooler. Assume Minnesota is a geometry object as follows:
minnesota_wkt = 'POLYGON((-96.4517 43.5008,-91.2195 43.5017,-91.3101 43.8226,…))’
minnesota = ST_GeomFromWKT(minnesota_wkt)
Then, we can pass the geometry object directly in the parameterized query:
sedona.sql("""
select geometry, Status, Waterbody, Observed
from {df}
where st_intersects(geometry, minnesota)
""", df=df, minnesota=minnesota)
nitpicking your example, it should be:
from shapely import from_wkt
minnesota = from_wkt(minnesota_wkt)
...
Some context:
- spark functions dont work outside of sql/df methods. Sedona's functions are the same
- the compat layer for geom literals in python is shapely. In java its JTS