sfdbi
sfdbi copied to clipboard
SFSQL and SFSQL_PG driver for lazy_tbl
SFSQL() wraps the basic sql query to read_sf(), passing in the DSN argument in the usual way (file, url, connection string) with a dummy layer value.
SFSQL_PG() adds another level to allow input of host,dbname,user,password which is passed down as the 'PG:...' connection string to SFSQL().
There is a lot I haven't considered, readonly is ignored, security/obfuscation on the connection string is ignored, and no detail on the difference between DB-drivers and non-DB:
i.e. DB drivers have the SQL passed in direct, so it is engine-specific, otherwise for non-DB afaik:
- must not contain subquerys - no 'SELECT * FROM (SELECT * FROM layer) table'
- can use special variables FID, OGR_GEOM_AREA (but only for filter, not select/mutate/transmute)
- automatically keeps the geometry column
_ogr_geometry_whether selected or not (see next comment)
Discussed here: https://github.com/mdsumner/RGDALSQL/issues/7
Other:
- I wanted data.frame, tbl, sf classes allowed so the Results slot uses "ANY" .
- there's an error-catch on statements that match "SELECT * (SELECT * )", hopefully to indicate that subqueries might be the problem for non-DB
Examples, first pseudocode for postgresql
library(sfdbi) ## mdsumner/sfdbi@ogrsql branch
## PSEUDOCODE
db <- dbConnect(SFSQL_PG(),
host = Sys.getenv("host"),
dbname = "databasename",
user = Sys.getenv("user"),
password = Sys.getenv("password")
)
## that is equivalent to
#db <- dbConnect(SFSQL(), "PG:host='..' dbname='..' user='..' password='..'")
library(dplyr)
tbl(db, "planes")
Examples with SHP and GPKG
gfile <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
sfile <- system.file("shape/nc.shp", package = "sf", mustWork = TRUE)
gpkg <- tbl(dbConnect(SFSQL(), gfile), "nc.gpkg")
gpkg %>%
filter(NAME %LIKE% 'A%') %>%
mutate(abc = 1.3) %>%
select(abc, NAME, geom) %>%
arrange(desc(NAME)) ## %>% show_query()
## note geom column *name*
shp <- tbl(dbConnect(SFSQL(), sfile), "nc")
shp %>%
filter(NAME %LIKE% 'A%') %>%
mutate(abc = 1.3) %>%
select(abc, NAME, `_ogr_geometry_`) %>%
arrange(desc(NAME)) #%>% show_query()
note that non-DB drivers will keep the geometry column whether it's in the query or not
GPKG drops 'geom' as it's not selected, but SHP keeps it, I don't have a strong opinion on this ...
gpkg %>%
filter(NAME %LIKE% 'A%') %>%
mutate(abc = 1.3) %>%
select(abc, NAME)

Thanks @mdsumner! This looks great. I'll give it a serious look in the next few days.
I went and made this PR into a package, so you can either use that or crib the DBI stuff from it when you're ready:
https://github.com/mdsumner/lazysf
If I get more into DBs I might come back and look at the more general case here, but rn I need this to get on with more general outputs from the lazy query.