bcdata icon indicating copy to clipboard operation
bcdata copied to clipboard

cql_translate fixes for updated dbplyr

Open ateucher opened this issue 1 year ago • 5 comments

Partially working but I still think it's more complex than need be and the built-in DBI/dbplyr function mapping is not working quite right...

#304

ateucher avatar Sep 08 '22 01:09 ateucher

I think we need to make a somewhat major change in how we parse user input into filter() to make this work smoothly. Right now we have some ugly code that tries to detect parts of a filter call that should be evaluated locally (e.g., a call to an sf function) and then modifying the call by wrapping it in local(). partial_eval() then processes that call and forces that evaluation of the local() bits before turning the call into a CQL string. But this is fragile and can't reliably detect all cases. The recent change to partial_eval has made it stricter and it now throws errors on unevaluated calls that should have been evaluated and vice-versa.

The existing approach in bcdata actually goes against the way dbplyr was designed to be used - basically we should be asking the user to wrap parts of calls that should be evaluated locally in local(), rather than trying to guess and do it for them.

I propose we get rid of the stuff in cql_translate() that tries to figure out what should be locally evaluated, and ask the user to do it specifically. This is likely to break some people's code, so we will need to think about how to communicate the change and help people adjust.

Essentially, instead of something like:

bcdc_query_geodata("a-record") |>
  filter(OBJ_AREA_SQM > st_area(a_local_sf_object)) # OBJ_AREA_SQM is a field in the wfs object

would need to be something like:

bcdc_query_geodata("a-record") |>
  filter(OBJ_AREA_SQM > local(st_area(a_local_sf_object))) # OBJ_AREA_SQM is a field in the wfs object

Thoughts @boshek @stephhazlitt ?

ateucher avatar Sep 08 '22 21:09 ateucher

@boshek @stephhazlitt I've made the change. Everything passes nicely locally with the dev version of dbplyr. Haven't started on the documentation yet. The updates to the test files illustrate the user-facing changes.

ateucher avatar Sep 09 '22 16:09 ateucher

This looks great @ateucher 🙏. I think it is a fair trade off re: requiring users to use local() in return for cleaner, more predictable+robust code. Pretty sure we have some use cases in the vignettes that will need updating with this change. We could think about some active advertising of the change, e.g. a tweet thread? Maybe even a new, short pkgdown article?

stephhazlitt avatar Sep 09 '22 22:09 stephhazlitt

Thanks @stephhazlitt. I think that the vignettes are actually ok... I did a manual check and precompiled them with the dev version of bcdata and they worked. I think both your communication ideas are good - a tweet thread, and either a new vignette or just add a section to the efficiently_query_geospatial_data vignette.

ateucher avatar Sep 10 '22 00:09 ateucher

Ok, I've updated one of the vignettes with a simple example... let me know what you think. Unrelated, but I finally figured out how to clean up those spurious dbQuoteIdentifier and dbQuoteString messages we've been getting (https://github.com/bcgov/bcdata/pull/305/commits/92146dbcc75028602a59ab356ba43bb67b262adf)

ateucher avatar Sep 13 '22 00:09 ateucher

/precompile

ateucher avatar Oct 25 '22 05:10 ateucher

Good catch - and it's awful:

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(bcdata)
#> 
#> Attaching package: 'bcdata'
#> The following object is masked from 'package:stats':
#> 
#>     filter

the_geom <- st_sfc(st_point(c(1164434, 368738)),
                   st_point(c(1203023, 412959)),
                   crs = 3005)

bcdc_query_geodata("local-and-regional-greenspaces") %>%
  filter(DWITHIN(st_buffer(the_geom, 10000, nQuadSegs = 2), 100, "meters"))
#> Authorizing with your stored API key
#> Warning: It is advised to use the permanent id ('6a2fea1b-0cc4-4fc2-8017-eaf755d516da') rather than the name of the record ('local-and-regional-greenspaces') to guard against future name changes.
#> Warning: Named arguments ignored for SQL st_buffer
#> Error in UseMethod("escape"): no applicable method for 'escape' applied to an object of class "c('sfc_POINT', 'sfc')"

Created on 2022-10-25 with reprex v2.0.2

It's really deep down in the stack and trying to preempt it might be really tricky. I think it might be best to just use a try and create a custom error if the error looks like "no applicable method for 'escape' applied to..."

ateucher avatar Oct 25 '22 16:10 ateucher

Yeah that's a great thought. I think that's a good idea and we can see how that works.

boshek avatar Oct 25 '22 16:10 boshek

Thanks @stephhazlitt and @boshek!

ateucher avatar Oct 25 '22 21:10 ateucher