bcdata icon indicating copy to clipboard operation
bcdata copied to clipboard

Filter using string partial matching

Open bevingtona opened this issue 5 years ago • 7 comments

Problem: I'd like to filter a query but I only know part of the string I am looking for...

bcdata::bcdc_query_geodata("bc-parks-ecological-reserves-and-protected-areas") %>%
  filter(stringr::str_detect(PROTECTED_LANDS_NAME, "SUGARBOWL.*PARK*"))

#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)): object 'PROTECTED_LANDS_NAME' not found

Potential solution: I could download/collect only the column that I want to filter by, detect the sting that I am looking for and then use that string to filter my query.

my_var <- bcdc_query_geodata("bc-parks-ecological-reserves-and-protected-areas") %>%
  collect() %>% 
  st_drop_geometry() %>% 
  select(PROTECTED_LANDS_NAME) %>% 
  filter(stringr::str_detect(PROTECTED_LANDS_NAME, "SUGARBOWL.*PARK")) %>% 
  pull()

bcdc_query_geodata("bc-parks-ecological-reserves-and-protected-areas") %>%
  filter(PROTECTED_LANDS_NAME == my_var)

Problem with this solution: I need to download the geometries of the data set to get the column that I am interested in, which is essentially just downloading the entire dataset, so there is no processing speed advantage.

My question: Is there a way to drop the geometry before collecting the data? I suspect that this would be very fast and efficient for filtering large datasets using string detection.

bevingtona avatar Oct 08 '20 21:10 bevingtona

@webgismd do you know if the WFS can return only the attributes and leave the geometry behind?

boshek avatar Oct 08 '20 22:10 boshek

I also wonder if you can partially match a string CQL?

ateucher avatar Oct 09 '20 14:10 ateucher

I also wonder if you can partially match a string CQL?

@ateucher like this? solr_query=title: sugar but no idean about how to integrate this :) https://docs.datastax.com/en/dse/6.0/cql/cql/cql_using/search_index/queryTerms.html

bevingtona avatar Oct 09 '20 15:10 bevingtona

So this works: https://openmaps.gov.bc.ca/geo/pub/wfs/?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetFeature&outputFormat=application%2Fjson&typeNames=WHSE_TANTALIS.TA_PARK_ECORES_PA_SVW&SRSNAME=EPSG%3A3005&CQL_FILTER=PROTECTED_LANDS_NAME%20like%20%27%25BEAUMONT%25%27

and dbplyr can translate %like% :

dbplyr::translate_sql(PROTECTED_LANDS_NAME %like% "%BEAUMONT%")
#> <SQL> `PROTECTED_LANDS_NAME` like '%BEAUMONT%'

I think there is a path here.

boshek avatar Oct 09 '20 16:10 boshek

This also works: https://openmaps.gov.bc.ca/geo/pub/wfs/?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetFeature&outputFormat=application/json&typeNames=WHSE_TANTALIS.TA_PARK_ECORES_PA_SVW&SRSNAME=EPSG:3005&CQL_FILTER=(strMatches(PROTECTED_LANDS_NAME, 'SUGARBOWL.*') = true)

Using strMatches from here: https://docs.geoserver.org/stable/en/user/filter/function_reference.html#string-functions.

But this doesn't work:

bcdata::bcdc_query_geodata("bc-parks-ecological-reserves-and-protected-areas") %>%
  filter(CQL("strMatches(PROTECTED_LANDS_NAME, 'SUGARBOWL.*') = 'true'"))

I think there's an issue sending as a POST vs GET request...

ateucher avatar Oct 09 '20 16:10 ateucher

Using strMatches we could make a grepl shim if we can figure out how to issue the query via POST. It might need to be added to the url rather than as a query parameter

ateucher avatar Oct 09 '20 16:10 ateucher

I changed the title of this issue since it's not actually possible to get just the attributes of a wfs layer...

ateucher avatar Apr 21 '21 23:04 ateucher