osmdata icon indicating copy to clipboard operation
osmdata copied to clipboard

[FEATURE] re-design of AND versus OR queries

Open mpadge opened this issue 2 years ago • 11 comments

@Mashin6 raised valid concerns in #252, supported by @stragu, that the current design in counter-intuitive:

OR

add_osm_features (features = c (...))

AND

add_osm_feature() |>
    add_osm_feature() |> ...

This needs to be improved, for which suggestions are requested!

mpadge avatar Nov 30 '21 08:11 mpadge

Similar to what @Mashin6 said in the mentioned issue, I believe it would make a lot more sense for newcomers to:

  1. build a dataset by "getting" the results of various key-pair combinations, i.e. an OR operation, using get_osm_features(key = "xxx", value = "xxx") |> get_osm_features(key = "xxx", value = "xxx")
  2. use several key-value pairs that all need to be matched for the returned features, i.e. an AND operation, using get_osm_features(match_all = c(...))

The name get_osm_features() is chosen so it doesn't get confused with the existing add_osm_feature() and add_osm_features() functions (which could be deprecated over several osmdata versions, as the new get_osm_features() would replace them). The argument match_all leaves no space for misinterpretation.

That would fit better the way most people would read the code, replacing the pipe with "then":

  1. "get these OSM features, and then also get these OSM features, etc."
  2. "Add the OSM features that match_all these key-value pairs."

Adapting Mashin6's hypothetical example, a sample command would look like this:

opq() |>
   get_osm_features(match_all = c("natural = tree", "leaf_type = broadleaved")) |>
   get_osm_features(match_all = c("leisure = park", "name ~ public")) |>
   get_osm_features(key = "amenity", value = "bench")

... which would return an opq object that would retrieve broad leaved trees, benches, and parks with "public" in their name.

In the meantime, I'm wondering if aliasing add_osm_feature() and add_osm_features() with filter_osm_feature() and filter_osm_features() respectively would help clarifying how they behave, given that even the documentation draws a parallel with dplyr::filter() ?

stragu avatar Dec 03 '21 05:12 stragu

I'm also thinking an ellipsis argument could be used instead of a match_all, if it is clear enough that all these values will be combined with "AND". You could then simplify the same command with:

opq() |>
   get_osm_features("natural = tree", "leaf_type = broadleaved") |>
   get_osm_features("leisure = park", "name ~ public") |>
   get_osm_features(key = "amenity", value = "bench")

Edit: hmm that could easily be misinterpreted as an OR.... e.g. people trying things like get_osm_features("building = house", "building = residential")

...or should there be two unambiguous arguments match_all and match_either, so users have the option not to pipe for OR operations?

stragu avatar Dec 03 '21 06:12 stragu

Thanks @stragu, this will all start coming together in a branch after I've addressed #252. It's going to be difficult to do this well, but it is clearly pretty important, so please keep an eye on this issue and help along the way whenever you can. Thanks!

mpadge avatar Dec 03 '21 08:12 mpadge

This name change looks like a good strategy for transition.

Mashin6 avatar Dec 04 '21 06:12 Mashin6

I just played a bit around with the package some years ago - so please don't consider my thoughts as important...

Building on @stragu's comments, I was just wondering, if you could pass named arguments to the ellipsis in the style of get_osm_features(amenity = "bench", leisure = "park")

I think it's completely OK that passing multiple corresponds to a logical AND, as I'm used to this by dplyr::filter().

A similar idea was to approach it as expressions that return a logical vector like dplyr::filter(), i.e. get_osm_features(amenity == "bench" & leisure == "park"). However, that probably doesn't make sense. So I'm not sure if it makes sense to imitate this syntax...

Anyways thank you for the package! Really hope to use it more one day.

urswilke avatar Dec 09 '21 20:12 urswilke

Is there any consensus about the API or somebody working on it?

The easier thing could be to modify the existing functions and make add_osm_feature add the new filters to the existing ones in the query (paste(q$features, new_features, collapse=" ")), and add_osm_features to add new items to q$features (c(q$features, new_features)).

I can try this approach and see what else is needed to make it work.

jmaspons avatar Dec 28 '22 14:12 jmaspons

I found that:

  • add_osm_feature collapse features from add_osm_features if called first
  • if add_osm_feature is called first, features get eliminated after calling add_osm_features

I would like to make possible to chain add_osm_feature and add_osm_features regardless of the order.

jmaspons avatar Dec 28 '22 17:12 jmaspons

Got the code working at https://github.com/jmaspons/osmdata/commit/2ccf6d1ec01b76de8e3954c59d396a395a46f981

jmaspons avatar Dec 28 '22 21:12 jmaspons

Thanks, that's a useful fix. Though I don't think it addresses the main point of this issue. If someone unaware of how the insides of Overpass work would try to use add_osm_feature repeatedly, they would most likely expect retrieving more features. But currently they would receive less features with every subsequent use of that function, because it simply applies more filters.

IMO the solution would be to soft deprecate both add_osm_feature and add_osm_features for backwards compatibility and instead establish a new fuction e.g. already suggested get_osm_features.

  • Its parameters would work as filters (logical AND) get_osm_features(c(x=y, z=w, q=r))nwr[x=y][z=w][q=r];
  • And its every subsequent use would work as logical OR: get_osm_features(c(x=y, z=w, q=r)) |> get_osm_features(c(a=b, c=d))nwr[x=y][z=w][q=r]; nwr[a=b][c=d];

Mashin6 avatar Dec 29 '22 01:12 Mashin6

Yes, I agree that the current naming is not very explicit and can be improved, but the docs are quite clear. The problem I see is that the proposed solution is not as powerful, as it doesn't allow to chain features with AND or OR in arbitrary order. For example, is not possible to chain and OR followed but an AND as in

waterarea<- list(waterway='riverbank',
    landuse=c('reservoir', 'water', 'basin', 'salt_pond'),
    natural=c('lake', 'water'),
    amenity='swimming_pool',
    leisure = 'swimming_pool') # minzoom: 12
waterway_low<- list(waterway=c('river','canal')) # minzoom: 8, maxzoom: 12
placenames_small<- list(place=c('suburb','village','locality','hamlet','quarter','neighbourhood','isolated_dwelling','farm')) # minzoom: 10; maxzoom: 17
mainroad_label<- list(highway=c('primary', 'secondary', 'tertiary')) # minzoom: 12
leisure_label<- list(leisure=c('park', 'sports_centre', 'stadium', 'pitch')) # minzoom: 14

objects_osm<- c(waterarea, waterway_low, placenames_small, mainroad_label, leisure_label)

areaPPCC<- getbb("Països Catalans", format_out="data.frame")
consulta<- opq(areaPPCC, out="tags center", osm_types="nwr", timeout=500) %>%
  add_osm_features(features=objects_osm, value_exact=TRUE) %>%
  add_osm_feature(key=c("name", "!name:ca"))

With your proposal, @Mashin6, the code above would require much more verbose code and, in current main, is not possible (would require the bonus from https://github.com/jmaspons/osmdata/commit/87363103750f60c9ce3f46bcd87180d9f17d6728 , using paste_features to parse the features list also in add_osm_features )

A better solution should include different functions or a function with different parameters for AND and OR additions

jmaspons avatar Dec 29 '22 09:12 jmaspons

It's certainly an interesting use case. I don't know how often users run queries that involve adding the same set of filters to all statements. Personally would try to approach it with lapply or for loop, but having a shorthand function that does that seem like a good idea. In that case I would suggest to introduce it as a brand new function with a less confusing name e.g. filter_all_features or add_osm_filter.

Mashin6 avatar Jan 02 '23 10:01 Mashin6