
Implement alternative geocoder

Open katrinleinweber opened this issue 4 years ago • 5 comments

Problem

lookup_coords currently relies on Google Maps and a few hard-coded coordinates (#261).

Expected behavior

Since there are several other options, would it be feasible to implement at least one of them, so that people can choose not to depend on Google?

katrinleinweber avatar Oct 30 '19 15:10 katrinleinweber

What's the best R package for geocoding?

hadley avatar Feb 27 '21 13:02 hadley

I would say https://cran.r-project.org/web/packages/tidygeocoder/index.html although I haven’t used it extensively

EDIT

~~I am not sure if it returns the bounding box, which is required in the coords object and probably by the Twitter API~~

EDIT2

It can, with geo(full_results = TRUE), so it seems to be a good option. It imports tibble, dplyr, httr and jsonlite.

dieghernan avatar Feb 27 '21 14:02 dieghernan

This looks like a good candidate

https://docs.ropensci.org/opencage/

EDIT: No Google Maps support, dedicated to a single provider

dieghernan avatar Feb 27 '21 18:02 dieghernan

OK, so I have been doing some research and I have found some interesting things that may affect some of the issues related to lookup_coords:

1. Google API does not always return bounds

This is described in the API docs: https://developers.google.com/maps/documentation/geocoding/overview#results

bounds (optionally returned) stores the bounding box which can fully contain the returned result.

This is the variable used in lookup_coords:

https://github.com/ropensci/rtweet/blob/f45b9b3e20275aef6171f6f109ab6e2dba89aa7c/R/coords.R#L122-L126

Potential alternative: using the viewport variable. From the API docs:

viewport contains the recommended viewport for displaying the returned result, specified as two latitude,longitude values defining the southwest and northeast corner of the viewport bounding box. Generally the viewport is used to frame a result when displaying it to a user.

See an example that returns both bounds and viewport: https://developers-dot-devsite-v2-prod.appspot.com/maps/documentation/utils/geocoder#q%3DUnited%2520States%2520of%2520America

Note the difference between the viewport (mainland USA) and the bounds (which also include Hawaii, Alaska, etc.). Restricting the box to the mainland is exactly what these lines try to do:

https://github.com/ropensci/rtweet/blob/f45b9b3e20275aef6171f6f109ab6e2dba89aa7c/R/coords.R#L61-L72

Here is an example of a query that does not return bounds. This seems to be the case for narrower searches (zoom out to see the viewport, blue line): https://developers-dot-devsite-v2-prod.appspot.com/maps/documentation/utils/geocoder#q%3DTimes%2520Square%2520NY
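To make the viewport idea concrete, here is a minimal sketch of a fallback, assuming the parsed Geocoding API result is a nested list with `geometry$bounds` and `geometry$viewport`, each holding `southwest` and `northeast` entries with `lat` and `lng`. The helper name `extract_box` and the mock coordinates are made up for illustration; this is not the current rtweet code.

```r
# Hypothetical helper: `result` is one parsed element of the Geocoding API
# `results` array, i.e. a nested list with a `geometry` entry.
extract_box <- function(result) {
  geometry <- result$geometry
  # Prefer `bounds`; fall back to `viewport` when `bounds` is not returned.
  box <- if (!is.null(geometry$bounds)) geometry$bounds else geometry$viewport
  if (is.null(box)) {
    stop("result has neither bounds nor viewport", call. = FALSE)
  }
  c(
    sw.lng = box$southwest$lng,
    sw.lat = box$southwest$lat,
    ne.lng = box$northeast$lng,
    ne.lat = box$northeast$lat
  )
}

# Mock result without `bounds` (made-up coordinates, not real API output)
narrow <- list(geometry = list(
  viewport = list(
    southwest = list(lat = 40.75, lng = -73.99),
    northeast = list(lat = 40.76, lng = -73.98)
  )
))
extract_box(narrow)
```

The returned vector uses the same sw.lng / sw.lat / ne.lng / ne.lat names as the box built in the Nominatim function further down, so it could be handed to the same coords constructor.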

2. Alternatives

a. Custom function (as fallback/replacement)

In #391 I added an alternative for geocoding using Nominatim, which does not require an API key and seems to be reliable enough:

lookup_coords_nominatim <- function(address, ...) {
  if (missing(address)) stop("must supply address", call. = FALSE)
  stopifnot(is.atomic(address))
  place <- address
  if (grepl("^us$|^usa$|^united states$|^u\\.s",
    address,
    ignore.case = TRUE
  )) {
    boxp <- c(
      sw.lng = -124.848974,
      sw.lat = 24.396308,
      ne.lng = -66.885444,
      ne.lat = 49.384358
    )
    point <- c(
      lat = 36.89,
      lng = -95.867
    )
  } else if (grepl("^world$|^all$|^globe$|^earth$",
    address,
    ignore.case = TRUE
  )) {
    boxp <- c(
      sw.lng = -180,
      sw.lat = -90,
      ne.lng = 180,
      ne.lat = 90
    )
    point <- c(
      lat = 0,
      lng = 0
    )
  } else {
    ## URL-encode the address (spaces, "&", non-ASCII characters, ...)
    address <- utils::URLencode(address, reserved = TRUE)
    ## compose query
    params <- list(
      q = address,
      format = "json",
      limit = 1
    )
    params <- params[!vapply(params, is.null, logical(1))]
    params <- paste0(
      mapply(
        function(x, y) paste0(x, "=", y),
        names(params), params
      ),
      collapse = "&"
    )
    ## build URL - final name in English
    geourl <- paste0(
      "https://nominatim.openstreetmap.org/search?",
      params,
      "&accept-language=en"
    )
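    ## read and convert to list obj
    r <- jsonlite::fromJSON(geourl)
    ## an empty JSON array means Nominatim found nothing
    if (length(r) == 0) stop("no results found for address", call. = FALSE)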
    ## extract and name box and point data frames
    bbox <- as.double(unlist(r$boundingbox))
    boxp <- c(
      sw.lng = bbox[3],
      sw.lat = bbox[1],
      ne.lng = bbox[4],
      ne.lat = bbox[2]
    )
    point <- c(
      lat = as.double(r$lat),
      lng = as.double(r$lon)
    )
    # Full name from Nominatim
    place <- r$display_name
  }
  rtweet:::as.coords(place = place, box = boxp, point = point) # call an internal function
}
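The index shuffling in the box step is easy to get wrong, so here is a small offline sketch of just that part. It assumes, as the function above does, that Nominatim returns `boundingbox` in the order south, north, west, east, with values as strings. The helper name `nominatim_box` and the mock values are hypothetical.

```r
# Hypothetical helper: `r` is one parsed Nominatim search result, with
# `boundingbox` given as c(south, north, west, east) and values as strings.
nominatim_box <- function(r) {
  bbox <- as.double(unlist(r$boundingbox))
  if (length(bbox) != 4 || anyNA(bbox)) {
    stop("no usable bounding box in result", call. = FALSE)
  }
  list(
    # Reorder into the sw/ne corner layout used by the coords object
    box = c(
      sw.lng = bbox[3],
      sw.lat = bbox[1],
      ne.lng = bbox[4],
      ne.lat = bbox[2]
    ),
    point = c(lat = as.double(r$lat), lng = as.double(r$lon))
  )
}

# Mock result (made-up values, not real API output)
mock <- list(
  boundingbox = c("40.0", "41.0", "-75.0", "-73.0"),
  lat = "40.5",
  lon = "-74.0"
)
nominatim_box(mock)
```

Keeping the parsing separate from the HTTP call would also make it possible to test this step without network access.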

b. Using a geocoding package

Following @hadley's suggestion, I did some research (and put out a call to the rspatial community on Twitter, https://twitter.com/dhernangomez/status/1365676793299148803?s=20) and so far it seems to me that https://github.com/jessecambon/tidygeocoder could be the best alternative for the {rtweet} package if this is the preferred way forward.

The function geo allows the user to choose among several geocoders (including Google and Nominatim), and it would be easy to integrate. Some adjustments to the environment variables of both packages would be necessary.

Update: {tidygeocoder} v1.0.3 now supports 12 geocoding services, including all the major ones: see https://jessecambon.github.io/tidygeocoder/articles/geocoder_services.html. At least OSM and ArcGIS have global coverage without the need for an API key, ping @jessecambon

3. Bottom line

I think there are ways to improve this function (using viewport, moving to other free geocoders, fallbacks, using other packages...) but I am not sure if this is a priority right now for {rtweet}.
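As a rough illustration of the fallback idea, here is a minimal sketch that tries a list of geocoding functions in order and returns the first result that does not fail. The name `geocode_with_fallback` and the two stand-in geocoders are hypothetical; in practice the list could hold something like lookup_coords followed by a Nominatim-based function.

```r
# Hypothetical sketch: `geocoders` is a list of functions that each take an
# address and either return a result or signal an error.
geocode_with_fallback <- function(address, geocoders) {
  for (geocode in geocoders) {
    # Turn an error from one service into NULL so the next one is tried
    result <- tryCatch(geocode(address), error = function(e) NULL)
    if (!is.null(result)) {
      return(result)
    }
  }
  stop("all geocoders failed for: ", address, call. = FALSE)
}

# Stand-in geocoders for illustration only
always_fails <- function(address) stop("service unavailable")
fixed_answer <- function(address) c(lat = 0, lng = 0)

geocode_with_fallback("world", list(always_fails, fixed_answer))
```

The order of the list decides which provider is preferred, so a key-free service could be placed last as a safety net.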

I would be happy to help if needed, but it seems to me that it would require some work, so for now I would leave it as is. If you want me to help, just ping me!

dieghernan avatar Mar 01 '21 12:03 dieghernan

Yeah, it is not a priority, so I'll leave it as is for a while. I lean towards Nominatim, the one from OpenStreetMap. I'm not sure which package would be better, but when we settle on this we'll discuss it.

llrs avatar Mar 01 '21 13:03 llrs