rtweet
Implement alternative geocoder
Problem
`lookup_coords` currently relies on Google Maps and a few hard-coded coordinates (#261).
Expected behavior
Since there are several other options, would it be feasible to implement at least one of them, so that people can choose not to depend on Google?
What's the best R package for geocoding?
I would say https://cran.r-project.org/web/packages/tidygeocoder/index.html, although I haven’t used it extensively.
EDIT
~~I am not sure if it returns the bounding box, which is required in the coords object and probably in the Twitter API~~
EDIT2
It can, with `geo(full_results = TRUE)`, so it seems to be a good option. It imports tibble, dplyr, httr, and jsonlite.
This looks like a good candidate
https://docs.ropensci.org/opencage/
EDIT: No Google Maps support, dedicated to a single provider
OK, so I have been doing some research and have found some interesting things that may affect some of the issues related to `lookup_coords`:
1. Google API does not always return bounds
This is described in the API docs: https://developers.google.com/maps/documentation/geocoding/overview#results
bounds (optionally returned) stores the bounding box which can fully contain the returned result.
This is the variable used in `lookup_coords`:
https://github.com/ropensci/rtweet/blob/f45b9b3e20275aef6171f6f109ab6e2dba89aa7c/R/coords.R#L122-L126
Potential alternative: using the `viewport` variable. From the API docs:
viewport contains the recommended viewport for displaying the returned result, specified as two latitude,longitude values defining the southwest and northeast corner of the viewport bounding box. Generally the viewport is used to frame a result when displaying it to a user.
See an example that returns both `bounds` and `viewport`:
https://developers-dot-devsite-v2-prod.appspot.com/maps/documentation/utils/geocoder#q%3DUnited%2520States%2520of%2520America
Note the difference between the `viewport` (mainland USA) and the `bounds` (which also include Hawaii, Alaska, etc.). Restricting the box to the mainland is exactly what these lines try to do:
https://github.com/ropensci/rtweet/blob/f45b9b3e20275aef6171f6f109ab6e2dba89aa7c/R/coords.R#L61-L72
Here is an example of a query that does not return `bounds`. This seems to be the case for narrower searches (zoom out to see the viewport, drawn as a blue line):
https://developers-dot-devsite-v2-prod.appspot.com/maps/documentation/utils/geocoder#q%3DTimes%2520Square%2520NY
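The `viewport` fallback described above could look roughly like the sketch below. It assumes the `geometry` element of a single geocoding result has been parsed into a nested R list (that is, without jsonlite's data frame simplification). The helper name `box_from_geometry` and the sample coordinates are made up for illustration and are not part of {rtweet}.

```r
# Minimal sketch: prefer `bounds`, fall back to `viewport` when `bounds` is absent.
# `box_from_geometry` is a hypothetical helper, not an rtweet function.
box_from_geometry <- function(geometry) {
  box <- geometry[["bounds"]]
  if (is.null(box)) box <- geometry[["viewport"]]
  if (is.null(box)) {
    stop("result has neither `bounds` nor `viewport`", call. = FALSE)
  }
  # Same element names and order as the box used by lookup_coords
  c(
    sw.lng = box[["southwest"]][["lng"]],
    sw.lat = box[["southwest"]][["lat"]],
    ne.lng = box[["northeast"]][["lng"]],
    ne.lat = box[["northeast"]][["lat"]]
  )
}

# Illustrative input (made-up coordinates) with no `bounds` element
geometry <- list(
  location = list(lat = 40.758, lng = -73.985),
  viewport = list(
    southwest = list(lat = 40.756, lng = -73.987),
    northeast = list(lat = 40.760, lng = -73.983)
  )
)
box <- box_from_geometry(geometry)
```

With this approach a narrow query such as the Times Square one would still produce a usable box, at the cost of using a display-oriented frame rather than the true extent.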
2. Alternatives
a. Custom function (as fallback/replacement)
In #391 I added an alternative geocoder using Nominatim, which does not require an API key and seems to be reliable enough:
lookup_coords_nominatim <- function(address, ...) {
  if (missing(address)) stop("must supply address", call. = FALSE)
  stopifnot(is.atomic(address))
  place <- address
  if (grepl("^us$|^usa$|^united states$|^u\\.s",
    address,
    ignore.case = TRUE
  )) {
    boxp <- c(
      sw.lng = -124.848974,
      sw.lat = 24.396308,
      ne.lng = -66.885444,
      ne.lat = 49.384358
    )
    point <- c(
      lat = 36.89,
      lng = -95.867
    )
  } else if (grepl("^world$|^all$|^globe$|^earth$",
    address,
    ignore.case = TRUE
  )) {
    boxp <- c(
      sw.lng = -180,
      sw.lat = -90,
      ne.lng = 180,
      ne.lat = 90
    )
    point <- c(
      lat = 0,
      lng = 0
    )
  } else {
    ## encode address
    address <- gsub(" ", "+", address)
    ## compose query
    params <- list(
      q = address,
      format = "json",
      limit = 1
    )
    params <- params[!vapply(params, is.null, logical(1))]
    params <- paste0(
      mapply(
        function(x, y) paste0(x, "=", y),
        names(params), params
      ),
      collapse = "&"
    )
    ## build URL - final name in English
    geourl <- paste0(
      "https://nominatim.openstreetmap.org/search?",
      params,
      "&accept-language=en"
    )
    ## read and convert to list obj
    r <- jsonlite::fromJSON(geourl)
    ## extract and name box and point data frames
    bbox <- as.double(unlist(r$boundingbox))
    boxp <- c(
      sw.lng = bbox[3],
      sw.lat = bbox[1],
      ne.lng = bbox[4],
      ne.lat = bbox[2]
    )
    point <- c(
      lat = as.double(r$lat),
      lng = as.double(r$lon)
    )
    # Full name from Nominatim
    place <- r$display_name
  }
  rtweet:::as.coords(place = place, box = boxp, point = point) # call an internal function
}
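The index shuffle in the function above (`bbox[3]`, `bbox[1]`, ...) follows from the order Nominatim uses for `boundingbox`: south latitude, north latitude, west longitude, east longitude. The sketch below isolates that step and adds a guard for queries that return nothing, which the function above does not handle. The helper name `nominatim_box` and the sample values are hypothetical, for illustration only.

```r
# Hypothetical helper: convert a Nominatim `boundingbox` into the sw/ne layout.
# Nominatim orders the values as: south lat, north lat, west lng, east lng.
nominatim_box <- function(boundingbox) {
  bbox <- as.double(unlist(boundingbox))
  # An empty search result yields no bounding box at all
  if (length(bbox) != 4 || anyNA(bbox)) {
    stop("no usable bounding box returned for this query", call. = FALSE)
  }
  c(sw.lng = bbox[3], sw.lat = bbox[1], ne.lng = bbox[4], ne.lat = bbox[2])
}

# Illustrative values in Nominatim's order (returned as strings by the API)
box <- nominatim_box(c("40.31", "40.64", "-3.89", "-3.52"))
```

Failing early with a clear message is arguably friendlier than passing `NA` coordinates on to `as.coords`.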
b. Using a geocoding package
Following @hadley's suggestion, I did some research (and asked the rspatial community on Twitter, https://twitter.com/dhernangomez/status/1365676793299148803?s=20), and so far it seems to me that https://github.com/jessecambon/tidygeocoder could be the best alternative for the {rtweet} package if this is the preferred way forward.
The function `geo` lets the user choose among several geocoders (including Google and Nominatim) and would be easy to integrate. Some adjustments to the environment variables of both packages would be necessary.
Update: {tidygeocoder} v1.0.3 now supports 12 geocoding services, including all the major ones: see https://jessecambon.github.io/tidygeocoder/articles/geocoder_services.html. At least OSM and ArcGIS have global coverage without the need for an API key. Ping @jessecambon.
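A rough sketch of how `geo` might feed the pieces that `as.coords` needs is shown below. It assumes {tidygeocoder} is installed and that, for `method = "osm"` with `full_results = TRUE`, the result contains a `boundingbox` list column and a `display_name` column alongside the default `lat` and `long` columns. The helper `geo_to_coords_parts` is hypothetical and not part of either package.

```r
# Sketch only. Assumes the osm full results expose `boundingbox` (list column,
# in Nominatim's order: south, north, west, east) and `display_name`.
geo_to_coords_parts <- function(res) {
  bbox <- as.double(unlist(res$boundingbox[[1]]))
  list(
    place = res$display_name[[1]],
    box = c(sw.lng = bbox[3], sw.lat = bbox[1], ne.lng = bbox[4], ne.lat = bbox[2]),
    point = c(lat = res$lat[[1]], lng = res$long[[1]])
  )
}

# Network call, so it is guarded and only runs in an interactive session
if (interactive() && requireNamespace("tidygeocoder", quietly = TRUE)) {
  res <- tidygeocoder::geo(
    address = "Madrid, Spain",
    method = "osm",
    full_results = TRUE
  )
  str(geo_to_coords_parts(res))
}
```

Other services return different columns in their full results, so a real implementation would probably need one small adapter like this per supported `method`.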
3. Bottom line
I think there are ways to improve this function (using `viewport`, moving to other free geocoders, adding fallbacks, using other packages...), but I am not sure if this is a priority right now for {rtweet}.
I would be happy to help if needed, but it seems to me that it would require some work, so for now I would leave it as is. If you want me to help, just ping me!
Yeah, it is not a priority, so I'll leave it as is for a while. I lean towards Nominatim, the one from OpenStreetMap. I'm not sure which package would be better, but we'll discuss it when we settle on this.