esri2sf
esri2sf copied to clipboard
Feature request: esriIndex function to list all folders and services on ArcGIS server
I think there it may be useful to allow users to return a list of all folders and services hosted by an ArcGIS server. I came up with a function (tentatively named esriIndex although esriStructure could also work) that does this:
esriIndex <- function(url, parent = NULL, recurse = TRUE) {
urlServer <-
stringr::str_extract(url, ".+rest/services")
urlInfo <-
jsonlite::read_json(paste0(url, "?f=json"), simplifyVector = TRUE)
folders <-
tibble::tibble(
name = urlInfo[["folders"]] |> as.character(),
type = "Folder",
url = paste0(urlServer, "/", urlInfo[["folders"]]),
parent = parent
)
services <-
tibble::tibble(
name = urlInfo[["services"]][["name"]] |> as.character(),
type = urlInfo[["services"]][["type"]] |> as.character(),
url = paste0(urlServer, "/", urlInfo[["services"]][["name"]], "/", urlInfo[["services"]][["type"]], recycle0 = TRUE),
parent = parent
)
urlIndex <-
dplyr::bind_rows(
folders,
services
)
if (recurse) {
if (length(folders[["name"]]) > 0) {
urlIndex <-
dplyr::bind_rows(
urlIndex,
purrr::pmap_dfr(
folders,
~ esriIndex(url = ..3, parent = ..1)
)
)
}
}
urlIndex
}
esriIndex(url = "https://carto.nationalmap.gov/arcgis/rest/services")
#> # A tibble: 11 × 4
#> name type url parent
#> <chr> <chr> <chr> <chr>
#> 1 Utilities Folder https://carto.nationalmap.go… <NA>
#> 2 contours MapServer https://carto.nationalmap.go… <NA>
#> 3 geonames MapServer https://carto.nationalmap.go… <NA>
#> 4 govunits MapServer https://carto.nationalmap.go… <NA>
#> 5 map_indices MapServer https://carto.nationalmap.go… <NA>
#> 6 selectable_polygons MapServer https://carto.nationalmap.go… <NA>
#> 7 structures MapServer https://carto.nationalmap.go… <NA>
#> 8 transportation MapServer https://carto.nationalmap.go… <NA>
#> 9 Utilities/Geometry GeometryServer https://carto.nationalmap.go… Utili…
#> 10 Utilities/PrintingTools GPServer https://carto.nationalmap.go… Utili…
#> 11 Utilities/RasterUtilities GPServer https://carto.nationalmap.go… Utili…
Created on 2021-11-28 by the reprex package (v2.0.1)
A couple of notes:
-
esriUrl_isValid
only validates MapServer and FeatureServer URLs so the url cannot be passed toesrimeta()
without returning an error and the base url can't be extracted by theesri2sf::esriUrl_parseUrl()
function for the same reason. It may be helpful to allow these functions to accept server or folder URLs. - I'd love to add the option of returning a list of layers for all services as well as the folders and services. I think this should be relatively straightforward with the
esriLayers()
function but I ran into some performance issues when testing it out. Suggestions on how to incorporate this are welcome. - The original
esri2sf
andesri2df
functions used the convention of all lowercase function names. I noticed that more recent additions use a camel case convention that matches the conventions of ArcGIS web services. I stuck with the latter in this draft but wasn't sure if all lowercase may be preferred.
Happy new year, @jacpete. This isn't an urgent issue but I figured you wouldn't mind if I tagged you to check if you had a chance to consider this feature request. Suggestions or feedback are welcome!
Hey @elipousson,
Thanks for the ping, end of the year got pretty crazy and I had forgotten about this issue. I will try to get you a response with my thoughts tonight after I get re-established in where I left off. Happy you're around to discuss stuff with.
The new function looks great. It was something I had written down to add and I am glad someone beat me to it. Tested it out on a couple url's I use regularly and it worked as expected. My only note for the function in the current form is the inclusion of the new base pipe |>
. While I personally don't have a problem with its inclusion as I always tend to run the latest version of R on my machines, we would have to bump the minimum R version for the package to 4.1 which could hurt users where their organization limits the version they have available. I don't see this as a deal breaker, but its something we should discuss before implementing. Other options would be to de-pipe the code or to switch to the {magrittr} pipe and add it as a dependency.
In response to the other comments:
esriUrl_isValid
only validates MapServer and FeatureServer URLs so the url cannot be passed toesrimeta()
without returning an error and the base url can't be extracted by theesri2sf::esriUrl_parseUrl()
function for the same reason. It may be helpful to allow these functions to accept server or folder URLs.
I see what you mean here and I will work on a pull request to fix this.
- I'd love to add the option of returning a list of layers for all services as well as the folders and services. I think this should be relatively straightforward with the
esriLayers()
function but I ran into some performance issues when testing it out. Suggestions on how to incorporate this are welcome.
I would also love to see this functionality combined in. Would you send an example of where it works and better yet where it doesn't and we can try to root out a fix so this functionality can be added in.
- The original
esri2sf
andesri2df
functions used the convention of all lowercase function names. I noticed that more recent additions use a camel case convention that matches the conventions of ArcGIS web services. I stuck with the latter in this draft but wasn't sure if all lowercase may be preferred.
This is definitely been influenced by my personal taste. I tend to mix camelCase with underscores being used for separating parts of a name. For example the esriUrl_
prefixed functions all deal with the same type of data (ESRI url strings) and then the actual function name is base of the full function name (esriUrl_isValid
, esriUrl_parseUrl
). I know this doesn't agree with the original esri2sf
function that you rightly noted was all lowercase, but it does follow the camelCase of the functions in the zzz.R file that were there before I started adding stuff to this repo. Totally up for debate on what we decide and use as standards moving forward, just want to make sure we don't change too much of pieces that already exist as exported functions so we don't break peoples existing code (we can create aliases and use deprecation warnings if we do end up changing an exported function). My current (biased) suggestion is to use camelCase for new functions except for the core original function esri2sf
and its direct counterpart esri2df
. My main reason for avoiding lowercase names is readability but the '2' in these function names solves that issue. @yonghah feel free to chime in on any thoughts you have here as well. I am happy to keep adding functionality and fixing bugs in the package but when it comes to stylistic decisions I want to make sure your'e on board as well.
Here is an updated version of esriIndex
that includes a layers option (and no more pipes). Unfortunately, the layers option definitely keeps kicking back errors when I try to run it on a whole server.
esriIndex <- function(url, parent = NULL, recurse = TRUE, layers = FALSE) {
urlServer <-
stringr::str_extract(url, ".+rest/services")
urlInfo <-
jsonlite::read_json(paste0(urlServer, "?f=json"), simplifyVector = TRUE)
folders <-
tibble::tibble(
name = as.character(urlInfo[["folders"]]),
type = "Folder",
url = paste0(urlServer, "/", urlInfo[["folders"]]),
parent = parent
)
services <-
tibble::tibble(
name = as.character(urlInfo[["services"]][["name"]]),
type = as.character(urlInfo[["services"]][["type"]]),
url = paste0(urlServer, "/", urlInfo[["services"]][["name"]], "/", urlInfo[["services"]][["type"]], recycle0 = TRUE),
parent = parent
)
urlIndex <-
dplyr::bind_rows(
folders,
services
)
if (layers) {
layers <- purrr::map2_dfr(
index$url, index$name,
~ layerIndex(url = .x, parent = .y)
)
urlIndex <-
dplyr::bind_rows(
urlIndex,
layers
)
}
if (recurse) {
if (length(folders[["name"]]) > 0) {
urlIndex <-
dplyr::bind_rows(
urlIndex,
purrr::pmap_dfr(
folders,
~ esriIndex(url = ..3, parent = ..1)
)
)
}
}
urlIndex
}
Here is the layerIndex
function that I added to esriIndex:
layerIndex <- function(url, parent = NULL) {
layerInfo <- jsonlite::read_json(paste0(url, "/layers?f=json"), simplifyVector = TRUE)
if (!is.null(layerInfo) && (class(layerInfo[["layers"]]) == "data.frame")) {
layers <- dplyr::select(
layerInfo[["layers"]],
id, name, type, geometryType
)
dplyr::mutate(
layers,
url = paste0(url,"/", id),
parent = parent
)
} else {
NULL
}
}
I could set up a pull request for the esriIndex function without the layers parameter now or just wait until we have a chance to work out the problems.
I am doing a lot of work to rewrite and generalize the esriUrl_* functions right now that I hope to have done tonight or tomorrow. Wait on your pull request until I do a pull request for those changes. Just want to make sure we aren't messing up each other in the process. Once I get that fixed I will take a deeper look into the esriLayers function and see if I can spot what's going on.
@elipousson Would you be able to give more details about this comment:
- I'd love to add the option of returning a list of layers for all services as well as the folders and services. I think this should be relatively straightforward with the
esriLayers()
function but I ran into some performance issues when testing it out. Suggestions on how to incorporate this are welcome.
What kind of performance issues did you get? Were there unexpected errors in the R code or was the server returning error codes? I was trying to test out what the issue could be but I have no examples of the issue you were having. Just as a note the '/layer' subpage is only available for MapServer and FeatureServer urls. If a feature url for one of these services is provided it will truncate the url to the service type.
#All Valid entries
esriLayers("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/Demographics/ESRI_Census_USA/MapServer/3")
esriLayers("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/TaxParcel/AssessorsBasemap/MapServer")
esriLayers("https://carto.nationalmap.gov/arcgis/rest/services/contours/MapServer")
#Not Valid (current return shown below)-but better error handling (explicit check for '/(FeatureServer|MapServer)/?$' needs added)
esriLayers("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/Elevation/ESRI_Elevation_World/GPServer")
# $error
# $error$code
# [1] 400
#
# $error$message
# [1] "Unable to complete operation."
#
# $error$details
# [1] "GPTask 'layers' does not exist or is inaccessible."
#
#
Hello @jacpete! I expect you've been too busy to return to this project (I know I've had a hectic winter and spring) but wanted to check if you need any help with issue #41 or #40 or #43. I can't recall if all three need to be resolved in order to make it even possible to create an esriIndex
function or if there is some way to make some more modest changes to get this working even if it stays experimental.
Hey @elipousson. Thanks again for the prod. I finally got time to finish #40. And I believe it fixes the urlencoding and the token issues mentioned here. I know this was just the tip of the iceberg for your feature request, but I am happy to start tackling it piece by piece now. I think #39 should focus on the esriIndex function you proposed. Can you attach a latest draft of the function after updating the package to include the changes in #40? I'd love to get this added into the package. I am going to close #41 since I think the latest updates should have addressed some of the issues with esriLayers()
, but if you are still having trouble with speed in your implementation of esriIndex we can discuss it further here or we can reopen it.
That all sounds great! Thanks so much for making time for these updates.
I should have time next week or the following weekend to rewrite the esriIndex()
function. I started doing something similar with the get_esri_layers
function from my {overedge} package which takes a service URL and 1 or more layer integers then returns a named list (with the names based on the layer names from esri2sf::esrimeta()
).
I'd also love to add support for buffer distances (since the FeatureServer API has native support for it) although I think that should be tracked in a separate issue and come in a separate pull request.
I should have waited until next week but instead I stayed up later to update esriIndex
and esriLayerIndex
. I added a "return"
parameter to optionally limit the index to services, folders, layers, or tables.
Updated (2022-05-12): This is a good start but it needs some work on how the URLs are being built and around error handling if the server doesn't have the type of resource included in the "return" parameter. I need to avoid the temptation to keep working on this but I do plan to return in the next week or so.
library(esri2sf)
esriLayerIndex <- function(url, parent = NULL, return = c("layers", "tables")) {
return <- match.arg(return, c("layers", "tables"), several.ok = TRUE)
layerIndex <- NULL
if (grepl(pattern = "(MapServer|FeatureServer)$", url)) {
serviceInfo <- esriLayers(url)
if (!is.null(serviceInfo)) {
if ("layers" %in% return) {
layerIndex <-
dplyr::select(
serviceInfo$layers,
id, name, type, geometryType
)
}
if ("tables" %in% return) {
layerIndex <-
dplyr::bind_rows(
layerIndex,
dplyr::select(
serviceInfo$tables,
id, name, type
)
)
}
layerIndex <-
dplyr::mutate(
layerIndex,
url = paste0(url, "/", id)
)
}
}
return(layerIndex)
}
esriIndex <- function(url, parent = NULL, return = c("services", "folders", "layers", "tables"), recurse = TRUE, layers = FALSE) {
return <- match.arg(return, c("services", "folders", "layers", "tables"), several.ok = TRUE)
if (esriUrl_isValidRoot(url) | esriUrl_isValidFolder(url)) {
rootUrl <- url
} else {
parsed <-
esriUrl_parseUrl(url)
rootUrl <-
paste0(parsed$scheme, paste0(c(parsed$host, parsed$instance, parsed$restIndicator), collapse = "/"))
}
urlInfo <-
esrimeta(rootUrl)
folders <- NULL
if ("folders" %in% return) {
folders <- urlInfo$folders
folders <-
dplyr::bind_cols(
name = as.character(folders),
type = "Folder",
url = paste0(rootUrl, folders),
parent = parent
)
}
services <- NULL
if ("services" %in% return) {
services <- urlInfo$services
if (is.null(parent)) {
services_url <- paste0(rootUrl, services$name, "/", services$type, recycle0 = TRUE)
} else {
services_url <- paste0(rootUrl, "/", services$type, recycle0 = TRUE)
}
services <-
dplyr::bind_cols(
services,
url = services_url,
parent = parent
)
}
urlIndex <-
dplyr::bind_rows(
folders,
services
)
if (any(c("layers", "tables") %in% return) && !is.null(services) && (nrow(services) > 0)) {
ValidServices <- vapply(services$url, esriUrl_isValidService, TRUE)
layerIndex <-
purrr::map2(
services$url[ValidServices], services$name[ValidServices],
~ esriLayerIndex(url = .x, parent = .y, return = return)
)
urlIndex <-
dplyr::bind_rows(
urlIndex,
layerIndex
)
}
if (recurse && !is.null(folders) && (nrow(folders) > 0)) {
urlIndex <-
dplyr::bind_rows(
urlIndex,
purrr::map2_dfr(
folders$url, folders$name,
~ esriIndex(url = .x, parent = .y, return = return)
)
)
}
return(urlIndex)
}
testIndex <- esriIndex(url = "https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/")
dplyr::glimpse(testIndex)
#> Rows: 43
#> Columns: 4
#> $ name <chr> "Demographics", "Elevation", "Locators", "Louisville", "Network…
#> $ type <chr> "Folder", "Folder", "Folder", "Folder", "Folder", "Folder", "Fo…
#> $ url <chr> "https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/De…
#> $ parent <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Demographics", "De…
testIndex <- esriIndex("https://geodata.md.gov/imap/rest/services", return = "folders")
#> Error in value[[3L]](cond): Url is not a valid ESRI Service Url.
#> Error code: 403
#> Message: Access to this resource is not allowed
dplyr::glimpse(testIndex)
#> Rows: 43
#> Columns: 4
#> $ name <chr> "Demographics", "Elevation", "Locators", "Louisville", "Network…
#> $ type <chr> "Folder", "Folder", "Folder", "Folder", "Folder", "Folder", "Fo…
#> $ url <chr> "https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/De…
#> $ parent <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Demographics", "De…
testIndex <- esriIndex("https://geodata.md.gov/imap/rest/services/Transportation", return = c("services", "layers"))
dplyr::glimpse(testIndex)
#> Rows: 33
#> Columns: 3
#> $ name <chr> "Transportation/MD_AlternativeFuel", "Transportation/MD_Alternati…
#> $ type <chr> "FeatureServer", "MapServer", "FeatureServer", "MapServer", "Feat…
#> $ url <chr> "https://geodata.md.gov/imap/rest/services/TransportationTranspor…
Created on 2022-05-12 by the reprex package (v2.0.1)
This is taking things in a bit of a different direction but I fixed the issues with the prior draft of the index function and effectively rebuilt esriLayers and esrimeta using the {httr2} package. I started on this side track because I wanted to try out the "sitemap" parameter of the Catalog API service (which I think is what the esrimeta
function is actually using) and {httr2} supports an XML response. I didn't test it but I think it may also be a little faster than the current versions of these functions and I think it may have options for handling authorization and throttling requests.
Obviously, switching to {httr2} would require a separate issue and likely a dedicated development branch but it may be worth exploring.
# Create an index of folders, services, layers, and tables for an ArcGIS Server
esriIndex <- function(url, parent = NULL, recurse = FALSE, ...) {
esriResp <- esriCatalog(url, ...)
index <- NULL
urlIndex <- url
if (!!length(esriResp[["folders"]])) {
folders <-
dplyr::bind_cols(
"name" = unlist(esriResp$folders),
"index" = "folder",
"type" = NA
)
index <-
dplyr::bind_rows(
index,
folders
)
}
if (!!length(esriResp[["services"]])) {
services <-
dplyr::bind_rows(esriResp$services)
services <-
dplyr::bind_cols(
services,
"index" = "service"
)
index <-
dplyr::bind_rows(
index,
services
)
}
if (is.null(index)) {
return(index)
}
urlbase <-
regmatches(
urlIndex,
regexpr(pattern = ".+(?=/)", text = urlIndex, perl = TRUE)
)
index <-
dplyr::mutate(
index,
url = NULL,
url = dplyr::if_else(
grepl(pattern = "/", x = name),
urlbase,
urlIndex
),
url = dplyr::case_when(
(index == "folder") ~ paste0(url, "/", name),
TRUE ~ paste0(url, "/", name, "/", type)
)
)
if (!is.null(parent)) {
index <-
dplyr::bind_cols(
index,
"parent" = parent
)
}
if (recurse) {
folderIndex <- subset(index, index == "folder")
if (nrow(folderIndex) > 0) {
folderIndex <-
purrr::map2_dfr(
folderIndex$url,
folderIndex$name,
~ esriIndex(url = .x, parent = .y, recurse = TRUE)
)
index <-
dplyr::bind_rows(
index,
folderIndex
)
}
layerIndex <- subset(index, type %in% c("MapServer", "FeatureServer"))
if (nrow(layerIndex) > 0) {
layerIndex <-
purrr::map2_dfr(
layerIndex$url,
layerIndex$name,
~ esriLayers(url = .x, parent = .y)
)
index <-
dplyr::bind_rows(
index,
layerIndex
)
}
}
return(index)
}
# Create an index of layers and tables from an ArcGIS Service
esriLayers <- function(url, parent = NULL, ...) {
esriResp <- esriCatalog(url, ...)
index <- NULL
if (!!length(esriResp[["layers"]])) {
layers <-
dplyr::bind_cols(
dplyr::bind_rows(esriResp$layers),
"index" = "layer"
)
index <-
dplyr::bind_rows(
index,
layers
)
}
if (!!length(esriResp[["tables"]])) {
tables <-
dplyr::bind_cols(
dplyr::bind_rows(esriResp$tables),
"index" = "table"
)
index <-
dplyr::bind_rows(
index,
tables
)
}
if (is.null(index)) {
return(index)
}
index <-
dplyr::bind_cols(
index,
"parent" = parent,
"url" = paste0(url, "/", index$id)
)
return(index)
}
# Get a catalog of folders, services, tables, and layers
esriCatalog <- function(url, format = "json", option = NULL, outSR = NULL, ...) {
format <- match.arg(format, c("json", "html", "kmz", "sitemap", "geositemap"))
req <- httr2::request(url)
req <- httr2::req_url_query(req = req, f = format)
resp <- httr2::req_perform(req = req)
if (format == "json") {
if (!is.null(option) && (option == "footprints")) {
req <- httr2::req_url_query(req = req, option = option)
if (!is.null(outSR)) {
req <- httr2::req_url_query(req = req, outSR = outSR)
}
}
json <- httr2::resp_body_json(resp = resp, check_type = FALSE, ...)
return(json)
} else if (format %in% c("sitemap", "geositemap")) {
sitemap <- httr2::resp_body_xml(resp, ...)
sitemap <- xml2::as_list(sitemap)
sitemap <- dplyr::bind_rows("url" = unlist(sitemap, use.names = FALSE))
return(sitemap)
}
}
Got a chance to start looking through this. I started with esriCatalog()
since it will become a basis for esriLayers
and esriIndex
. All three of these will need a token parameter added so that they work with locked down servers. This one one of the things I added to the package with #40. An example is like:
# Get a catalog of folders, services, tables, and layers
esriCatalog <- function(url, format = "json", token = "", option = NULL, outSR = NULL, ...) {
format <- match.arg(format, c("json", "html", "kmz", "sitemap", "geositemap"))
req <- httr2::request(url)
req <- httr2::req_url_query(.req = req, f = format, token = token)
resp <- httr2::req_perform(req = req)
if (format == "json") {
if (!is.null(option) && (option == "footprints")) {
req <- httr2::req_url_query(req = req, option = option)
if (!is.null(outSR)) {
req <- httr2::req_url_query(req = req, outSR = outSR)
}
}
json <- httr2::resp_body_json(resp = resp, check_type = FALSE, ...)
return(json)
} else if (format %in% c("sitemap", "geositemap")) {
sitemap <- httr2::resp_body_xml(resp, ...)
sitemap <- xml2::as_list(sitemap)
sitemap <- dplyr::bind_rows("url" = unlist(sitemap, use.names = FALSE))
return(sitemap)
}
}
I have access to a secure server at through work that host some MapServers privately behind authorization and can confirm that esriCatalog(url = 'https://<host>/arcgis/rest/services', format = 'sitemap', token = "")
gives a shorter list of URL's than esriCatalog(url = 'https://<host>/arcgis/rest/services', format = 'sitemap', token = generateToken(server = 'https://<host>', uid = '<uid>', pwd = '<pwd>'))
(relying on the generateToken()
function from the currently unmerged #48). This tells me that een though token is not listed as a parameter in Catalog API Service it is still valid and will provide access to the secure services information.
Notes:
- Still need to walk through and test the modified esriLayers and esriIndex fxns.
- I like the idea of another issue/pull request to move the rest of the package to httr2 from httr. I am fine making the modifications to the functions as you have above with httr2 in the pull request for esriIndex/esriCatalog and the separate issue to just modernize the rest of the package.
That all sounds great. Thanks for taking a look at this so quickly. Would it be helpful for me to open a draft pull request for this (and incorporate the token parameter throughout) so we can test the functions as part of the package and incorporate the URL validity check functions where necessary? Or share updated code here but wait until you merge #48 before opening a new branch?
I'd say go ahead and start a pull request and we can troubleshoot and check things together there. I did merge #48 so go ahead and rebase from master before you do your pull request. My thoughts currently:
- Basically every public function in the package should have a token parameter
- I'm wary of changing esriLayers. It was created because it was accessing the https://<host>/<instance>/rest/services/<folderName>/\serviceName>/MapServer/layers endpoint which gives far more information than the catalog implementation does. I'm fine with another function working like you have it but I would name it something different.
Sounds great! Excited to dig into this but may need to chip away a little bit at a time over the next couple weeks. I think the difference between the layers endpoint and the catalog API service could also explain the slight performance difference (if it actually exists and isn't just my imagination).
I'll fix up the documentation and add an example to the README so this will be ready to close whenever #50 can be merged.