
Functions to Automate Downloading Geospatial Data Available from Several Federated Data Sources

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

FedData version 3.0 is about to be released to CRAN. It introduces several breaking changes to the FedData API relative to version 2.x. Please see NEWS.md for a list of changes.

FedData is an R package implementing functions to automate downloading geospatial data available from several federated data sources.

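Currently, the package enables extraction from the following datasets, each demonstrated below:

  • The National Elevation Dataset (NED)
  • The Daymet gridded climate dataset
  • The Global Historical Climatology Network (GHCN) daily records
  • The National Hydrography Dataset (NHD)
  • The NRCS Soil Survey Geographic (SSURGO) database
  • The International Tree Ring Data Bank (ITRDB)
  • The National Land Cover Dataset (NLCD)
  • The NASS Cropland Data Layer (CDL)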

This package is designed with the large-scale geographic information system (GIS) use case in mind: cases where dynamic web services are impractical because of the spatial and/or temporal scale of the analysis. It functions primarily as a means of downloading tiled or otherwise spatially defined datasets; additionally, it can preprocess those datasets by extracting the data within a spatially defined area of interest (AoI). It relies heavily on the sf and raster packages.
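In the demonstration, the area of interest is supplied through the `template` argument. The following is a minimal sketch of building an AoI polygon from a bounding box with sf. The coordinates are hypothetical, and whether a particular `get_*()` function accepts this object unchanged is an assumption to check against that function's help page.

```r
library(sf)

# Hypothetical bounding box in longitude/latitude (WGS 84); the coordinates
# are illustrative only and are not taken from the package.
aoi_bbox <- sf::st_bbox(
  c(xmin = -108.6, ymin = 37.1, xmax = -108.3, ymax = 37.4),
  crs = sf::st_crs(4326)
)

# Convert the bounding box to a polygon geometry, then wrap it in an sf object
aoi <- sf::st_sf(geometry = sf::st_as_sfc(aoi_bbox))

# Inspect the result before using it as a download template
print(aoi)
```

An object like `aoi` would then take the place of `FedData::meve` in the calls shown in the demonstration.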

This package has been built and tested on a binary install of R on macOS 11.5 (Big Sur), and has been successfully run on Ubuntu via rocker/geospatial and on Windows 10.

Development

Contributors

  • Dylan Beaudette - USDA-NRCS Soil Survey Office, Sonora, CA
  • Jeffrey Hollister - US EPA Atlantic Ecology Division, Narragansett, RI
  • Scott Chamberlain - ROpenSci and Museum of Paleontology at UC Berkeley

Install FedData

  • From CRAN:
install.packages("FedData")
  • Development version from GitHub:
install.packages("devtools")
devtools::install_github("ropensci/FedData")
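Because the 3.0 release changes the API, it can be useful to confirm which version is installed before running older scripts. The sketch below uses a hypothetical helper, `has_min_version()`, which is not part of FedData; it relies only on base R's `requireNamespace()` and `utils::packageVersion()`.

```r
# Hypothetical helper (not part of FedData): TRUE when `pkg` is installed
# at version `minimum` or newer, FALSE otherwise.
has_min_version <- function(pkg, minimum) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= minimum
}

if (has_min_version("FedData", "3.0.0")) {
  message("FedData 3.x is installed; see NEWS.md for changes from 2.x.")
} else {
  message("FedData 3.x is not installed.")
}
```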

Demonstration

This demonstration script is available as an R Markdown document in the GitHub repository: https://github.com/ropensci/FedData.

Load FedData and define a study area

# FedData Tester
library(FedData)
library(magrittr)

# FedData comes loaded with the boundary of Mesa Verde National Park, for testing
FedData::meve

Get and plot the National Elevation Dataset for the study area

# Get the NED (USA ONLY)
# Returns a raster
NED <- get_ned(
  template = FedData::meve,
  label = "meve"
)
# Plot with raster::plot
raster::plot(NED)
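Once an elevation raster is available, terrain derivatives can be computed with the raster package. The sketch below runs on a small synthetic grid rather than on downloaded data, so it needs no network access; the grid size, the CRS, and the elevation values are all assumptions made for illustration.

```r
library(raster)

# Synthetic stand-in for an elevation raster: a 10 x 10 grid of 10 m cells in
# a projected CRS (UTM zone 12N is an arbitrary choice for this sketch).
dem <- raster::raster(
  nrows = 10, ncols = 10,
  xmn = 0, xmx = 100, ymn = 0, ymx = 100,
  crs = "+proj=utm +zone=12 +datum=WGS84 +units=m"
)

# Elevation rises 1 m per metre eastward, so the true slope is 45 degrees
raster::values(dem) <- raster::xFromCell(dem, seq_len(raster::ncell(dem)))

# Slope in degrees; edge cells lack a full neighbourhood and come back as NA
slope <- raster::terrain(dem, opt = "slope", unit = "degrees")
print(slope)
```

The same `raster::terrain()` call could be applied to `NED`, bearing in mind that slope values depend on the raster's CRS and cell size.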

Get and plot the Daymet dataset for the study area

# Get the DAYMET (North America only)
# Returns a raster
DAYMET <- get_daymet(
  template = FedData::meve,
  label = "meve",
  elements = c("prcp", "tmax"),
  years = 1980:1985
)
# Plot with raster::plot
raster::plot(DAYMET$tmax$X1985.10.23)
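The layer name `X1985.10.23` in the plot call above encodes a date. A minimal sketch of parsing such names and averaging a date window follows; it uses a synthetic brick so that it runs offline, and it assumes every layer name follows the same `XYYYY.MM.DD` pattern.

```r
library(raster)

# Synthetic stand-in for one Daymet element: five daily layers on a 4 x 4 grid
layer_dates <- seq(as.Date("1985-10-21"), by = "day", length.out = 5)
tmax <- raster::brick(array(seq_len(4 * 4 * 5), dim = c(4, 4, 5)))
names(tmax) <- format(layer_dates, "X%Y.%m.%d")

# Recover the dates from the layer names
parsed <- as.Date(names(tmax), format = "X%Y.%m.%d")

# Select the layers inside a date window and average them cell by cell
in_window <- parsed >= as.Date("1985-10-22") & parsed <= as.Date("1985-10-24")
tmax_mean <- raster::calc(tmax[[which(in_window)]], fun = mean)
```

Replacing `tmax` with `DAYMET$tmax` would apply the same steps to the downloaded data, provided the naming assumption holds.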

Get and plot the daily GHCN precipitation data for the study area

# Get the daily GHCN data (GLOBAL)
# Returns a list: the first element is the spatial locations of stations,
# and the second is a list of the stations and their daily data
GHCN.prcp <- get_ghcn_daily(
  template = FedData::meve,
  label = "meve",
  elements = c("prcp")
)
#> Warning in if (!is.null(template) & !(class(template) %in%
#> c("SpatialPolygonsDataFrame", : the condition has length > 1 and only the first
#> element will be used
#> Warning: `select_()` was deprecated in dplyr 0.7.0.
#> Please use `select()` instead.
# Plot the NED again
raster::plot(NED)
# Plot the spatial locations
sp::plot(GHCN.prcp$spatial,
  pch = 1,
  add = TRUE
)
legend("bottomleft",
  pch = 1,
  legend = "GHCN Precipitation Records"
)
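The daily data in the second list element are often easier to analyze in long form, with one row per date. The sketch below ASSUMES each station and element table is laid out wide, with `YEAR`, `MONTH`, and `D1` through `D31` columns, mirroring the raw GHCN-Daily file format; inspect the returned object with `str()` before relying on this. The `wide` data frame is a hypothetical stand-in, not package output.

```r
# Hypothetical stand-in for one station/element table, ASSUMING a wide layout
# with YEAR, MONTH, and D1..D31 columns.
day_cols <- paste0("D", 1:31)
wide <- data.frame(YEAR = c(1980L, 1980L), MONTH = c(1L, 2L))
wide <- cbind(
  wide,
  as.data.frame(matrix(0L,
    nrow = nrow(wide), ncol = 31,
    dimnames = list(NULL, day_cols)
  ))
)

# Flatten row by row so values line up with (year, month, day) triples
long <- data.frame(
  date = as.Date(
    sprintf(
      "%d-%02d-%02d",
      rep(wide$YEAR, each = 31),
      rep(wide$MONTH, each = 31),
      rep(1:31, times = nrow(wide))
    ),
    format = "%Y-%m-%d"
  ),
  value = as.vector(t(as.matrix(wide[, day_cols])))
)

# Impossible dates (e.g., 30 February) parse to NA and are dropped
long <- long[!is.na(long$date), ]
```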

Get and plot the daily GHCN temperature data for the study area

# Elements for which you require data on the same days
# (e.g., minimum and maximum temperature)
# can be standardized using standardize = TRUE
GHCN.temp <- get_ghcn_daily(
  template = FedData::meve,
  label = "meve",
  elements = c("tmin", "tmax"),
  years = 1980:1985,
  standardize = TRUE
)
# Plot the NED again
raster::plot(NED)
# Plot the spatial locations
sp::plot(GHCN.temp$spatial,
  add = TRUE,
  pch = 1
)
legend("bottomleft",
  pch = 1,
  legend = "GHCN Temperature Records"
)

Get and plot the National Hydrography Dataset for the study area

# Get the NHD (USA ONLY)
get_nhd(
  template = FedData::meve,
  label = "meve"
) %>%
  plot_nhd(template = FedData::meve)

Get and plot the NRCS SSURGO data for the study area

# Get the NRCS SSURGO data (USA ONLY)
SSURGO.MEVE <- get_ssurgo(
  template = FedData::meve,
  label = "meve"
)
# Plot the NED again
raster::plot(NED)
# Plot the SSURGO mapunit polygons
plot(SSURGO.MEVE$spatial$geom,
  lwd = 0.1,
  add = TRUE
)

Get and plot the NRCS SSURGO data for particular soil survey areas

# Or, download by Soil Survey Area symbols
SSURGO.areas <- get_ssurgo(
  template = c("CO670", "CO075"),
  label = "CO_TEST"
)
#> Warning: One or more parsing issues, see `problems()` for details

#> Warning: One or more parsing issues, see `problems()` for details

#> Warning: One or more parsing issues, see `problems()` for details

#> Warning: One or more parsing issues, see `problems()` for details

#> Warning: One or more parsing issues, see `problems()` for details

#> Warning: One or more parsing issues, see `problems()` for details

#> Warning: One or more parsing issues, see `problems()` for details

# Let's just look at spatial data for CO075
SSURGO.areas.CO075 <-
  SSURGO.areas$spatial %>%
  dplyr::filter(AREASYMBOL == "CO075")

# And get the NED data under them for pretty plotting
NED.CO075 <- get_ned(
  template = SSURGO.areas.CO075,
  label = "SSURGO_CO075"
)

# Plot the SSURGO mapunit polygons, but only for CO075
raster::plot(NED.CO075)
plot(SSURGO.areas.CO075$geom,
  lwd = 0.1,
  add = TRUE
)

Get and plot the ITRDB chronology locations in the study area

# Get the ITRDB records
# Buffer MEVE, because there aren't any chronologies in the Park
ITRDB <- get_itrdb(
  template = FedData::meve %>%
    sf::st_buffer(50000),
  label = "meve",
  measurement.type = "Ring Width",
  chronology.type = "Standard"
)
#> Warning in eval(jsub, SDenv, parent.frame()): NAs introduced by coercion
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

# Plot the MEVE buffer
plot(
  FedData::meve %>%
    sf::st_buffer(50000) %>%
    sf::st_transform(4326)
)
# Map the locations of the tree ring chronologies
plot(ITRDB$metadata$geometry,
  pch = 1,
  add = TRUE
)
legend("bottomleft",
  pch = 1,
  legend = "ITRDB chronologies"
)
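The `sf::st_buffer(50000)` call above takes its distance in the units of the template's coordinate reference system. The sketch below illustrates this with a hypothetical point in UTM zone 12N (EPSG:32612), a projected CRS measured in metres; it does not assume anything about the CRS of `FedData::meve`.

```r
library(sf)

# Hypothetical point in UTM zone 12N (EPSG:32612), whose units are metres
pt <- sf::st_sfc(sf::st_point(c(720000, 4120000)), crs = 32612)

# In a projected CRS the distance is in CRS units: 50000 m = 50 km
buffered <- sf::st_buffer(pt, dist = 50000)

# Reproject afterwards if longitude/latitude output is needed
buffered_ll <- sf::st_transform(buffered, 4326)

# Area of the buffer in square kilometres (close to pi * 50^2)
as.numeric(sf::st_area(buffered)) / 1e6
```

Buffering before transforming, as the demonstration does, keeps the distance in metres rather than degrees.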

Get and plot the National Land Cover Dataset for the study area

# Get the NLCD (USA ONLY)
# Returns a raster
NLCD <- get_nlcd(
  template = FedData::meve,
  year = 2011,
  label = "meve"
)

# Plot with raster::plot
raster::plot(NLCD)
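A common next step with a categorical land-cover raster is tabulating how many cells fall in each class. The sketch below uses a tiny synthetic raster so that it runs offline; the cell values are made up, although 11, 42, and 52 are the NLCD codes for open water, evergreen forest, and shrub/scrub.

```r
library(raster)

# Synthetic stand-in for a categorical land-cover raster (six cells)
lc <- raster::raster(nrows = 2, ncols = 3)
raster::values(lc) <- c(11, 42, 42, 52, 52, 52)

# Cell counts per class value
counts <- as.data.frame(raster::freq(lc))

# Convert counts to proportions of all cells
counts$share <- counts$count / sum(counts$count)
counts
```

Running `raster::freq()` on `NLCD` itself would give the same kind of table for the study area.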

Get and plot the NASS Cropland Data Layer for the study area

# Get the NASS (USA ONLY)
# Returns a raster
NASS_CDL <- get_nass_cdl(
  template = FedData::meve,
  year = 2016,
  label = "meve"
)
# Plot with raster::plot
raster::plot(NASS_CDL)

# Get the NASS CDL classification table
raster::levels(NASS_CDL)[[1]]

# Also, a convenience function loading the NASS CDL categories and hex colors
cdl_colors()

Acknowledgements

This package is a product of SKOPE (Synthesizing Knowledge of Past Environments) and the Village Ecodynamics Project through grants awarded to the Crow Canyon Archaeological Center and Washington State University by the National Science Foundation. This software is licensed under the MIT license. Continuing development is supported by the Montana Climate Office.

FedData was reviewed for rOpenSci by @jooolia, and was greatly improved as a result. rOpenSci on-boarding was coordinated by @sckott.