Curriculum Advisory Committee Recommendations from 2022 Q1 Meeting (Help wanted!)
Hello maintainers and fellow Carpentries community members!
The Data Carpentry Geospatial Curriculum Advisory Committee had our 1st quarter meeting on March 29th and we have posted our minutes on the Curriculum Advisors repo.
You can view the minutes here, but I've copied the relevant recommendations below. We are calling for volunteers to help implement these important changes to the lessons. I will label this issue as "Help Wanted" and look forward to your contributions!
Please feel free to reach out to me or my co-chair Jeff Hollister (@jhollist) if you have questions about these recommendations or if you would like to bring something to our attention for future meetings of the CAC. We also encourage you to reach out to the maintainers of the lessons as you develop.
- Transition from PROJ and proj4strings
- Lessons should be updated to reference coordinate systems by their EPSG codes from the EPSG Geodetic Parameter Dataset instead of using proj4strings. The Committee agreed that the alternative Well-known text (WKT) representation is unwieldy and unnecessary for most common use cases. WKT should be mentioned as an alternative to EPSG codes, especially where there is no existing EPSG standard. Lessons should include examples of converting between the EPSG and WKT representations.
- Deprecation of the sp, rgeos, and rgdal packages
- The Committee agreed that the references to rgdal and rgeos be removed or replaced with references to the equivalent sf and terra functions as appropriate. This decision is closely paired with the rationale for choosing terra as the replacement for the raster package and aims to avoid code-breaking deprecations coming some time in 2023.
- Transition from raster to terra or stars
- The terra package appears to be the most direct replacement for raster, as it uses language that is similar to raster and common to other GIS. The Committee recommends that terra be adopted as a replacement for raster. The stars package should be presented as an alternative to terra that may be faster in some cases or more appropriate for analyses with longitudinal elements.
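As a starting point for the EPSG/WKT conversion examples mentioned in the first recommendation, here is a minimal sketch using sf. It assumes sf is installed with a reasonably recent PROJ; the variable names are only illustrative.

```r
library(sf)

# EPSG code -> CRS object -> WKT string.
crs_from_epsg <- sf::st_crs(4326)   # "EPSG:4326" is accepted as well
wkt <- crs_from_epsg$wkt
cat(wkt, "\n")

# WKT string -> CRS object -> EPSG code.
# The code is NA when the WKT carries no EPSG identifier.
crs_from_wkt <- sf::st_crs(wkt)
crs_from_wkt$epsg
```

The WKT form is still the fallback for a CRS that has no EPSG entry, in which case the `$epsg` field is expected to be `NA`.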
Is anyone already working on the raster to terra conversions? I can help with that (I have been doing these conversions for my own classes as well, and I am very familiar with the NEON data). And I support using terra here, not stars. The latter is good for more advanced users, but for an intro course, terra is appropriate.
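For anyone picking this up, a rough sketch of what a few common raster calls look like in terra. The raster below is a made-up in-memory example (the extent and values are hypothetical, not lesson data); with real files, `terra::rast("some_file.tif")` takes the place of `raster::raster()`.

```r
library(terra)

# Hypothetical 10 x 10 raster in UTM zone 18N (EPSG:32618).
r <- terra::rast(nrows = 10, ncols = 10,
                 xmin = 731000, xmax = 732000,
                 ymin = 4713000, ymax = 4714000,
                 crs = "EPSG:32618")
terra::values(r) <- seq_len(terra::ncell(r))

# raster::cellStats(x, mean) becomes terra::global(x, "mean").
r_mean <- terra::global(r, "mean")
r_mean

# raster::projectRaster(x, crs = ...) becomes terra::project(x, "EPSG:...").
r_ll <- terra::project(r, "EPSG:4326")
terra::is.lonlat(r_ll)
```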
Happy to help! I’ve been working with terra a lot recently and really loving it. So fast! tidyterra ( https://dieghernan.github.io/tidyterra/ ) would be worth weaving in too: it takes away the painful step of having to convert to a data frame for plotting, so one can use geom_spatraster() and geom_spatraster_rgb() directly. With it, terra is just as easy as raster, or easier.
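To give an idea of what that looks like, here is a small sketch, assuming terra, ggplot2 and tidyterra are installed. The raster and its layer name are invented for illustration.

```r
library(terra)
library(ggplot2)
library(tidyterra)

# Hypothetical single-layer raster standing in for a lesson dataset.
r <- terra::rast(nrows = 20, ncols = 20,
                 xmin = 731000, xmax = 732000,
                 ymin = 4713000, ymax = 4714000,
                 crs = "EPSG:32618")
terra::values(r) <- seq_len(terra::ncell(r))
names(r) <- "elevation"

# The SpatRaster goes straight into ggplot2; no as.data.frame() step.
p <- ggplot() +
  tidyterra::geom_spatraster(data = r) +
  scale_fill_viridis_c(na.value = NA) +
  labs(fill = "elevation")
p
```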
It's so exciting to watch all these changes roll in!
Good evening,
The changes made so far addressed @srappel's issues 2 & 3. Regarding issue 1, Transition from PROJ and proj4strings, I took a look at the lesson's data and found the following:
- The raster files are compatible with EPSG codes. I tested this by projecting the rasters using their own EPSG codes and then comparing the text descriptions of their CRSs; I found no difference (see code below).
- Most vector files' CRSs contain an EPSG code (there is only one exception, and its code is easy to infer). However, after applying the same procedure used for the raster data, I found differences in their CRSs' WKT before and after applying a projection. This implies that their CRSs were built using either a PROJ string or an old EPSG database.
- Some vector files have Z coordinates, but they are all set to 0. This could cause the package reading them to produce warnings or even throw errors.
I guess we could replace the vector data with re-projected versions without the Z coordinates. That way, we could start looking at issue 1 knowing the lesson's data isn't a source of PROJ strings. However, the lesson data isn't currently under version control, which raises a new question: could we also host the lesson's data alongside the lesson? Or even better, could we build an R package with the data? It wouldn't need to be on CRAN; it could be hosted in a public git repository and installed using devtools::install_github.
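The clean-up proposed above could look roughly like this for a single layer. This is only a sketch: the two points and the PROJ string are made up to mimic a layer with all-zero Z values, and no lesson file is touched.

```r
library(sf)

# Hypothetical layer defined from a PROJ string, with Z = 0 everywhere.
pts <- sf::st_sfc(sf::st_point(c(-72.17, 42.54, 0)),
                  sf::st_point(c(-72.18, 42.53, 0)),
                  crs = "+proj=longlat +datum=WGS84 +no_defs")
layer <- sf::st_sf(id = 1:2, geometry = pts)

# Drop the Z (and M) dimension, then re-project using the EPSG code.
layer_no_z <- sf::st_zm(layer, drop = TRUE, what = "ZM")
layer_clean <- sf::st_transform(layer_no_z, 4326)

sf::st_crs(layer_clean)$epsg
colnames(sf::st_coordinates(layer_clean))
```

A cleaned layer like this could then be written out with `sf::st_write()` to produce the replacement file.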
#!/usr/bin/Rscript --vanilla
# Check the data for the Carpentries lesson r-raster-vector-geospatial
#---- Setup ----
library(magrittr)  # provides the %>% pipe used throughout this script
data_dir <- "~/Documents/github/datacarpentry/r-raster-vector-geospatial/episodes/data"
stopifnot("Data directory not found!" = dir.exists(data_dir))
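#---- Utilities ----
# NOTE: only the signatures of these helpers survive in this post; the bodies
# below are assumed, minimal implementations.
# WKT representation of a CRS object.
get_crs_wkt <- function(crs) { crs$wkt }
# EPSG code of a CRS object (NA when there is none).
get_epsg <- function(crs) { as.integer(crs$epsg) }
# User input from which the CRS object was created.
get_input <- function(crs) { crs$input }
# Does the sf object have a Z (or M) dimension?
has_z <- function(obj_sf) { !is.null(sf::st_z_range(obj_sf)) }
has_m <- function(obj_sf) { !is.null(sf::st_m_range(obj_sf)) }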
#---- Raster data ----
raster_tb <-
data_dir %>%
list.files(pattern = "\\.tif$",
full.names = TRUE,
recursive = TRUE) %>%
tibble::as_tibble() %>%
dplyr::rename(file_path = value) %>%
dplyr::mutate(obj = purrr::map(file_path, terra::rast),
obj_crs = purrr::map_chr(obj, terra::crs),
obj_crs1 = purrr::map(obj_crs, sf::st_crs),
epsg = purrr::map_int(obj_crs1, get_epsg),
epsg = purrr::map2_chr("EPSG:", epsg, paste0),
new_obj = purrr::map2(obj, epsg, terra::project),
new_crs = purrr::map_chr(new_obj, terra::crs),
crs_diff = purrr::map2_dbl(obj_crs, new_crs, utils::adist))
print("NOTE: Re-projecting rasters using EPSG codes doesn't change their CRSs at all.")
raster_tb %>%
dplyr::select(obj_crs, new_crs, crs_diff) %>%
print(n = Inf)
#---- Vector data ----
vector_tb <-
data_dir %>%
list.files(pattern = "\\.shp$",
full.names = TRUE,
recursive = TRUE) %>%
tibble::as_tibble() %>%
dplyr::rename(file_path = value) %>%
dplyr::mutate(obj = purrr::map(file_path, sf::read_sf),
obj_crs = purrr::map(obj, sf::st_crs),
crs_wkt = purrr::map(obj_crs, get_crs_wkt),
epsg = purrr::map_int(obj_crs, get_epsg),
has_z = purrr::map_lgl(obj, has_z),
has_m = purrr::map_lgl(obj, has_m),
obj_no_z = purrr::map(obj, sf::st_zm),
crs_input = purrr::map_chr(obj_crs, get_input))
print("NOTE: There is a vector missing EPSG code.")
vector_tb %>%
dplyr::filter(is.na(epsg)) %>%
dplyr::select(file_path, epsg) %>%
dplyr::mutate(file_path = basename(file_path)) %>%
print(n = Inf)
print("NOTE: There are some vectors with Z coordinates, but all of them are 0s")
vector_tb %>%
dplyr::filter(has_z) %>%
dplyr::mutate(file_path = basename(file_path),
z_range = purrr::map(obj, sf::st_z_range)) %>%
dplyr::select(file_path, has_z, z_range) %>%
print(n = Inf)
# Add missing EPSG by hand.
vector_tb <-
vector_tb %>%
dplyr::mutate(epsg = dplyr::if_else((crs_input == "WGS 84" & is.na(epsg)),
4326L, epsg)) %>%  # integer literal, to match the type of epsg
# Re-project using EPSGs.
dplyr::mutate(new_obj = purrr::map2(obj_no_z, epsg, sf::st_transform),
new_crs = purrr::map(new_obj, sf::st_crs),
new_crs_wkt = purrr::map(new_crs, get_crs_wkt),
crs_diff = purrr::map2_dbl(crs_wkt, new_crs_wkt,
utils::adist))  # assumed completion: same edit-distance check as for the rasters
print("NOTE: The CRSs' WKT change after projection using EPSG codes.")
vector_tb %>%
dplyr::select(file_path, crs_diff)
Thanks so much for the fantastic work here, @albhasan.
Regarding the versioning of the example data: the example dataset is published on FigShare, where there is the option of creating a new version of the record if and when the files change. I think the record is owned by NEON, but I would be happy to try to coordinate with them to publish a new version.
Finally, a suggestion: as you have addressed most of the points raised by the CAC, it might be best to close this issue and open a new one where the specific question of how to update the dataset can be discussed further. I'll be happy to re-post my comment there if you do.
Hi @tobyhodges,
I followed your suggestion and opened #426. Could you please re-post your comment there?