tidyterra

Feedback needed

Open · dieghernan opened this issue 2 years ago • 12 comments

{tidyterra} tries to improve the data wrangling and visualisation of Spat* objects ({terra}) by providing new tidyverse methods for these objects.

The result is a package with wrappers around terra functions that use the tidyverse API.

Hopefully this will help useRs with no experience of spatial rasters start working with this kind of format. Experienced useRs may find that tidyterra does not suit their needs, especially in terms of performance with large raster files.

The developer of the package (myself) is one of those not-so-into-rasters users, so I may have made bad decisions while developing this package. For that reason, any feedback you can provide is especially useful.

Some users who obviously come to mind are @rhijmans, @nowosad, @dominicroye, @paleolimbot, @milos-agathon and @barryrowlingson.

So far, feedback is needed on:

  • [ ] Should {tidyterra} load packages on startup, as {tidyverse} does? This is the current behaviour.
  • [ ] drop_na() for SpatRaster currently acts like terra::mask() %>% terra::trim(). Does this make sense? See https://dieghernan.github.io/tidyterra/reference/drop_na.html#spatraster
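For comparison, here is one way to write that mask-then-trim behaviour in plain {terra}. It is a minimal sketch, not the tidyterra source: it assumes that a cell is dropped when any layer is NA, uses a made-up 4 x 4 two-layer raster, and relies on terra's is.na(), any(), ifel(), mask() and trim().

```r
library(terra)

# Made-up 4 x 4 raster with two layers:
# layer "a" has its whole first row missing, layer "b" misses only the last cell.
a <- c(rep(NA, 4), 1:12)
b <- c(1:15, NA)
r <- rast(nrows = 4, ncols = 4, nlyrs = 2, names = c("a", "b"))
values(r) <- cbind(a, b)

# 1. One-layer mask that is NA wherever ANY layer is NA (assumed rule).
keep <- ifel(any(is.na(r)), NA, 1)

# 2. Apply it to every layer (a single-layer mask is recycled over layers).
masked <- mask(r, keep)

# 3. Remove outer rows/columns that are entirely NA.
trimmed <- trim(masked)

dim(trimmed)
```

Only the fully empty first row is trimmed away. The last cell, which is NA only in layer b, stays in the grid but becomes NA in both layers, because trim() only removes outer rows and columns that are entirely NA.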

dieghernan, May 07 '22 13:05

On the tidyverse/ggplot2 side, it would be great to have some insight from @hadley, @romainfrancois, @lionel-, @wch and @thomasp85, to name just a few.

dieghernan, May 07 '22 14:05

Hi @dieghernan, I checked the package and it looks interesting. Currently, I do not have any suggestions, except maybe:

  1. Do you plan to add group_by() and summarize() as well? This could work, for example, like zonal statistics.
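To make the zonal-statistics analogy concrete, a minimal sketch with two made-up rasters: terra::zonal() on one side and a group_by()/summarise() pipeline over the cell values on the other. The layer names value and zone are illustrative; this does not describe any existing tidyterra method.

```r
library(terra)
library(dplyr)

# Made-up 4 x 4 value raster and a zone raster on the same grid:
# the top two rows are zone 1, the bottom two rows are zone 2.
r     <- rast(nrows = 4, ncols = 4, vals = 1:16, names = "value")
zones <- rast(nrows = 4, ncols = 4, vals = rep(1:2, each = 8), names = "zone")

# terra idiom: zonal statistics
zs_terra <- zonal(r, zones, fun = "mean")

# tidyverse idiom on the cell values: group by zone, then summarise
zs_tidy <- as.data.frame(c(zones, r)) %>%
  group_by(zone) %>%
  summarise(value = mean(value))

zs_terra
zs_tidy
```

Both give a mean of 4.5 for zone 1 and 12.5 for zone 2. The tidy route materialises every cell in a data.frame first, which is likely why it scales worse on large rasters.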

Nowosad, May 14 '22 18:05

I do not know tidyverse, but I suppose this package could be especially useful for SpatVector as these can be represented as a data.frame (tibble) that includes their geometry (as.data.frame(v, geom="WKT") or by coercing to sf and back).

I also suppose that anything to facilitate the use of ggplot would be very helpful to many.

There may be some methods that are useful for SpatRaster as well, such as for selecting layers, but I am not convinced that implementing methods like drop_na is very useful. Adding (near) synonyms might confuse as much as help, and as it would cover a very small part of the interface, one would still need to learn the terra idiom anyway. Otherwise, your idea for how to implement it seems reasonable, and, again, I do not know tidyverse, so my opinion on utility is not that meaningful.

rhijmans, May 14 '22 20:05

Thanks for your feedback @Nowosad and @rhijmans, much appreciated.

Regarding your suggestion, let me expand a bit. My idea for the package is basically to extend some commonly used tidyverse methods for data wrangling. The main goal for me is to help useRs with no spatial background get started with rasters, at least with basic transformations and plotting.

I am building on the approach already implemented in sf/stars:

sf: https://github.com/r-spatial/sf/blob/HEAD/R/tidyverse.R
stars: https://github.com/r-spatial/stars/blob/main/R/tidyverse.R

So I would prefer not to implement spatial operations in the package (for example, left_join.sf appends a data frame to an sf object, but a spatial join needs a dedicated st_join() call). group_by()/summarise() on sf may be an exception, since summarising merges the geometries of each group, but this is only implemented for vectors (sf), not for rasters (stars).

So I was not planning on implementing group_by.SpatRaster()/summarize.SpatRaster(). This is also connected with @rhijmans's comment:

Adding (near) synonyms might confuse as much as help, and as it would cover a very small part of the interface, one would still need to learn the terra idiom anyway.

I would strongly encourage useRs to learn the {terra} idiom, as it is much more efficient than the wrappers I can provide here. The conversion implemented in {tidyterra} (data.frame > operation > back to SpatRaster) is not efficient for large SpatRasters. I have made some effort on this by adding a section to the docs with the {terra} equivalent of each function (example: https://dieghernan.github.io/tidyterra/reference/pull.html#terra-equivalent). I also refer to this in the README: https://github.com/dieghernan/tidyterra#exclamation-a-note-on-performance

Again, {tidyterra} might be useful for beginners and/or medium-sized SpatRasters, and hopefully it will help spread the usage of {terra} by lowering the barriers to entry. IMHO the {terra} package has an easy interface if you have previously worked with {raster}, but for a complete novice with no background in rasters it can be a bit hard.

Also @rhijmans, I was thinking about removing drop_na(), at least for SpatRasters, since the implementation may not be really useful. So I agree on that, thanks.

For SpatVector/sf conversions I would simply advise sf::st_as_sf(SpatVector)/terra::vect(sf.object): this is more straightforward and the CRS information is not lost in the conversion (which I think would happen with as.data.frame(v, geom="WKT")).
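A minimal sketch of the WKT route with the lux example shipped with {terra}: the data.frame returned by as.data.frame(v, geom = "WKT") holds the geometry as text in a geometry column but has no CRS slot, so the CRS has to be passed back explicitly when rebuilding with vect().

```r
library(terra)

v_lux <- vect(system.file("ex/lux.shp", package = "terra"))

# Attributes plus a "geometry" column of WKT strings; a plain data.frame has no CRS.
d <- as.data.frame(v_lux, geom = "WKT")

# Rebuild from the WKT column, passing the original CRS back explicitly.
v_back <- vect(d, geom = "geometry", crs = crs(v_lux))
```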

Here are the tidyverse methods I have identified in sf/stars and their degree of implementation in tidyterra. So far I am focusing on SpatRasters (SpatVector methods are based on the sf ones; they are not really hard to implement, but I have not completed them yet):

✔️: Implemented; 🟢: Not explicitly implemented, but working on sf; 🟡: To be implemented in tidyterra

package verb stars SpatRaster sf SpatVector
tibble as_tibble ✔️ ✔️ 🟢 ✔️
dplyr anti_join ✔️
dplyr arrange ✔️
dplyr distinct ✔️
dplyr filter ✔️ ✔️ ✔️ ✔️
dplyr full_join ✔️
dplyr group_by ✔️
dplyr group_split ✔️
dplyr inner_join ✔️
dplyr left_join ✔️
dplyr mutate ✔️ ✔️ ✔️ ✔️
dplyr pull ✔️ ✔️ 🟢 ✔️
dplyr relocate ✔️ 🟢 ✔️
dplyr rename ✔️ ✔️ ✔️ ✔️
dplyr right_join ✔️
dplyr rowwise ✔️
dplyr sample_frac ✔️
dplyr sample_n ✔️
dplyr select ✔️ ✔️ ✔️ ✔️
dplyr semi_join ✔️
dplyr slice ✔️ ✔️ ✔️ ✔️
dplyr summarise ✔️
dplyr transmute ✔️ ✔️ ✔️ ✔️
dplyr ungroup ✔️
tidyr drop_na ✔️ 🟢 ✔️
tidyr gather ✔️
tidyr nest ✔️
tidyr pivot_longer ✔️
tidyr pivot_wider ✔️
tidyr replace_na ✔️ ✔️ 🟢 ✔️
tidyr separate ✔️
tidyr separate_rows ✔️
tidyr spread ✔️
tidyr unite ✔️
tidyr unnest ✔️

dieghernan, May 17 '22 07:05

@Nowosad @rhijmans

  1. Do you plan to add group_by() and summarize() as well? This could work, for example, like zonal statistics.

and

I do not know tidyverse, but I suppose this package could be especially useful for SpatVector as these can be represented as a data.frame (tibble) that includes their geometry (as.data.frame(v, geom="WKT") or by coercing to sf and back).

The next release of tidyterra will support group_by()/summarise() for SpatVectors, based on as.data.frame(v, geom="WKT"). More dplyr methods for SpatVectors (arrange, distinct, bind_row/col, left_join/inner_join) will also be added (see #84). Instead of relying on sf conversion, I created my own process based on the as.data.frame() approach.

A quick example:

library(terra)
#> terra 1.7.18
library(tidyterra)
#> 
#> Attaching package: 'tidyterra'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#> 
#>     intersect, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

v_lux <- vect(system.file("ex/lux.shp", package = "terra"))

v_lux %>%
  mutate(gr = cut(POP / 1000, 5)) %>%
  group_by(gr) %>%
  summarise(n = n(), tot_pop = sum(POP), mean_area = mean(AREA)) %>%
  arrange(desc(gr)) %>%
  glimpse() %>%
  autoplot(aes(fill = gr)) +
  ggplot2::ggtitle("Dissolved")
#> Rows: 3
#> Columns: 4
#> $ gr        <fct> "(147,183]", "(40.7,76.1]", "(4.99,40.7]"
#> $ n         <int> 2, 1, 9
#> $ tot_pop   <int> 359427, 48187, 194391
#> $ mean_area <dbl> 244.0000, 185.0000, 209.7778


# We can control the aggregation on summarise with .dissolve
v_lux %>%
  mutate(gr = cut(POP / 1000, 5)) %>%
  group_by(gr) %>%
  # Here, not dissolving
  summarise(
    n = n(), tot_pop = sum(POP), mean_area = mean(AREA),
    .dissolve = FALSE
  ) %>%
  arrange(desc(gr)) %>%
  # Same statistics
  glimpse() %>%
  # But not dissolving aggregated polygons
  autoplot(aes(fill = gr)) +
  ggplot2::ggtitle("Not Dissolved")
#> Rows: 3
#> Columns: 4
#> $ gr        <fct> "(147,183]", "(40.7,76.1]", "(4.99,40.7]"
#> $ n         <int> 2, 1, 9
#> $ tot_pop   <int> 359427, 48187, 194391
#> $ mean_area <dbl> 244.0000, 185.0000, 209.7778

Created on 2023-03-11 with reprex v2.0.2

dieghernan, Mar 11 '23 08:03

That is very cool.

I wonder if there are cases where you can avoid coercing the geometries to WKT or sf. That could save a lot of time. For example, you currently have:

select.SpatVector <- function(.data, ...) {
  # Use sf method
  sf_obj <- sf::st_as_sf(.data)
  selected <- dplyr::select(sf_obj, ...)
  return(terra::vect(selected))
}

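But that can be done much more efficiently with:

select.SpatVector <- function(.data, ...) {
  # One-row data.frame whose values are the column positions
  d <- data.frame(rbind(seq_len(ncol(.data))))
  names(d) <- names(.data)
  # Let dplyr resolve the selection on this cheap template
  selected <- dplyr::select(d, ...)
  # Positions of the selected columns; subset the SpatVector directly
  columns <- unlist(selected[1, ])
  .data[, columns]
}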

Likewise,

rename.SpatVector <- function(.data, ...) {
  # Use sf
  sfobj <- dplyr::rename(sf::st_as_sf(.data), ...)
  end <- terra::vect(sfobj)
  return(end)
}

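could be:

rename.SpatVector <- function(.data, ...) {
  # Zero-row data.frame carrying only the column names
  d <- data.frame(matrix(ncol = ncol(.data), nrow = 0))
  names(d) <- names(.data)
  # Let dplyr compute the new names, then copy them back to the SpatVector
  d <- dplyr::rename(d, ...)
  names(.data) <- names(d)
  .data
}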

I see that you already have something similar for rename_with.

I could have a look at row-wise operations as well if you are interested.
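A related sketch, assuming the {tidyselect} and {rlang} packages (tidyselect is what dplyr::select() uses to resolve column specifications): evaluate the selection against a zero-row template with the same column names, then index the SpatVector directly. select_spatvector() is a hypothetical name, not a tidyterra function.

```r
library(terra)

# Hypothetical helper: no geometry coercion, only column names are used.
select_spatvector <- function(.data, ...) {
  # Zero-row template with the same column names as the SpatVector
  template <- as.data.frame(
    matrix(nrow = 0, ncol = ncol(.data), dimnames = list(NULL, names(.data)))
  )
  # Named integer vector: values are column positions,
  # names are the (possibly renamed) output names
  loc <- tidyselect::eval_select(rlang::expr(c(...)), data = template)
  out <- .data[, unname(loc)]
  names(out) <- names(loc)
  out
}

v_lux <- vect(system.file("ex/lux.shp", package = "terra"))
v_sel <- select_spatvector(v_lux, area = AREA, POP)
names(v_sel)
```

Because eval_select() returns the output names as well as the positions, renames inside the selection (area = AREA) are kept, which the one-row template version of select.SpatVector() drops.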

rhijmans, Mar 11 '23 19:03

Thanks @rhijmans. I still have to migrate some functions, including those you mentioned. My overall approach is to avoid coercion between classes as much as possible, so your code is exactly what I needed.

Row-wise? I haven't explored it so far, so there is no need to spend time on it yet, but if you feel like it and end up having a look, please let me know.

dieghernan, Mar 11 '23 19:03

Rowwise implemented in #92 😁

dieghernan, Mar 16 '23 15:03

Hi, would being able to ggsave() GeoTIFFs be a feature that's in scope for tidyterra? It's currently possible by saving whatever spatial plot you have as a normal image, reading that file back into R as a raster, setting extents and CRS on the raster, and then writing it to disk again as a GeoTIFF. However, in my experience this is a fragile process, as axis labels, legends, ggsave() arguments, coord_sf() and other things can easily throw the extents off. As a result, you can easily end up writing 160+ MB to disk several times before you get a .tif that actually has the extents that were set on it.

A GeoTIFF device that automates this process seems handy for tasks centred on annotated map production. There are several questions about this on StackOverflow from folks needing to put grids, CRS ticks and the like on rasters. I'm using this approach to log what spatial processing code in R is doing, in a form that is easy to inspect in close detail in a GIS (for example, an algorithm's output is good 98% of the time, but you need to be able to scan through 40 ha at 0.5 m resolution to find the 2% of cases where the code needs improvement).
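A minimal sketch of the manual workflow described above, under stated assumptions: the target extent, the CRS (EPSG:32631) and the 1 m pixel size are made up, and base graphics stand in for a ggplot because margins and axis padding can be switched off exactly there. It covers only the georeferencing step (read the image back with terra::rast(), set ext() and crs(), write with writeRaster()); it does not solve the ggplot2 layout problems mentioned above.

```r
library(terra)

# Assumed target footprint: 400 m x 300 m in UTM zone 31N at 1 m per pixel.
target_ext <- ext(500000, 500400, 4600000, 4600300)
target_crs <- "EPSG:32631"

# 1. Render annotations to a PNG whose pixel grid covers exactly target_ext:
#    no margins, no axes, and "i" axis styles so the plot region is not padded.
png_file <- tempfile(fileext = ".png")
png(png_file, width = 400, height = 300)
par(mar = c(0, 0, 0, 0))
plot.new()
plot.window(xlim = c(500000, 500400), ylim = c(4600000, 4600300),
            xaxs = "i", yaxs = "i")
rect(500100, 4600100, 500300, 4600200, col = "red")  # stand-in annotation
invisible(dev.off())

# 2. Read the image back; it has no georeferencing, so terra may warn
#    about an unknown extent.
img <- suppressWarnings(rast(png_file))

# 3. Attach the real extent and CRS, then write a GeoTIFF.
ext(img) <- target_ext
crs(img) <- target_crs
tif_file <- tempfile(fileext = ".tif")
writeRaster(img, tif_file, filetype = "GTiff", datatype = "INT1U")
```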

twest820, Jun 11 '23 00:06

Will there be a way to utilize sbar() and north() from terra within the tidyterra language?

stantis, Aug 30 '23 23:08

Will there be a way to utilize sbar() and north() from terra within the tidyterra language?

Hi @stantis, AFAIK that's not possible, since sbar() and north() are meant to be used on base plots, while ggplot2 uses a different plotting system (grid).

If you want to plot north arrows and geographic scale bars in ggplot2, you may want to use the ggspatial functions (https://paleolimbot.github.io/ggspatial/reference/annotation_north_arrow.html https://paleolimbot.github.io/ggspatial/reference/annotation_scale.html) or switch completely to tmap, which has great support for these two graphical elements (https://r-tmap.github.io/tmap/reference/tm_compass.html https://r-tmap.github.io/tmap/reference/tm_scale_bar.html).
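A minimal sketch of the ggspatial route with the lux example from {terra}: geom_spatvector() from tidyterra draws the polygons, and annotation_scale() and annotation_north_arrow() from ggspatial add the scale bar and the north arrow. The corner positions and the arrow style are arbitrary choices.

```r
library(terra)
library(tidyterra)
library(ggplot2)
library(ggspatial)

v_lux <- vect(system.file("ex/lux.shp", package = "terra"))

p <- ggplot() +
  geom_spatvector(data = v_lux, aes(fill = POP)) +
  # Scale bar in the bottom-left corner
  annotation_scale(location = "bl") +
  # North arrow in the top-right corner
  annotation_north_arrow(
    location = "tr",
    which_north = "true",
    style = north_arrow_fancy_orienteering()
  )
```

Printing p draws the map.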

dieghernan, Aug 31 '23 06:08