tmap
tmap copied to clipboard
complex **stars**: how to approach them
Haven't much experience with terra
, but so far, SpatRaster
objects seem pretty straight forward. They always have three dimensions: nrow, ncol, and, nlyr.
stars
objects are much more complex:
- they support non-regular rasters (does
terra
also support them?), - they can have multiple attributes and multiple dimensions (other than x and y).
My question: how should we approach data variables of stars
objects in tmap?
In case there are multiple attributes, these attributes are considered variable names (as was the case in the current version of tmap). Example:
library(tmap)
land
#> stars object with 2 dimensions and 4 attributes
#> attribute(s):
#> cover cover_cls
#> Water bodies :393060 Water :393060
#> Snow / Ice : 61986 Snow/ice : 61986
#> Herbaceous : 21377 Forest : 48851
#> Tree Open : 16171 Other natural vegetation : 32611
#> Sparse vegetation: 12247 Bare area/Sparse vegetation: 26904
#> Cropland : 11658 Cropland : 17843
#> (Other) : 66701 (Other) : 1945
#> trees elevation
#> Min. : 0.0 Min. :-412
#> 1st Qu.: 0.0 1st Qu.: 218
#> Median : 0.0 Median : 608
#> Mean : 15.6 Mean :1140
#> 3rd Qu.: 19.0 3rd Qu.:1941
#> Max. :100.0 Max. :6410
#> NA's :393060 NA's :389580
#> dimension(s):
#> from to offset.xmin delta.xmax refsys point values
#> x 1 1080 -180 0.333333 +proj=longlat +ellps=WGS8... NULL NULL
#> y 1 540 90 -0.333333 +proj=longlat +ellps=WGS8... NULL NULL
#> x/y
#> x [x]
#> y [y]
tm_shape(land) + tm_raster(c("cover_cls", "elevation"))
#> Variable(s) "col" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Created on 2021-09-11 by the reprex package (v2.0.0)
Now, let's compare stars
to terra
with the landsat
dataset:
library(tmap)
library(terra)
library(stars)
library(spDataLarge)
(landsat_stars = read_stars(system.file("raster/landsat.tif", package = "spDataLarge")))
#> stars object with 3 dimensions and 1 attribute
#> attribute(s), summary of first 1e+05 cells:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> landsat.tif 7701 8412 8739 8941.313 9258 14072
#> dimension(s):
#> from to offset delta refsys point values x/y
#> x 1 1128 301905 30 WGS 84 / UTM zone 12N FALSE NULL [x]
#> y 1 1428 4154085 -30 WGS 84 / UTM zone 12N FALSE NULL [y]
#> band 1 4 NA NA NA NA NULL
(landsat_terra = rast(system.file("raster/landsat.tif", package = "spDataLarge")))
#> class : SpatRaster
#> dimensions : 1428, 1128, 4 (nrow, ncol, nlyr)
#> resolution : 30, 30 (x, y)
#> extent : 301905, 335745, 4111245, 4154085 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=utm +zone=12 +datum=WGS84 +units=m +no_defs
#> source : landsat.tif
#> names : lan_1, lan_2, lan_3, lan_4
#> min values : 7550, 6404, 5678, 5252
#> max values : 19071, 22051, 25780, 31961
The difference is that the bands in terra
are translated to layers, whereas in stars
they are translated to a new dimension (called band
). The layers of terra
are more similar to the attributes in stars
than to the dimensions. names(landsat_stars)
gives "landsat.tif"
, whereas names(landsat_terra)
gives "lan_1"
to "lan_4"
. This makes perfectly sense. I can use split
to split the dimension values into different attributes.
Selecting these 4 variables is straight-forward (as is with multiple attributes in a stars object):
tm_shape(landsat_terra) +
tm_raster(c("lan_1", "lan_2", "lan_3", "lan_4"), col.free = FALSE)
tm_shape(landsat_terra) +
tm_rgb(MV("lan_4", "lan_3", "lan_2"), col.scale = tm_scale_rgb(maxValue = 31961))
Created on 2021-09-11 by the reprex package (v2.0.0)
The question however is how to call the dimension values of a stars object. Two obstacles: 1) a dimension may have numeric values, and 2) there can be more than one non-x-y-dimension.
In the tmap3 it is possible to use integers. See https://r-tmap.github.io/tmap-book/layers.html#raster-layer (Figure 5.8). This would only work if there is only one attribute, and only one non-x-y dimension.
tm_shape(landsat_stars +
tm_raster(1:4, col.free = FALSE)
# not working yet in tmap4
Currently, each dimension in a stars
object is interpreted as a variable, in this example band
which I can use to group by:
tm_shape(landsat_stars) +
tm_raster("landsat.tif") +
tm_facets("band")
#> stars object downsampled to 888 by 1125 cells.
Suppose there are multiple attributes and multiple non-x-y dimensions. The attributes can be seen as one dimension. So it could be possible to have something like this:
tm_shape(my_stars_object) +
tm_rgb(MV(3, 2, 1)) +
tm_facets_wrap(by = "ATTRIBUTES")
where MV
stands for multivariate, the numbers refer to the values of the first non-x-y band, and "ATTRIBUTES"
to the 'attributes' dimension. This example would make sense if my_stars_objects
contains satellite images taken at several times.
Let me know what you think. Ideas and suggestions are welcome! @Nowosad @edzer @rsbivand @rhijmans
In case of a four-dimensional stars
object, it would make sense to associate dimension 3 with facet rows, and dimension 4 with faced columns, or vice versa. I think that is the maximum logic you can get (or dimension 5 aligns with "page", but who wants paged figures?) with two-dimensional plots & facets.
stars
attributes are there for allowing different data types, e.g. the first attribute can be a numeric, the second a logical or categorical variable, while all share the same dimensionality; not sure if terra layers allow different types.
Just a few comments from me:
- I do not think that terra supports non-regular rasters.
- Yes, terra accepts different types (e.g., numerical and categorical) in the same object - see a code example below.
- @mtennekes I am still thinking about
MV
... Two capital letter function name looks strange compared to the rest of tmap syntax. How about justtm_mv()
? - I agree that a special name, e.g.,
"ATTRIBUTES"
would be useful.
library(terra)
#> terra version 1.4.2
elev = rast(nrows = 6, ncols = 6, resolution = 0.5,
xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
vals = 1:36)
grain_order = c("clay", "silt", "sand")
grain_char = sample(grain_order, 36, replace = TRUE)
grain_fact = factor(grain_char, levels = grain_order)
grain = rast(nrows = 6, ncols = 6, resolution = 0.5,
xmin = -1.5, xmax = 1.5, ymin = -1.5, ymax = 1.5,
vals = grain_fact)
mix_rast = c(elev, grain)
plot(mix_rast)
Created on 2021-09-12 by the reprex package (v2.0.1)
In case of a four-dimensional
stars
object, it would make sense to associate dimension 3 with facet rows, and dimension 4 with faced columns, or vice versa. I think that is the maximum logic you can get (or dimension 5 aligns with "page", but who wants paged figures?) with two-dimensional plots & facets.
stars
attributes are there for allowing different data types, e.g. the first attribute can be a numeric, the second a logical or categorical variable, while all share the same dimensionality; not sure if terra layers allow different types.
Indeed. The theoretical limit of a stars object that can be visualised in tmap is even 6 dimensions (see also https://github.com/r-tmap/tmap/issues/497#issuecomment-690421691). The 6th dimension is used as multivariate, e.g. rgb bands:
tm_shape(stars_obj) +
tm_rgb(tm_mv(1, 2, 3) +
tm_facets_wrap(row = "month", col = "time", page = "year")
The question is how should tmap detect to which dimension values 1, 2, and 3 belong. Probably, I'll go for the first non-x-y dimension that is not used by tm_facets
.
Thx @Nowosad . Yes, tm_mv
is better.
My approach for processing terra
objects and stars
objects with raster is the following:
- Downsample (or aggregate) to a resolution that is sufficient for visualization (determined by the option
max.raster
). - Store the data in a
data.table
, and save the raster specifications (crs, bbx, ncol/nrow).
Which resolution reduction method shall we use? Regular sampling or aggregation? In the current dev version:
- I use
st_downsample
fromstars
. Friendly request for @edzer : could you please export it? - I use
aggregate
fromterra
which is very convenient and quick.
This works well, but is not consistent with each other. Therefore:
- Is there an aggregation function in
stars
that works in the same way as terra's aggregate, so with an argumentfact
? (which can be a vector of aggregation factors per dimension, liken
fromst_downsample
). - Is there a function in
terra
that is similar tostars:::st_downsample
?
Ideally, the user should be able to specify whether to aggregate or to sample.
Side note: for the aggregation method, the user should be aware (or ideally be able to change) the aggregation function. By default it is mean
, which makes sense for most applications (e.g. satellite images). However, for other cases, the sum might be better (e.g. the number of people per raster cell).
For the sample method, the same holds: e.g. if the data value per raster cell is an absolute value (e.g. population count), then sampling is not a good method (unless population density is used).
I use aggregate from terra which is very convenient and quick.
Have you tried on a 10000 x 10000 image, or larger, and compared to down sampling when the image contains overviews?
By default it is mean, which makes sense for most applications (e.g. satellite images)
Not sure I agree: the variance changes, what you show are no longer pixel values; I think there is a reason why the default for GDAL's resampling methods is near
(nearest neighbor), besides speed. Ask a remote sensing specialist?
Exporting st_downsample
- yes, that makes sense.
Exporting st_downsample - yes, that makes sense.
still need to think this through; the more common pattern is to use st_as_stars()
on a stars_proxy
object and use the downscale
argument.
Agree that near
is indeed the saver and faster choice.
Exporting st_downsample - yes, that makes sense.
still need to think this through; the more common pattern is to use
st_as_stars()
on astars_proxy
object and use thedownscale
argument.
This seems a detour when you already have a fairly large (but not huge) stars object (such as landsat_stars
from above) in memory. How can I downsample it without having a function such st_downsample
?
Just to confirm that a SpatRaster always has three dimensions (nrow, ncol, nlyr). There is also the SpatRasterDataSet that can have 1 additional dimension ("group", "variable"), in the future I may allow for more dimensions (nested variables). The SpatRasterDataSet is mostly for reading data from files that have groups, and for combining SpatRasters for computations where each multi-layer SpatRaster is a variable. In the context of tmap I would not worry about these at this point in time.
For downsampling, terra uses spatSample
(see Downsample function? rspatial/terra#330). Occasionally, that can of course lead to showing undesired outliers. But it is much faster than aggregate for large datasets. I suppose that there could be an option to use aggregate instead, but it might be better to leave aggregation to the user as a pre-processing step, so that they can also decide whether to use mean, sum, median, na.rm=T, etc. Mapping count data can become much more meaningful when you can actually see all the raster cells, but whether someone wants to show the summed or the sampled data will depend on the context.
@rhijmans it seems that e.g.
spatSample(landsat_terra, 1e4, as.raster=TRUE)
doesn't return a very useful raster; when plotted:
@edzer it should work better with regular sampling:
x <- spatSample(landsat_terra, 1e4, method="regular", as.raster=TRUE)
Thanks!