terra icon indicating copy to clipboard operation
terra copied to clipboard

vrt: Add support for `set_names` with `options="-separate"`

Open brownag opened this issue 5 months ago • 1 comments

Hi @rhijmans!

This PR adds to vrt(<character>) support for set_names with options="-separate".

  • The existing logic for set_names works great for tiled datasets, but I think for the overlapping case it can be altered.
  • The PR is mainly to demonstrate a behavior that would be convenient for a few more situations. Feel free to close and implement it differently or not at all

Thank you for your consideration.

With this PR we would get:

library(terra)
#> terra 1.8.56

x <- rast(system.file("ex", "elev.tif", package = "terra"))

# create 3 band dataset (each band different source)
x2 <- c(x, x * 2, x * 3)
names(x2) <- LETTERS[1:3]
x3 <- rast(lapply(names(x2), function(n) {
    writeRaster(x2[[n]], tempfile(fileext = ".tif"))
}))

# now works
res1 <- vrt(sources(x3), options = "-separate", set_names = TRUE)
res1
#> class       : SpatRaster 
#> size        : 90, 95, 3  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : spat_d9521ace7116_55634_o8up7yuDdJ6XHBX.vrt 
#> names       :   A,    B,    C 
#> min values  : 141,  282,  423 
#> max values  : 547, 1094, 1641

# still works
x4 <- makeTiles(x3, 10)
res2 <- vrt(x4, set_names = TRUE)
res2
#> class       : SpatRaster 
#> size        : 90, 95, 3  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : spat_d952bed8f77_55634_zxzxx9jyyx6UnnI.vrt 
#> names       :   A,   B,   C 
#> min values  : NaN, NaN, NaN 
#> max values  : NaN, NaN, NaN

# values match (excluding NA vs. NaN)
all(values(res1) == values(res2), na.rm = TRUE)
#> [1] TRUE

Previous behavior:

library(terra)
#> terra 1.8.56

x <- rast(system.file("ex", "elev.tif", package = "terra"))

# create 3 band dataset (each band different source)
x2 <- c(x, x * 2, x * 3)
names(x2) <- LETTERS[1:3]

x3 <- rast(lapply(names(x2), function(n) {
    writeRaster(x2[[n]], tempfile(fileext = ".tif"))
}))

# names not preserved
res1 <- vrt(sources(x3), options = "-separate", set_names = TRUE)
res1
#> class       : SpatRaster 
#> size        : 90, 95, 3  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : spat_d38454d06bd6_54148_o8up7yuDdJ6XHBX.vrt 
#> names       : spat_d3845~DdJ6XHBX_1, spat_d3845~DdJ6XHBX_2, spat_d3845~DdJ6XHBX_3 
#> min values  :                   141,                   282,                   423 
#> max values  :                   547,                  1094,                  1641

# works (returns nodata as NaN)
x4 <- makeTiles(x3, 10)
res2 <- vrt(x4, set_names = TRUE)
res2
#> class       : SpatRaster 
#> size        : 90, 95, 3  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : spat_d38411fd7cec_54148_zxzxx9jyyx6UnnI.vrt 
#> names       :   A,   B,   C 
#> min values  : NaN, NaN, NaN 
#> max values  : NaN, NaN, NaN


# values match (excluding NA vs. NaN)
all(values(res1) == values(res2), na.rm = TRUE)
#> [1] TRUE

brownag avatar Jun 12 '25 05:06 brownag

I thought about this a little bit more, and I think it may be better to use sprc() rather than rast() in the "-separate" (overlapping) case.

While overlapping grid VRT is probably most well behaved when bands have the same grid/extent/etc... it is not strictly required for VRT. So, the rast() usage in https://github.com/rspatial/terra/pull/1850/commits/28bd9bf03f4e38ea371fe48f02f2d23eebed571e is a little bit limiting. In https://github.com/rspatial/terra/pull/1850/commits/1b63ef4cf39203988a91e0cd1a780fd3845c2d2b I have replaced names(rast(x)) with unlist(lapply(sprc(x), names))

One thing to consider here is that vrt() always returns a SpatRaster... while this appears to work I think that output it is misleading when the sources actually do have different grids, as there is no way to accomodate different resolution/CRS/extent in the SpatRaster.

Perhaps it should be rast() to limit users to a more well-behaved subset of VRT possibilities (though as written this limit would only apply when set_names=TRUE)

brownag avatar Jun 16 '25 17:06 brownag

Thanks. I have not had time to play with this and I am not sure whether it is good or bad to get a SpatRaster with varying underlying geometries. Perhaps it is good as that is what the VRT creates? Would a warning be appropriate?

rhijmans avatar Jun 22 '25 18:06 rhijmans

Thanks very much @rhijmans. This fixes the case that I had in mind where the layers are perfectly overlapping with same grid and SRS.

I think it makes sense to use sprc() in this PR because we are just iterating over sources to extract names and do not need them to have a consistent grid system in order to do that. Using rast() is a bit more succinct but would cause an error due to any mismatches of extent.

For the broader implications, consider this example. In short, vrt() always creating a SpatRaster, as it currently does, makes sense.

Here I create 3 layers:

  1. Original
  2. 10x coarser resolution (same SRS)
  3. Different SRS

GDAL will throw an internal error in the event of different projection but allows for different grids. If I pass the above three sources to sf::gdal_utils("buildvrt") I get a warning related to the inclusion of the layer with the different SRS. Similarly with terra::vrt() I get the same result (only bands 1 and 2), and a warning that one source was not used.

We can write a VRT with source files that have a SRS in common, but when we read the VRT back in, we get different resolutions than we had for the source grids. They have been harmonized to a consistent value. If we inspect the XML we can see how it gets worked out.

In this example the layer corresponding to x appears to get scaled by a factor of ~5.5 (0.008333333 -> 0.04583333), and the layer corresponding to y get scaled by a factor of ~0.5 (0.08333333 -> 0.04583333). Thus, the layers of the VRT have the same resolution, whereas the layers in the source files did not. The VRT representation is compatible with SpatRaster, whereas referencing the source files themselves would need a SpatRasterCollection.

library(terra)
#> Warning: package 'terra' was built under R version 4.5.1
#> terra 1.8.56
terra::libVersion()
#>     gdal     proj     geos 
#> "3.11.0"  "9.6.0" "3.13.1"

x <- rast(system.file("ex", "elev.tif", package = "terra"))

xt <- rast(x)
res(xt) <- res(x)*10
y <- project(x, xt)

z <- project(x, "+proj=igh")

x2 <- list(x, y, z)
names(x2) <- LETTERS[1:3]

x3 <- sapply(names(x2), function(n) {
  sources(writeRaster(x2[[n]], tempfile(fileext = ".tif")))
})

sf::gdal_utils("buildvrt", x3, "test-sf.vrt", options = "-separate")
#> Warning in CPL_gdalbuildvrt(if (missing(source)) character(0) else source, :
#> GDAL Message 1: gdalbuildvrt does not support heterogeneous projection:
#> expected WGS 84, got unknown. Skipping
#> C:\Users\ANDREW~1.BRO\AppData\Local\Temp\RtmpGuy8Eg\file789870227a09.tif

res1 <- vrt(x3, "test.vrt", options = "-separate", overwrite = TRUE)
#> Warning: [vrt] vrt did not use 1 of the 3 files
res1
#> class       : SpatRaster 
#> size        : 16, 18, 2  (nrow, ncol, nlyr)
#> resolution  : 0.04583333, 0.04583333  (x, y)
#> extent      : 5.741667, 6.566667, 49.45833, 50.19167  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source      : test.vrt 
#> names       : test_1,   test_2 
#> min values  :     ? , 264.7985 
#> max values  :     ? , 490.1002

plot(res1)


res(res1)
#> [1] 0.04583333 0.04583333

res(rast(x3[1]))
#> [1] 0.008333333 0.008333333
res(res1[[1]])
#> [1] 0.04583333 0.04583333

res(rast(x3[2]))
#> [1] 0.08333333 0.08333333
res(res1[[2]])
#> [1] 0.04583333 0.04583333

In the event that two overlapping sources use a different grid system, it should be expected for the final VRT to have a third grid system which compromises the two sources.

Now if one were to build a VRT independently, without using the gdalbuildvrt machinery, I suppose it is possible to create something that does not follow the above scheme. But that kind of off-label use is pretty hard to predict and work around.

brownag avatar Jun 23 '25 23:06 brownag