terra Conduct focal over part of a SpatRaster [FEATURE REQUEST]

I'm working on parallelizing focal operations by breaking large rasters into smaller chunk/tiles by row and then sending out each chunk to a different core (ailich/GLCMTextures#25). However, since focal operations require the context of surrounding cells, the first/last row of a chunk that I want to calculate values for requires information from the rows above/below it (I'll call these buffer rows). Conducting focal on each of these chunks, trimming them accordingly and merging them back together works; however, since focal goes through all the cells in a raster, the focal calculations are conducted on these buffer rows that themselves do not have the needed context for focal calculations. These are later overwritten by values calculated with the full focal context when merging chunks since there is overlap between chunks. Since these values will be overwritten, there is no need to calculate them and with larger window sizes, more chunks/cores, and more columns, this added computational burden increases. I noticed focalValues you can specify where to start and and end extracting values via row and nrow. I was wondering if something similar could be implemented in focalCpp and focal where focal operations would only be conducted for a portion of the raster and things beyond the specified range would just be filled with NA or a specified fill value.

Jan 02 '24 17:01 ailich

One way to achieve that could perhaps be by setting windows. I have added a new function getTileExtents that gives you a set of windows (extents). You can use argument "buffer" to add additional rows and columns. Your output would then be tiles that can be combined with vrt. I am trying to do something generic, rather then something specific for focal/focalCpp, if possible.

Jan 30 '24 15:01 rhijmans

Thanks @rhijmans, that definitely makes creating the tiles easier, though a bit more description what y does based on the input type would be useful, and potentially being able to set the desired row and columns as a vector of length two rather than a single number. I'm not sure I'm fully understanding but I believe with this method the areas of overlap will still be processed twice which adds computation time. Additionally, the tiles will have to be trimmed back before merging or else that results in striping artifacts. If buffer cells are not evaluated in focal and are filled with NA merge can be used without trimming since it will always prioritize non-NA values.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.2
#> terra 1.7.70
r <- rast(ncols=100, nrows=100)
values(r) <- 1:ncell(r)
x <- rast(ncols=10, nrows=10)
w<- c(3,3) # window size
filename <- paste0(tempfile(), "_.tif")
ff <- makeTiles(r, x, buffer= (w[1]-1)/2, filename)

fname<- vector(mode = "character", length= length(ff))
for (i in 1:length(ff)) {
  fname[i]<- tempfile(fileext = ".tif")
  focal(rast(ff[i]), filename=fname[i], fun=mean, na.rm=TRUE)
}

z_ref<- focal(r, fun=mean, na.rm=TRUE) #Reference of correct surface
z_vrt<- vrt(fname) #Combine with vrt
z_merge<- do.call(merge, lapply(fname, rast)) #Combine with merge

plot(z_ref-z_vrt, main = "Reference - VRT")

plot(z_ref-z_merge, main = "Reference - Merge")

plot(z_merge-z_vrt, main="Merge - VRT")

^{Created on 2024-01-30 with reprex v2.0.2}

Jan 30 '24 21:01 ailich

a bit more description what y does based on the input type would be useful

I have added a little bit

being able to set the desired row and columns as a vector of length two rather than a single number

That was already possible for both y and buffer

I believe with this method the areas of overlap will still be processed twice which adds computation time.

That is correct

Additionally, the tiles will have to be trimmed back before merging or else that results in striping artifacts. If buffer cells are not evaluated in focal and are filled with NA merge can be used without trimming since it will always prioritize non-NA values.

Yes, but it would easy to use crop on the tile extents created the same way, but with buffer=0

Jan 31 '24 06:01 rhijmans

It would also be possible to add a "cores" argument to focal (like in e.g. app).

I think that would also be possible for focalCpp. You would have to provide the uncompiled version of the cpp function so that it can be compiled on each node.

Jan 31 '24 06:01 rhijmans

Thanks @rhijmans. Using crop and buffer=0 seems to mostly work. Getting some issues on the left and right side of the raster.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.2
#> terra 1.7.71

r <- rast(ncols=100, nrows=100)
values(r)<- 1:ncell(r)
x <- rast(ncols=10, nrows=10)
w<- c(3,3) # window size
ff <- makeTiles(r, x, buffer= (w-1)/2, filename= paste0(tempfile(), "_.tif"))
tile_exts_names<- makeTiles(r, x, buffer= 0, filename= paste0(tempfile(), "_.tif"))
tile_exts<- lapply(lapply(tile_exts_names, rast), ext)
invisible(file.remove(tile_exts_names)) #Clear up disk space because only need extents of these rasters

fname<- vector(mode = "character", length= length(ff))
for (i in 1:length(ff)) {
  fname[i]<- tempfile(fileext = ".tif")
  crop(focal(rast(ff[i]), fun=mean, na.rm=TRUE), y = tile_exts[[i]], filename=fname[i])
}

z_ref<- focal(r, fun=mean, na.rm=TRUE) #Reference of correct surface

z_vrt<- vrt(fname) #Combine with vrt
z_merge<- do.call(merge, lapply(fname, rast)) #Combine with merge

plot(z_ref!=z_vrt, main = "Reference vs VRT")

plot(z_ref!=z_merge, main = "Reference vs Merge")

^{Created on 2024-01-31 with reprex v2.0.2}

Jan 31 '24 16:01 ailich

@rhijmans, having cores built into focal and focalCpp would be excellent if possible! I'm not sure about the internals of those functions but I suspect they call focalValues. If so, if tiled by row then using focalValues(tile, row=buffer+1, nrow=nrow(tile)-2*buffer) could be used to avoid performing focal operations twice in the buffer areas. This would apply for all interior tiles. The top would need to start at the top using row=1 and the end tile would need to end at the bottom using nrow= nrow(tile)-buffer.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.2
#> terra 1.7.71

r <- rast(ncols=99, nrows=99)
values(r)<- rep(1:nrow(r), each=ncol(r)) #Values correspond to row numbers
w<-c(3,3)
buffer<- (w[1]-1)/2
ff <- makeTiles(r, c(33,ncol(r)), buffer= (w[1]-1)/2, filename= paste0(tempfile(), "_.tif"))
length(ff) # 3 tiles
#> [1] 3
#With no buffer, each tile should be 33 rows

tile2<- rast(ff[2]) #Middle tile

fv<- focalValues(tile2, row=buffer+1, nrow= nrow(tile2)-(2*buffer)) # Start at row after buffer on top, and number of rows based on total in tile minus top and bottom buffer
cv<- fv[,ceiling(prod(w)/2)] #Center value of each window
min(cv) # Start from row 34 b/c tile 1 goes 
#> [1] 34
max(cv) # End at row 66 b/c tile 3 starts at 67
#> [1] 66

^{Created on 2024-01-31 with reprex v2.0.2}

Jan 31 '24 17:01 ailich

This would require a bit of an overhaul to focal/focalCpp but the issue of overlap could be eliminated by focal being redefined such that it first creates a new SpatRaster object where each layer is an element in the focal window and then it is just a call to app. Then since each cell has all the information it needs, you could simply pass the values and function to app and tiling could be done without any buffer. Not sure how this would affect performance though.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.2
#> terra 1.7.71

r <- rast(ncols=100, nrows=100)
values(r)<- 1:ncell(r) #Values correspond to row numbers
w<-c(3,3)

focal_rast<- rast(r) #Initialize raster
nlyr(focal_rast)<- prod(w) # Make number of layers equal to number of elements in focal window
fv<- focalValues(r) # Get focal values (not memory safe)
values(focal_rast)<- fv # Each cell contains values from focal window corresponding to that cell in original raster
fv[1,]
#> [1]  NA  NA  NA 100   1   2 200 101 102
focal_rast[1,1,] #Values across layers correspnd to first row of fv
#>   lyr.1 lyr.2 lyr.3 lyr.4 lyr.5 lyr.6 lyr.7 lyr.8 lyr.9
#> 1    NA    NA    NA   100     1     2   200   101   102

z_ref<- focal(r, w=w, fun=mean, na.rm=TRUE) #Reference of correct surface
z_app<- app(focal_rast, fun="mean", na.rm=TRUE)

plot(z_ref==z_app)


# Tile focal_rast
x <- rast(ncols=10, nrows=10)
ff <- makeTiles(focal_rast, x, buffer= 0, filename= paste0(tempfile(), "_.tif")) #Tile focal_rast with no buffer because cells include the information needed across layers

fname<- vector(mode = "character", length= length(ff))
for (i in 1:length(ff)) {
  fname[i]<- tempfile(fileext = ".tif")
  app(rast(ff[i]), fun="mean", na.rm=TRUE, filename=fname[i]) # apply function using app
}
z_app_vrt<- vrt(fname) #Combine with vrt
z_app_merge<- do.call(merge, lapply(fname, rast)) #Combine with merge

plot(z_app_vrt!=z_ref)

plot(z_app_merge!=z_ref)

^{Created on 2024-01-31 with reprex v2.0.2}

A simple version of this wrapped into a function would be

focal2<- function(x,w,fun, ...){
  focal_rast<- rast(x) #Initialize raster
  nlyr(focal_rast)<- prod(w) # Make number of layers equal to number of elements in focal window
  fv<- focalValues(r) # Get focal values (not memory safe)
  values(focal_rast)<- fv # Each cell contains values from focal window corresponding to that cell in original raster
  out<- app(focal_rast, fun=fun,...)
  return(out)
} #Alternative focal function

Jan 31 '24 19:01 ailich

@rhijmans, perhaps a more general option would be to allow for focal and focalCpp to take a SpatRaster of 1s and 0's indicating which cells in the raster to iterate over. In addition to allowing for calculations to be skipped in areas of overlap while chunking rasters, this would also speed up calculations for non-rectangular data where much of the raster grid can be filled with NA's.

library(terra)
#> Warning: package 'terra' was built under R version 4.3.2
#> terra 1.7.71

r<- rast(volcano, 
         extent= ext(2667400, 2667400 + ncol(volcano)*10, 
                     6478700, 6478700 + nrow(volcano)*10), 
         crs = "EPSG:27200")

AOI_shp<- vect(structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2667400.28, 2667770.32, 
                 2667850.36, 2667990.72, 2667401.44, 2667400.28, 6479302.96, 6479569.76, 
                 6479268.16, 6478699.76, 6478891.16, 6479302.96, 0, 0, 0, 0, 0, 
                 0), dim = 6:5, dimnames = list(NULL, c("geom", "part", "x", "y", 
                                                        "hole"))), type="polygons", crs=crs(r))
r<- mask(r, AOI_shp)
r<- crop(r, AOI_shp)
plot(r)


# 37 percent of cells are NA
round(100*(freq(r, value=NA)$count/ncell(r)))
#> [1] 37

AOI<- rast(r, vals=1)
AOI<- mask(AOI, AOI_shp, updatevalue=0)
plot(AOI) #Maybe focal/focalCpp could use this as an index band to know which cells to calculate over and which ones to skip

^{Created on 2024-03-19 with reprex v2.0.2}

Mar 19 '24 15:03 ailich

terra terra copied to clipboard

Conduct focal over part of a SpatRaster [FEATURE REQUEST]

terra
terra copied to clipboard