mlr3spatial

predict_spatial results in an error when the predictor raster stack has layers with similar names

Open mcoghill opened this issue 1 year ago • 0 comments

The predict_spatial function fails with an error when the stack of predictor rasters contains layers with similar names, for example test_name and test_name2. Here is a reprex:

library(mlr3spatial)
library(terra, exclude = "resample")

# fit rpart on training points
task_train = tsk("leipzig")
learner = lrn("classif.rpart")
learner$train(task_train)

# load raster
stack = rast(system.file("extdata", "leipzig_raster.tif", package = "mlr3spatial"))

# add two layers with similar layer names
r <- rast(nrows = nrow(stack), ncols = ncol(stack),
          nlyrs = 2, crs = crs(stack), extent = ext(stack), 
          resolution = res(stack), names = c("test_name", "test_name2"))
values(r[[1]]) <- runif(ncell(stack))
values(r[[2]]) <- runif(ncell(stack))

stack <- c(stack, r)

# predict land cover classes
pred = predict_spatial(stack, learner, chunksize = 1L)

Error in if (terra::inMemory(data[layer])) { : 
  the condition has length > 1

This error can be traced to the initialization of the DataBackendRaster, where rasters that are in memory are written to disk. That code subsets the raster by layer name with single square brackets (data[layer]), but double square brackets are required to select exactly one layer. With single brackets, the name test_name matches both test_name and test_name2, so terra::inMemory() returns more than one value and the if condition has length greater than one. According to the terra documentation, double square brackets are meant for subsetting layers while single square brackets are meant for extracting values (see ?terra::`[` and ?terra::`[[`). The difference shows up when both forms are applied to the stack from the reprex:

> stack["test_name"]

class       : SpatRaster 
dimensions  : 206, 154, 2  (nrow, ncol, nlyr)
resolution  : 10, 10  (x, y)
extent      : 731810, 733350, 5692030, 5694090  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 32N (EPSG:32632) 
source(s)   : memory
names       :    test_name,   test_name2 
min values  : 1.429836e-05, 4.431047e-05 
max values  : 9.999822e-01, 9.999146e-01 

Both layers (test_name and test_name2) are returned when single square brackets are used. Compare this with double square brackets:

> stack[["test_name"]]

class       : SpatRaster 
dimensions  : 206, 154, 1  (nrow, ncol, nlyr)
resolution  : 10, 10  (x, y)
extent      : 731810, 733350, 5692030, 5694090  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 32N (EPSG:32632) 
source(s)   : memory
name        :    test_name 
min value   : 1.429836e-05 
max value   : 9.999822e-01 

Using double square brackets selects only the layer with exactly that name, even when other layers have similar names. I'm hoping this can be implemented without causing further issues downstream. I'm not familiar enough with the mlr3spatial and mlr3verse package structures to know for sure, but hopefully it is as simple as changing the single square brackets to double square brackets at this position. Thank you!
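
A minimal sketch of the suggested change, under stated assumptions: the loop below is a hypothetical stand-in for the per-layer check in DataBackendRaster (the actual package code may be structured differently), and the tiny raster only mimics the layer names from the reprex. It shows that selecting by name with double square brackets yields exactly one layer per name, so a scalar check such as if (terra::inMemory(data[[layer]])) would receive a single layer.

```r
library(terra)

# Small stand-in raster; the layer names mirror the reprex above.
r <- rast(nrows = 2, ncols = 2, nlyrs = 2, vals = runif(8),
          names = c("test_name", "test_name2"))

# Hypothetical per-layer loop (not the actual mlr3spatial code):
# count how many layers each name selects when `[[` is used.
layers_per_name <- vapply(names(r), function(layer) {
  single <- r[[layer]]  # double brackets: exact match on the layer name
  nlyr(single)
}, numeric(1))

# Every name selects exactly one layer, including "test_name".
stopifnot(all(layers_per_name == 1))
```

Whether other parts of the backend also subset by name with single square brackets is not something this sketch checks.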

mcoghill, Aug 05 '24 15:08