
Run in parallel?

Open liushlcn opened this issue 4 years ago • 23 comments

Is it possible to run terra in parallel? I tried to run a calculation in parallel, but I got the error message "NULL value passed as symbol address".

liushlcn avatar May 19 '20 00:05 liushlcn

I think you cannot send SpatRaster or SpatVector objects to nodes, because these objects cannot be serialized. So you need to create the objects on the nodes. My plan is to make this all easy by providing built-in support for parallelization. That is not the highest priority right now, but I expect to add it this year.

rhijmans avatar May 19 '20 01:05 rhijmans
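
A minimal sketch of creating the objects on the nodes, assuming the data sit in (hypothetical) GeoTIFF files; each worker builds its own SpatRaster from a path and returns a filename rather than the object. mclapply forks, so on Windows parLapply would be used instead:

library(terra)
library(parallel)

files <- c("tile_1.tif", "tile_2.tif", "tile_3.tif")   # hypothetical input files

out <- mclapply(files, function(f) {
  r <- rast(f)                            # create the SpatRaster on the worker
  o <- file.path(tempdir(), basename(f))
  writeRaster(r * 10, o, overwrite = TRUE)
  o                                       # return a path, not a SpatRaster
}, mc.cores = 3)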

I have been running some code in parallel using a combination of the terra and raster packages: pre- and post-processing with terra and parallel compute with raster. The whole code would run faster if it were fully on terra, so I am adding my voice to the wish for built-in parallelization support.

hrvg avatar Jun 10 '20 20:06 hrvg

Which terra functions would be your priority?

rhijmans avatar Jun 10 '20 21:06 rhijmans

I am mainly using terra::aggregate(fun = mean), terra::aggregate(fun = sd) and terra::crop() to derive statistical roughness metrics. I have worked around this using raster, foreach and doParallel.

hrvg avatar Jun 10 '20 21:06 hrvg
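
A rough sketch of that raster/foreach/doParallel workaround, with hypothetical input tiles and a standard-deviation aggregate standing in for a roughness metric:

library(raster)
library(foreach)
library(doParallel)

registerDoParallel(cores = 4)
files <- c("dem_1.tif", "dem_2.tif")      # hypothetical DEM tiles

rough <- foreach(f = files, .packages = "raster") %dopar% {
  r <- raster(f)
  aggregate(r, fact = 10, fun = sd)       # raster objects serialize, so they can be returned
}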

I seem to be having the same issue with terra::predict not working in parallel: the same "NULL value passed as symbol address" error, although it runs fine otherwise. Built-in support for parallelisation of the predict function would be awesome if you could add it.

clairedavies avatar Dec 08 '20 00:12 clairedavies

See https://github.com/rspatial/terra/issues/96 for built-in support for predict.

You can also use pack and unpack to make serializable SpatRaster and SpatVector objects (this may not work for more complex structures, but I think it works for the basic ones):

library(terra)
s <- rast(system.file("ex/logo.tif", package="terra"))   
p <- pack(s)
p
#[1] "This is a PackedSpatRaster object. Use 'rast()' to unpack it"

# send somewhere and unpack
x <- rast(p)
x
#class       : SpatRaster 
#dimensions  : 77, 101, 3  (nrow, ncol, nlyr)
#resolution  : 1, 1  (x, y)
#extent      : 0, 101, 0, 77  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=merc +lon_0=0 +k=1 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs 
#source      : memory 
#names       : lyr.1, lyr.2, lyr.3 
#min values  :     0,     0,     0 
#max values  :   255,   255,   255 

This will load all values into memory, and I am thinking of a method that will do automatic tiling.

If you have multiple files, you can of course parallelize over filenames (create the objects on the cores).

The C++-level support will not come very soon, perhaps sometime next year, in part because it requires C++14 and I am not sure R is quite there yet.

rhijmans avatar Dec 08 '20 04:12 rhijmans
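
For illustration, a minimal sketch of that pack() idea with a PSOCK cluster; the per-layer mean via global() is just a toy computation:

library(terra)
library(parallel)

s <- rast(system.file("ex/logo.tif", package = "terra"))
p <- pack(s)                        # serializable stand-in for the SpatRaster

cl <- makeCluster(2)
res <- parLapply(cl, 1:3, function(i, packed) {
  library(terra)
  x <- rast(packed)                 # rebuild the SpatRaster on the worker
  unlist(global(x[[i]], "mean"))    # toy per-layer computation
}, packed = p)
stopCluster(cl)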

Awesome, thanks, will give that a go.

clairedavies avatar Dec 08 '20 04:12 clairedavies

More examples for parallelization of terra::predict: https://github.com/rspatial/terra/issues/178

rhijmans avatar Mar 21 '21 18:03 rhijmans
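
As a sketch of what that looks like in more recent terra versions, which expose a cores argument in predict() (the lm model and training sample below are made up purely for illustration):

library(terra)

r <- rast(system.file("ex/logo.tif", package = "terra"))
names(r) <- c("red", "green", "blue")

d <- spatSample(r, 200, na.rm = TRUE)       # hypothetical training data
d$y <- d$red * 0.5 + rnorm(nrow(d))
m <- lm(y ~ red + green + blue, data = d)

p <- predict(r, m, cores = 2)               # cell values are sent to the workers in chunks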

For future reference, pack() has been renamed to wrap() as of version 1.2-5. https://rspatial.github.io/terra/reference/wrap.html

rCarto avatar Jul 20 '21 10:07 rCarto
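
With the renamed functions, the earlier pack()/rast() example becomes, for instance:

library(terra)
s <- rast(system.file("ex/logo.tif", package = "terra"))
w <- wrap(s)      # serializable PackedSpatRaster, safe to send to a worker
x <- unwrap(w)    # rebuild the SpatRaster on the receiving side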

I'd like to add extract to the wish-list for future parallelization.

In the past, using raster::extract and mclapply (from the parallel package), I could extract values for a vector of coordinates (mycoords), each from a specified layer of a raster stack (as defined in the vector mylayers), across many cores:

myoutput <- unlist(mclapply(1:length(mycoords), function(i) {extract(mybrick, mycoords[i], layer = mylayers[i], nl = 1)}, mc.cores = ncores))

When executed on a single core, terra::extract represents a big speed gain over raster::extract, but it still gets bogged down when you have a lot of layers (i.e., 10,000+), making the parallelized raster::extract with mclapply considerably faster (assuming the use of several cores). Perhaps there is a similar way to approach parallelization for terra::extract?

chrishaak avatar Jan 27 '22 18:01 chrishaak
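
One way to approximate that today (not built in, and without the per-point layer selection above) is to split the coordinates into chunks and have each worker build its own SpatRaster from the file, roughly:

library(terra)
library(parallel)

f <- system.file("ex/logo.tif", package = "terra")
pts <- data.frame(x = runif(1000, 0, 101), y = runif(1000, 0, 77))

chunks <- split(pts, cut(seq_len(nrow(pts)), 4, labels = FALSE))  # one chunk per core

vals <- mclapply(chunks, function(d) {
  extract(rast(f), d)               # the SpatRaster is created on the worker
}, mc.cores = 4)
vals <- do.call(rbind, vals)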

I wrote a satellite processing chain using the raster package and parallelized the process to run a crop model over many fields. I updated all the processes to be compatible with the terra package, and damn, my processes no longer work in parallel mode. Anyway, I lost a lot of time; I should have searched beforehand and learned that SpatRaster does not appreciate parallel mode. I now have to go back to the raster package, because my PC has 24 cores, and even if terra is faster, 24 cores will be more efficient.

xbailleau avatar Feb 17 '22 19:02 xbailleau

It is difficult to comment since I do not know how you implemented this, but in your case I do not see much of a problem. You can send a chunk of weather data (a matrix or data.frame) to each core and get the model results back, for example by parallelizing the function you use with app.

rhijmans avatar Feb 17 '22 20:02 rhijmans
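
A minimal sketch of that chunk-of-values idea, with a made-up per-cell crop_model() standing in for the real one: extract the cell values as a data.frame, split it across the cores, and write the results back into a raster with the same geometry:

library(terra)
library(parallel)

r <- rast(system.file("ex/logo.tif", package = "terra"))

crop_model <- function(d) rowMeans(d) / 255       # hypothetical per-cell model

v <- as.data.frame(r, na.rm = FALSE)              # one row of layer values per cell
chunks <- split(v, cut(seq_len(nrow(v)), 4, labels = FALSE))

res <- mclapply(chunks, crop_model, mc.cores = 4)

out <- rast(r, nlyrs = 1)                         # empty raster with the same geometry
values(out) <- unlist(res, use.names = FALSE)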

More here: https://stackoverflow.com/a/67449818/635245

rhijmans avatar Feb 17 '22 23:02 rhijmans

Thank you very much for your answer and for sharing your packages. On each core I run the whole process (reading the jp2 files, calculating LAI, assimilating LAI into the crop model and generating the yield map), so with 24 cores I can run 24 fields at once. I now have two possibilities (using raster and terra), and I will add a way to use the stars package. Maybe there is another way to structure the processing, but the code is built like that and runs fine using the raster package, and rebuilding the whole process would take quite a lot of time. I will check your Stack Overflow link, thanks.

xbailleau avatar Feb 18 '22 13:02 xbailleau

It would be great to have some examples of running terra in parallel for those functions where it is already possible. Perhaps even some examples at https://rspatial.org/terra/spatial/index.html

aloboa avatar Mar 25 '22 15:03 aloboa

I would also like to add focal and focalCpp to the wishlist since focal operations can be computationally expensive (and I use them a lot 😅).

ailich avatar Jun 03 '22 02:06 ailich

I would also like to add focal and focalCpp to the wishlist since focal operations can be computationally expensive (and I use them a lot 😅).

This would be very beneficial!

bitbacchus avatar Jul 14 '22 10:07 bitbacchus

and zonal()

aloboa avatar Jul 25 '22 16:07 aloboa

I'd like to throw distance into the ring as well. It feels like something that should be parallel by design, as the underlying processing seems embarrassingly parallel (computing the same distance calculation on every pixel, one by one).

SimonDedman avatar Jul 23 '23 20:07 SimonDedman

I want to suggest cores = getOption("mc.cores") as the default value. This is pretty standard in the parallel package and elsewhere in the R ecosystem.

dmi3kno avatar Sep 04 '23 09:09 dmi3kno
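
To illustrate the convention with a hypothetical wrapper:

options(mc.cores = 8)    # set once per session

my_parallel_fun <- function(x, cores = getOption("mc.cores", 1L)) {
  parallel::mclapply(x, sqrt, mc.cores = cores)
}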

I tried breaking rasters into chunks and using a parallelized version of lapply from the future.apply package. The idea is to break the raster into chunks, send each chunk to a core, and then merge the results back together into a single object. The code works in serial with a standard lapply, but I get an error in parallel. Based on the error message, it seems potentially related to SpatRasters being C++ pointers to the data. Would it be possible to get something like this working, or is there any guidance on how to implement parallel processing in terra?

library(terra)
#> terra 1.7.47
library(future)
library(future.apply)
library(dplyr)

# Set up
set.seed(5)
r<-rast(matrix(data = sample(1:100, size = 1000, replace = TRUE), nrow = 100, ncol=100))
ncores<- 5
w<- c(3,3)
plot(r)


# Create breaks for chunking raster
buffer<- (w[1]-1)/2 # Buffer for focal window
breaks<- data.frame(start = ceiling(seq(1, nrow(r), length.out = ncores+1)))
breaks<- breaks %>% mutate(end=lead(start, n=1)+buffer)
breaks$end[breaks$end > nrow(r)]<- nrow(r)
breaks<- breaks[-nrow(breaks),]

# Put raster chunks in list
r_list<- vector(mode = "list", length = ncores)
for (i in 1:ncores) {
  r_list[[i]]<- r[breaks$start[i]:breaks$end[i], ,drop=FALSE]
}

# In serial
res_list<- lapply(r_list, FUN = focal, w=w, fun= "mean", na.rm=FALSE) #Apply focal to each list element
res<- do.call(terra::merge, res_list) #Merge back into single raster
plot(res)


#In Parallel
plan(strategy = "multisession", workers=ncores) #Set up parallel
res_list2<- future_lapply(r_list, FUN = focal, fun= "mean", w=w, na.rm=FALSE)
#> Error in .External(structure(list(name = "CppMethod__invoke_notvoid", : NULL value passed as symbol address

Created on 2023-11-14 with reprex v2.0.2

ailich avatar Nov 14 '23 19:11 ailich

I saw that wrap and unwrap were mentioned earlier in the thread. I can get parallelization working if I wrap each chunk before sending it out to a core and wrap the output from each core. Those outputs then each need to be unwrapped and merged. Here's an example with focal:

library(terra)
#> terra 1.7.62
library(future)
library(future.apply)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#> 
#>     intersect, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Custom Function
combine_raster_chunks<- function(x, buffer){
  for (i in 1:length(x)) {
    if(i!=1){x[[i]]<- x[[i]][(buffer+1):nrow(x[[i]]), , , drop=FALSE]} #Trim top of chunk
    if(i!=length(x)){x[[i]]<- x[[i]][1:(nrow(x[[i]])-buffer), , , drop=FALSE]} #Trim bottom of chunk
  }
  out<- do.call(merge, x)
  return(out)
}

# Set up
set.seed(5)
r<-rast(matrix(data = sample(1:100, size = 1000, replace = TRUE), nrow = 100, ncol=100))
ncores<- 5
w<- c(7,7)
plot(r)


# Create breaks for chunking raster
buffer<- (w[1]-1)/2 # Buffer for focal window
breaks<- data.frame(write_start = ceiling(seq(1, nrow(r)+1, length.out = ncores+1)))
breaks<- breaks %>% mutate(write_end=lead(write_start, n=1)-1)
breaks<- breaks %>% mutate(chunk_start=write_start - buffer)
breaks<- breaks %>% mutate(chunk_end = write_end + buffer)
breaks<- breaks[-nrow(breaks),]
breaks$chunk_end[breaks$chunk_end > nrow(r)]<- nrow(r)
breaks$chunk_start[breaks$chunk_start < 1]<- 1

# Put raster chunks in list
r_list<- vector(mode = "list", length = ncores)
for (i in 1:ncores) {
  r_list[[i]]<- wrap(r[breaks$chunk_start[i]:breaks$chunk_end[i], ,drop=FALSE])
} #SpatRasters need to be wrapped before sending out to different cores

focal_parallel<- function(x, ...){
  wrap(focal(unwrap(x),...)) #unwrap for processing and then wrap output
}


#In Parallel
plan(strategy = "multisession", workers=ncores) #Set up parallel
out_list<- future_lapply(r_list, FUN = focal_parallel, fun= "mean", w=w, na.rm=TRUE)
plan(strategy = "sequential")
out_list<- lapply(out_list, unwrap) #unwrap chunks
parallel<- combine_raster_chunks(out_list, buffer) #merge chunks into single raster
plot(parallel)


#In serial
serial<- focal(r, w=w, fun="mean")

#Comparison
serial-parallel
#> class       : SpatRaster 
#> dimensions  : 100, 100, 1  (nrow, ncol, nlyr)
#> resolution  : 1, 1  (x, y)
#> extent      : 0, 100, 0, 100  (xmin, xmax, ymin, ymax)
#> coord. ref. :  
#> source(s)   : memory
#> name        : focal_mean 
#> min value   :          0 
#> max value   :          0

Created on 2023-12-08 with reprex v2.0.2

ailich avatar Nov 15 '23 17:11 ailich