
Memory problems processing folders of files from list

thomarse-ef opened this issue 2 years ago • 3 comments

I'm trying to process a few folders of spatial data but am running into memory issues (filling RAM very quickly, often causing a crash). I've tried to replicate this behaviour in a few examples.

Example 1 - merge folder of rasters:

library(terra)
# make example raster
x <- 1000
y <- rast(nrow=x, ncol=x, res=1, vals = sample(0:1, x, replace = TRUE))
writeRaster(y, "y.tif", overwrite = TRUE)
# make list (to emulate list of folder contents)
r.list <- rep("y.tif", 4000)
# read in
q <- lapply(r.list, rast)
# merge
q <- do.call(merge, q) # uses a lot of memory (much more with my real data)

Example 2 - merge folder of shapefiles:

# load vector
s <- system.file("ex/lux.shp", package="terra")   
# make list
s.list <- rep(s, 4000)
# make empty SpatVector
v <- vect()
# loop
for(w in 1:length(s.list)){
  # add to SpatVector
  v <- rbind(v, vect(s.list[[w]]))
} # fills memory

Example 3 - polygonise and merge folder of rasters:

# make empty SpatVector
g <- vect()
# loop
for(d in 1:length(r.list)){
  # load
  f <-  rast(r.list[[d]])
  # polygonise
  f <-  as.polygons(f)
  # add to SpatVector
  g <- rbind(g, f) 
} # fills memory

thomarse-ef commented Apr 06 '22 14:04

Please report one case per issue (although 2 and 3 are probably the same), otherwise it becomes more difficult to deal with them. For now:

1) Two alternative approaches:
# make example raster
x = 100
y <- rast(nrow=x, ncol=x, res=1, vals = 1)
writeRaster(y, "y.tif", overwrite = TRUE)
r.list <- rep("y.tif", 4)

# A
q <- lapply(r.list, rast)
q <- sprc(q)
m <- merge(q)

# or B
v <- vrt(r.list)
m <- writeRaster(v, "out.tif")

2 & 3) Instead, make a list of SpatVector objects and call vect() with that list:

s <- system.file("ex/lux.shp", package="terra")   
s.list <- rep(s, 4)
v <- lapply(s.list, vect)
v <- vect(v)

rhijmans commented Apr 06 '22 18:04

Sorry for triple posting; I assumed the issues were linked for all three. I had found a workaround for each but thought I'd post them here as the behaviour seemed unexpected. In example 1 I was trying to process a 5 MB folder of files and it was using 2 GB of RAM; in example 2, a 45 MB folder of shapefiles was using 12 GB of RAM. That seemed like something might be wrong, hence posting here.

Thanks very much for the suggestions. Some feedback:

# Example 1
# A
q <- lapply(r.list, rast)
q <- sprc(q)
m <- merge(q)  ## this fills memory and crashes

# or B
v <- vrt(r.list) ## this works beautifully
m <- writeRaster(v, "out.tif")

The example 2 solution works great as well. I'm having computer issues at the moment, but I think a combination of vrt() and the example 2 solution should work for example 3. Thanks very much.
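
One reading of that combination, as a minimal sketch (the folder path and output name below are hypothetical): use vrt() to treat the whole folder as a single raster, polygonise the result, and write it once at the end.

library(terra)
# hypothetical folder of rasters
r.list <- list.files("rasters", pattern = "\\.tif$", full.names = TRUE)
r <- vrt(r.list)                  # the folder behaves as one SpatRaster
p <- as.polygons(r)               # polygonise the merged raster
writeVector(p, "merged.shp", overwrite = TRUE)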

thomarse-ef commented Apr 07 '22 16:04

OK, I can't get this to work for example 3. The objective here is to load a folder of rasters, process them, convert them to polygons, and save the result as one shapefile. I've tried using vrt() to load the rasters per your suggestion, which works great, but I can't then process them:

r <- vrt(r.list)
r <- classify(r, cbind(-Inf, 0.5, NA), right=FALSE)

I get Error: [classify] insufficient disk space (perhaps from temporary files?). The folder of rasters is approx. 400 MB in total. I would have thought the loop approach I originally gave would have been a memory-safe way of doing this (and it works fine if I use sf to save the shapefile, i.e. making an empty sf object at the start and changing the last line to g <- rbind(g, st_as_sf(f))). Why does it crash with writeVector()? Is there a terra-only way of doing this?
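
For reference, a minimal sketch of that sf-based workaround, assuming the sf package is installed (the folder path and output name are hypothetical, and NULL is used as the starting object instead of an empty sf to avoid column-matching issues on the first rbind):

library(terra)
library(sf)
# hypothetical folder of rasters
r.list <- list.files("rasters", pattern = "\\.tif$", full.names = TRUE)
g <- NULL
for (d in seq_along(r.list)) {
  f <- rast(r.list[[d]])          # load one raster
  f <- st_as_sf(as.polygons(f))   # polygonise, then convert to sf
  g <- if (is.null(g)) f else rbind(g, f)
}
st_write(g, "merged.shp")         # write the combined polygons once at the end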

thomarse-ef commented Apr 08 '22 11:04

As you are not providing a filename, the lack of disk space would be for your tempdir().
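
A hedged sketch of two ways around that, following the tempdir() point above (the temporary-directory path is hypothetical): point terra's temporary files at a drive with enough space, or give classify() an output filename so the result is written directly to disk rather than to tempdir().

library(terra)
# point terra's temporary files at a drive with enough room (hypothetical path)
terraOptions(tempdir = "D:/terra_tmp")

# or write the classified raster straight to a file
r <- vrt(r.list)   # r.list as above
r <- classify(r, cbind(-Inf, 0.5, NA), right = FALSE,
              filename = "classified.tif", overwrite = TRUE)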

rhijmans commented Sep 03 '22 19:09