Performance Comparison: 'extract' Function in 'raster' vs. 'terra' for Large Dataframes and Rasters
When working with large rasters and large dataframes, the extract function from the raster library significantly outperforms the one from the terra library. In the example below, raster is twice as fast as terra. As the raster size increases, this performance gap widens considerably. The example below uses a raster of around 12 GB on disk and a dataframe of approximately 50 million locations, and raster is notably faster. For a raster of 100 GB, the difference in performance could exceed tenfold.
# Load necessary libraries
library(terra)
library(raster)
library(microbenchmark)
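# Function to generate a large artificial raster
# (terra:: prefixes are explicit because raster is attached after terra
# and exports functions with the same names)
create_large_raster <- function(filename, nrow, ncol) {
  # Create an empty raster with the specified dimensions
  r <- terra::rast(nrows = nrow, ncols = ncol, crs = "EPSG:4326")
  # Fill the raster with random values; all cells are held in memory at once,
  # so 50000 x 50000 cells needs roughly 20 GB of RAM for the double vector
  terra::values(r) <- runif(terra::ncell(r))
  # Write the raster to file
  terra::writeRaster(r, filename, overwrite = TRUE)
}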
# Function to generate a large artificial dataframe
create_large_dataframe <- function(n) {
  # Create a dataframe with random longitude and latitude
  lat <- runif(n, min = -90, max = 90)
  lon <- runif(n, min = -180, max = 180)
  df <- data.frame(lon = lon, lat = lat)
  return(df)
}
# Parameters for the large raster and dataframe
raster_filename <- "large_raster.tif"
raster_nrow <- 50000 # Example dimensions for the raster
raster_ncol <- 50000
num_locations <- 50000000 # 50 million locations
# Create the artificial raster and dataframe
create_large_raster(raster_filename, raster_nrow, raster_ncol)
large_dataframe <- create_large_dataframe(num_locations)
# Load the raster and perform the extraction with each library.
# The namespace is given explicitly so that each wrapper calls its own
# package's extract, whichever package was attached last.
extract_terra <- function(raster_file, loc_table) {
  r <- terra::rast(raster_file)
  extracted_values <- terra::extract(r, loc_table)
  return(extracted_values)
}
extract_raster <- function(raster_file, loc_table) {
  r <- raster::raster(raster_file)
  extracted_values <- raster::extract(r, loc_table)
  return(extracted_values)
}
# Benchmarking the performance
results <- microbenchmark(
  terra = extract_terra(raster_filename, large_dataframe),
  raster = extract_raster(raster_filename, large_dataframe),
  times = 2 # Reduce times for demonstration purposes
)
# Print benchmarking results
print(results)
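
The full example needs a 12 GB file and a lot of RAM, so a scaled-down variant may be easier for others to run first. The sketch below is only an illustration under assumed settings: the sizes (n_side, n_pts), the temporary file, and the seed are hypothetical and not part of the measurements above, and timings at this scale will not necessarily reflect the gap reported for large rasters. It also checks whether both libraries return the same values, which is worth confirming before comparing speed.

```r
# Scaled-down sketch; all sizes below are assumptions chosen to run quickly.
# Functions are called with explicit namespaces, so no package is attached.
set.seed(42)
n_side <- 1000    # hypothetical: 1000 x 1000 cells instead of 50000 x 50000
n_pts  <- 100000  # hypothetical: 100 thousand locations instead of 50 million
small_file <- tempfile(fileext = ".tif")

# Build and write a small random raster
r <- terra::rast(nrows = n_side, ncols = n_side, crs = "EPSG:4326")
terra::values(r) <- runif(terra::ncell(r))
terra::writeRaster(r, small_file, overwrite = TRUE)

# Random locations as a two-column (lon, lat) dataframe
pts <- data.frame(lon = runif(n_pts, -180, 180),
                  lat = runif(n_pts, -90, 90))

# Open the same file with each library
r_terra  <- terra::rast(small_file)
r_raster <- raster::raster(small_file)

# Extract once with each library and reduce both results to a plain vector.
# The terra result is coerced to a dataframe and its last column is taken,
# so an extra ID column (if present) is ignored.
out_terra <- as.data.frame(terra::extract(r_terra, pts))
v_terra   <- out_terra[[ncol(out_terra)]]
v_raster  <- as.numeric(raster::extract(r_raster, pts))

# Report whether the two sets of extracted values agree
print(all.equal(v_terra, v_raster))

# Time both extractions on the small inputs
small_results <- microbenchmark::microbenchmark(
  terra  = terra::extract(r_terra, pts),
  raster = raster::extract(r_raster, pts),
  times  = 3
)
print(small_results)

# Remove the temporary raster file
unlink(small_file)
```

Increasing n_side and n_pts step by step shows at what size the two libraries start to diverge on a given machine.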