Rasters.jl icon indicating copy to clipboard operation
Rasters.jl copied to clipboard

zonal is very slow with DiskArray backend

Open alex-s-gardner opened this issue 10 months ago • 3 comments

I have a vrt for a large out of memory file that I read in (lazily) as a Raster

ras = Raster(hrsl_vrt; lazy=true);

doing a zonal sum on a simple geometry takes 30s!!

Rasters.zonal(sum, ras, of=geom[1]; threaded=false)

If I instead first read the needed supbset of the vrt into memory (takes 2.5s)

ras2 = read(Rasters.crop(ras; to=geom[1]))

the time for the same zonal sum operation reduces by two orders of magnitude (0.3s)

Rasters.zonal(sum, ras2, of=geom[1]; threaded=false)

alex-s-gardner avatar Jan 17 '25 21:01 alex-s-gardner

Have you tried enclosing your loop / operation in open(ras) do ras? We need to make zonal open rasters on call, see https://github.com/rafaqz/Rasters.jl/pull/832. I just need to add a test it seems.

asinghvi17 avatar Jan 18 '25 01:01 asinghvi17

How much larger is your vrt than the extent of your geometry? What is the speed if you do only the crop without reading the data into memory?

Most likely we hit a path in the zonal code which does a single getindex on every element instead of doing a batchgetindex on the whole chunk. I am wondering, whether we can change the code in zonal so that it works or whether we would need to special case the zonal of a DiskArray backed Raster in an extension.

felixcremer avatar Jan 18 '25 10:01 felixcremer

zonal should be doing a broadcast to read. But we don't test performance on DiskArrays. We just need to go over all the methods and disallow scalar indexing and test them better for diskarrays

rafaqz avatar Jan 18 '25 10:01 rafaqz