Rasters.jl
Rasters.jl copied to clipboard
zonal is very slow with DiskArray backend
I have a vrt for a large out of memory file that I read in (lazily) as a Raster
ras = Raster(hrsl_vrt; lazy=true);
doing a zonal sum on a simple geometry takes 30s!!
Rasters.zonal(sum, ras, of=geom[1]; threaded=false)
If I instead first read the needed supbset of the vrt into memory (takes 2.5s)
ras2 = read(Rasters.crop(ras; to=geom[1]))
the time for the same zonal sum operation reduces by two orders of magnitude (0.3s)
Rasters.zonal(sum, ras2, of=geom[1]; threaded=false)
Have you tried enclosing your loop / operation in open(ras) do ras? We need to make zonal open rasters on call, see https://github.com/rafaqz/Rasters.jl/pull/832. I just need to add a test it seems.
How much larger is your vrt than the extent of your geometry? What is the speed if you do only the crop without reading the data into memory?
Most likely we hit a path in the zonal code which does a single getindex on every element instead of doing a batchgetindex on the whole chunk. I am wondering, whether we can change the code in zonal so that it works or whether we would need to special case the zonal of a DiskArray backed Raster in an extension.
zonal should be doing a broadcast to read. But we don't test performance on DiskArrays. We just need to go over all the methods and disallow scalar indexing and test them better for diskarrays