
Use rasterio.windows.subdivide instead of block_windows

Open groutr opened this issue 1 year ago • 2 comments

After meeting with @RobHanna-NOAA, I noticed several uses of .block_windows() in the code base. The way it is used leads me to believe that the function is misunderstood.

GeoTIFF files can be tiled (i.e., the data is chunked and stored as tiles within the TIFF file itself). The .block_windows() function returns windows over these tiles. During my discussion with Rob, we discovered that the tif files being processed are not tiled, meaning that block_windows is returning a single window over the whole dataset, which leads to very high memory usage.

I suggest using rasterio.windows.subdivide instead to generate arbitrarily sized subwindows covering a dataset. https://rasterio.readthedocs.io/en/stable/api/rasterio.windows.html#rasterio.windows.subdivide
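To illustrate the idea of covering a dataset with fixed-size subwindows regardless of how the file is stored on disk, here is a rough sketch in plain Python. It does not call rasterio.windows.subdivide itself (see the linked docs for its actual signature); the helper name `iter_subwindows` and the chunk sizes are hypothetical and only meant to show the principle.

```python
from typing import Iterator, Tuple

# (col_off, row_off, width, height): the same argument order that
# rasterio.windows.Window takes.
WindowTuple = Tuple[int, int, int, int]


def iter_subwindows(
    height: int, width: int, chunk_height: int, chunk_width: int
) -> Iterator[WindowTuple]:
    """Yield fixed-size windows covering a height x width raster.

    Hypothetical helper for illustration only. Windows on the bottom and
    right edges are clipped so they never extend past the raster.
    """
    if chunk_height <= 0 or chunk_width <= 0:
        raise ValueError("chunk sizes must be positive")
    for row_off in range(0, height, chunk_height):
        win_height = min(chunk_height, height - row_off)
        for col_off in range(0, width, chunk_width):
            win_width = min(chunk_width, width - col_off)
            yield (col_off, row_off, win_width, win_height)


if __name__ == "__main__":
    # Assumed example dimensions; a real caller would take them from the
    # open dataset (e.g. its height and width attributes).
    for win in iter_subwindows(1000, 2500, 512, 1024):
        print(win)
```

With rasterio, each tuple could be turned into a `Window(col_off, row_off, width, height)` and passed to `src.read(1, window=...)`, so that only one chunk is held in memory at a time, whether or not the file is tiled.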

groutr avatar Nov 26 '24 19:11 groutr

This could potentially have a significant performance impact.

RobHanna-NOAA avatar Dec 11 '24 20:12 RobHanna-NOAA

Update: I have been spot-checking parts of the code, and it looks like most of the files are tiled after all, including block size overrides. But we should double-check them all. I think there are some places where the code doesn't handle this.

RobHanna-NOAA avatar Dec 13 '24 18:12 RobHanna-NOAA