inundation-mapping
Use rasterio.windows.subdivide instead of block_windows
After meeting with @RobHanna-NOAA, I noticed several uses of .block_windows() in the code base. The way it is used leads me to believe that the function is misunderstood.
GeoTIFF files can be tiled, i.e. the data is chunked and stored in blocks within the tif file itself. The .block_windows() function returns windows over these tiles. During my discussion with Rob, we found that the tif files being processed are not tiled, meaning that block_windows is returning a single window over the whole dataset, which leads to very high memory usage.
I suggest using rasterio.windows.subdivide instead to generate arbitrarily sized subwindows covering a dataset.
https://rasterio.readthedocs.io/en/stable/api/rasterio.windows.html#rasterio.windows.subdivide
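To make the idea concrete, here is a minimal pure-Python sketch of fixed-size subwindow generation that does not depend on how the file is stored internally. The helper name iter_subwindows and its parameters are hypothetical, not part of rasterio or this code base; rasterio.windows.subdivide is expected to cover the same use case, so check the linked docs for its exact signature before swapping it in.

```python
from typing import Iterator, Tuple

# (col_off, row_off, width, height): same argument order as rasterio.windows.Window
WindowTuple = Tuple[int, int, int, int]


def iter_subwindows(
    raster_width: int,
    raster_height: int,
    chunk_width: int,
    chunk_height: int,
) -> Iterator[WindowTuple]:
    """Yield non-overlapping subwindows that together cover the whole raster.

    Hypothetical helper for illustration. Windows along the right and bottom
    edges are clipped so they never extend past the raster bounds.
    """
    if chunk_width <= 0 or chunk_height <= 0:
        raise ValueError("chunk_width and chunk_height must be positive")

    # Walk the raster row band by row band, left to right within each band.
    for row_off in range(0, raster_height, chunk_height):
        height = min(chunk_height, raster_height - row_off)
        for col_off in range(0, raster_width, chunk_width):
            width = min(chunk_width, raster_width - col_off)
            yield (col_off, row_off, width, height)
```

Each tuple can be unpacked into rasterio.windows.Window(col_off, row_off, width, height) and passed as the window argument of a dataset's read(), so only one chunk is held in memory at a time regardless of the file's block layout.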
This could potentially have a significant performance impact.
Update: I have been spot-checking parts of the code, and it looks like most are tiled after all, including block size overrides. But we should double-check them all. I think there are some places that don't handle this.