rasterio
Windowed read for large single-block data
I have a large dataset (a 2 GB GeoTIFF image) that I need to process and write to a new image, but the computer that will do the processing has only 512 MB of RAM.
Following the windowed read/write section of the documentation, I wrote code that reads and writes using a window of 50 cols x 50 rows, and I slide that window across the whole image.
I'm testing the code on my development machine, which has 8 GB of RAM. I noticed that during processing the whole image was loaded into memory, even though I was using windowed reads and writes.
I also noticed that the image has only one block. More specifically, if I call src.block_windows(1), it gives me only one window, and that window covers the whole image.
So I don't know whether this is an issue in rasterio or something I'm doing wrong.
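A quick way to confirm the single-block layout described above is to compare each band's block shape with the raster dimensions. The following is only a sketch: `reads_whole_band` and `report_blocks` are hypothetical helper names, and the path passed to `report_blocks` is whatever file you are inspecting. It relies on the dataset attributes `block_shapes`, `indexes`, `height` and `width`.

```python
def reads_whole_band(block_shape, height, width):
    """Return True if one block of shape (rows, cols) spans the full band."""
    block_rows, block_cols = block_shape
    return block_rows >= height and block_cols >= width


def report_blocks(path):
    """Print each band's block shape and whether it covers the whole band."""
    # Imported here so the pure helper above can be used without rasterio.
    import rasterio

    with rasterio.open(path) as src:
        # block_shapes holds one (rows, cols) tuple per band.
        for band, shape in zip(src.indexes, src.block_shapes):
            single = reads_whole_band(shape, src.height, src.width)
            print(f"band {band}: block shape {shape}, single block: {single}")
```

If the block shape equals the full (height, width) of the image, any windowed read has to touch that one block.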
@hackedbellini yes, if your GeoTIFF isn't tiled or striped, the entire file will be read to access even a small region of data. I haven't gotten around to writing this down in the Rasterio documentation because it is noted in http://www.gdal.org/gdal_datamodel.html (search for "A block size" in the Raster Band section) and in http://www.gdal.org/frmt_gtiff.html. In neither of them is the situation explained as clearly as it should be, for this is a very important point.
I'll label this as a documentation bug to be fixed at 1.0.
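One possible workaround, not something proposed in this thread, is to rewrite the file once as a tiled GeoTIFF on a machine with enough memory, so that later windowed reads only touch the tiles they overlap. A minimal sketch, assuming the GTiff creation options `tiled`, `blockxsize` and `blockysize`; the helper names, the paths and the 256-pixel tile size are illustrative assumptions.

```python
def tiled_profile(profile, blocksize=256):
    """Return a copy of a dataset profile that requests square GeoTIFF tiles."""
    # TIFF tile dimensions must be multiples of 16.
    if blocksize % 16 != 0:
        raise ValueError("GeoTIFF tile dimensions must be multiples of 16")
    out = dict(profile)
    out.update(driver="GTiff", tiled=True,
               blockxsize=blocksize, blockysize=blocksize)
    return out


def rewrite_tiled(src_path, dst_path, blocksize=256):
    """Copy src_path to dst_path as a tiled GeoTIFF (hypothetical paths)."""
    # Imported here so tiled_profile can be used without rasterio installed.
    import rasterio

    with rasterio.open(src_path) as src:
        profile = tiled_profile(src.profile, blocksize)
        with rasterio.open(dst_path, "w", **profile) as dst:
            # This reads the full array at once, so run it on a machine
            # with enough memory, not on the 512 MB target.
            dst.write(src.read())
```

After such a rewrite, `block_windows(1)` on the new file should yield many small windows instead of a single one.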
@sgillies oh I see. I'm new to this so I'm still learning the gotchas :)
Btw, one thing I noticed that might be useful to document: even though the whole file was loaded into memory (because it was a single-block file), iterating with a window of 50x50 (or even 100x100) greatly reduced the amount of memory used. If I were to read the whole block, I would end up duplicating the data in memory, and at times even tripling it, because of the data processing that I'm doing.
I'm iterating using this function instead of block_windows. Maybe it would be useful to rasterio or to someone looking at this issue in the future?
import rasterio.windows

def iter_window(col_off, row_off, width, height, step=100):
    """Yield step x step windows covering the region; edge windows are clipped."""
    row_end = row_off + height
    col_end = col_off + width
    for i in range(int(row_off), int(row_end), step):
        # Clip the last row of windows to the bottom edge of the region.
        h = min(step, row_end - i)
        for j in range(int(col_off), int(col_end), step):
            # Clip the last column of windows to the right edge of the region.
            w = min(step, col_end - j)
            yield rasterio.windows.Window(j, i, w, h)
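For anyone who wants to see how such a generator fits into a full read/process/write loop, here is a hedged sketch. `window_bounds` is a hypothetical pure-Python variant that yields plain `(col_off, row_off, width, height)` tuples, `process_in_windows` and its `func` argument are illustrative names, and the output file simply reuses the source profile, which assumes `func` preserves the shape and dtype of each chunk.

```python
def window_bounds(col_off, row_off, width, height, step=100):
    """Yield (col_off, row_off, width, height) tuples that tile the region."""
    row_end = row_off + height
    col_end = col_off + width
    for i in range(int(row_off), int(row_end), step):
        h = min(step, row_end - i)  # clip at the bottom edge
        for j in range(int(col_off), int(col_end), step):
            w = min(step, col_end - j)  # clip at the right edge
            yield (j, i, w, h)


def process_in_windows(src_path, dst_path, func, step=100):
    """Apply func to each window of src_path and write the result to dst_path."""
    # Imported here so window_bounds can be used without rasterio installed.
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(src_path) as src:
        with rasterio.open(dst_path, "w", **src.profile) as dst:
            for col, row, w, h in window_bounds(0, 0, src.width, src.height, step):
                window = Window(col, row, w, h)
                # Only a small (bands, h, w) array is held in Python at a time.
                data = src.read(window=window)
                dst.write(func(data), window=window)
```

For a 250 x 120 region and `step=100`, `window_bounds` yields six windows, the last of which is 50 columns by 20 rows.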