
Windowed read for a large single-block data

Open bellini666 opened this issue 6 years ago • 2 comments

I have a large dataset (a 2 GB GeoTIFF image) that I need to process and write to a new image, but the computer that will process it has only 512 MB of RAM.

Following the windowed read/write section of the documentation, I wrote code that reads and writes using a window of 50 cols x 50 rows, sliding that window across the whole image.

I'm testing the code on my development machine, which has 8 GB of RAM. I noticed that, during processing, the whole image was loaded into memory, even though I was using windowed read/write.

One thing that I noticed is that the image has only one block. More specifically, if I call src.block_windows(1) it will give me only one window, and that window takes the whole image.
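The single-block check described above can be sketched roughly as follows. `block_shapes`, `height` and `width` are attributes of an open rasterio dataset; the helper name `is_single_block` is hypothetical, and the stand-in objects are only there so the sketch runs without a file on disk.

```python
from types import SimpleNamespace


def is_single_block(dataset):
    """Return True if every band is stored as one block spanning the whole image.

    `dataset` is anything exposing `block_shapes`, `height` and `width`,
    such as the object returned by `rasterio.open(path)`.
    """
    return all(
        block_height >= dataset.height and block_width >= dataset.width
        for block_height, block_width in dataset.block_shapes
    )


# Stand-ins for open datasets (assumed sizes, for illustration only).
untiled = SimpleNamespace(height=4000, width=6000, block_shapes=[(4000, 6000)])
striped = SimpleNamespace(height=4000, width=6000, block_shapes=[(1, 6000)])
tiled = SimpleNamespace(height=4000, width=6000, block_shapes=[(256, 256)])

print(is_single_block(untiled), is_single_block(striped), is_single_block(tiled))
```

With a real file, the same call would be `is_single_block(src)` inside a `with rasterio.open(path) as src:` block.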

So I don't know whether this is an issue in rasterio or something that I'm doing wrong.

bellini666 · Dec 30 '17 17:12

@hackedbellini yes, if your GeoTIFF isn't tiled or striped, the entire file will be read to access even a small region of data. I haven't gotten around to writing this down in the Rasterio documentation because it is noted in http://www.gdal.org/gdal_datamodel.html (search for "A block size" in the Raster Band section) and in http://www.gdal.org/frmt_gtiff.html. In neither of them is the situation explained as clearly as it should be, for this is a very important point.
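One possible workaround, sketched here under assumptions, is to rewrite the file once as a tiled GeoTIFF so that later windowed reads only touch the tiles they need. The `tiled`, `blockxsize` and `blockysize` keywords are GTiff creation options passed through `rasterio.open`; the helper names `tiled_profile` and `rewrite_tiled` and the 256-pixel default are illustrative choices, not part of rasterio.

```python
def tiled_profile(profile, block_size=256):
    """Return a copy of a dataset profile with GeoTIFF tiling options set."""
    if block_size <= 0 or block_size % 16 != 0:
        # TIFF tile dimensions must be multiples of 16.
        raise ValueError("block_size must be a positive multiple of 16")
    new_profile = dict(profile)
    new_profile.update(
        driver="GTiff", tiled=True, blockxsize=block_size, blockysize=block_size
    )
    return new_profile


def rewrite_tiled(src_path, dst_path, block_size=256):
    """Copy src_path to dst_path as a tiled GeoTIFF.

    Reading an untiled source still loads all of it, so this is meant to be
    run once on a machine with enough memory.
    """
    import rasterio  # imported here so tiled_profile works without rasterio

    with rasterio.open(src_path) as src:
        profile = tiled_profile(src.profile, block_size)
        with rasterio.open(dst_path, "w", **profile) as dst:
            dst.write(src.read())
```

The conversion itself does not save memory, since the untiled source is still read in full; the benefit comes afterwards, when the tiled copy is processed window by window on the small machine.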

I'll label this as a documentation bug to be fixed at 1.0.

sgillies · Dec 30 '17 18:12

@sgillies oh I see. I'm new to this so I'm still learning the gotchas :)

By the way, one thing I noticed that might be useful to document: even though the whole file was loaded into memory (because it was a single-block file), iterating with a 50x50 (or even 100x100) window reduced memory use considerably. If I read the whole block at once, I would end up duplicating the data in memory, and at times even triplicating it because of the processing I'm doing.

I'm iterating using this function instead of block_windows. Maybe it would be useful to rasterio or to someone looking at this issue in the future?

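import rasterio.windows


def iter_window(col_off, row_off, width, height, step=100):
    """Yield step x step windows covering the region, clipped at its edges."""
    col_off, row_off = int(col_off), int(row_off)
    row_end = row_off + int(height)
    col_end = col_off + int(width)
    for i in range(row_off, row_end, step):
        # Clip the last row of windows to the bottom edge of the region.
        h = min(step, row_end - i)
        for j in range(col_off, col_end, step):
            # Clip the last column of windows to the right edge of the region.
            w = min(step, col_end - j)
            yield rasterio.windows.Window(j, i, w, h)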
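
A generator like this could drive the read/process/write loop one window at a time. The following is a minimal sketch, not the original code: `iter_offsets` and `process_in_windows` are hypothetical names, `transform` stands for whatever per-window processing is needed, and the output is assumed to share the source profile.

```python
def iter_offsets(width, height, step=100):
    """Yield (col_off, row_off, width, height) tuples tiling a width x height image."""
    for row_off in range(0, height, step):
        for col_off in range(0, width, step):
            yield (
                col_off,
                row_off,
                min(step, width - col_off),   # clipped at the right edge
                min(step, height - row_off),  # clipped at the bottom edge
            )


def process_in_windows(src_path, dst_path, transform, step=100):
    """Read, transform and write one window at a time."""
    import rasterio  # imported lazily so iter_offsets can be used on its own
    from rasterio.windows import Window

    with rasterio.open(src_path) as src:
        with rasterio.open(dst_path, "w", **src.profile) as dst:
            for offsets in iter_offsets(src.width, src.height, step):
                window = Window(*offsets)
                data = src.read(window=window)  # shape: (bands, rows, cols)
                dst.write(transform(data), window=window)


# Sanity check on a 250 x 120 image: window count and total covered area.
windows = list(iter_offsets(250, 120, step=100))
print(len(windows), sum(w * h for _, _, w, h in windows))
```

Only one window's worth of data is held by the loop at any time, which matches the observation that small windows avoid duplicating the whole image during processing.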

bellini666 · Jan 01 '18 13:01