rasterio
[WIP] Tiled merge
An experiment to see if merge could be made more memory efficient using windowed reads and writes. Testing with the dataset in #2174 seems to produce good results when visually inspected (i.e., no lines of missing data). I've only tested manually at this point, but thought I would open a draft PR for early feedback.
Advantages:
- Tries to keep only one or two tiles in memory at a time. Memory usage of tiles is tunable (see `arrblocks`). This probably allows arbitrary sized merges.
- ~~Using windowed reads/writes seem to side-step the rounding issues in #2205~~
- Each block is independent. Perhaps parallel merges are in the future?
- Window reads can be aligned to the block sizes of the input datasets allowing for efficient reads. See the `arrblocksxy` function.
- Merge loop seems to be shorter and more concise.
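To make the tiling idea concrete, here is a minimal sketch of how block-aligned tile windows could be generated and intersected with a source's footprint. It is not the PR's `arrblocks` or `arrblocksxy` code: the names `iter_tile_windows`, `intersect`, and the `blocks_per_tile` parameter are hypothetical, windows are plain `(col_off, row_off, width, height)` tuples rather than rasterio `Window` objects, and the block grid is assumed to start at pixel (0, 0).

```python
from typing import Iterator, Optional, Tuple

# (col_off, row_off, width, height) in pixels; a stand-in for a raster window.
Window = Tuple[int, int, int, int]


def iter_tile_windows(width: int, height: int, block_w: int, block_h: int,
                      blocks_per_tile: int = 4) -> Iterator[Window]:
    """Yield windows that together cover a width x height raster exactly once.

    Each tile spans `blocks_per_tile` blocks along each axis, so tile origins
    always fall on block boundaries. Tiles on the right and bottom edges are
    clipped to the raster extent.
    """
    tile_w = block_w * blocks_per_tile
    tile_h = block_h * blocks_per_tile
    for row_off in range(0, height, tile_h):
        for col_off in range(0, width, tile_w):
            yield (col_off, row_off,
                   min(tile_w, width - col_off),
                   min(tile_h, height - row_off))


def intersect(a: Window, b: Window) -> Optional[Window]:
    """Return the overlap of two windows, or None if they do not overlap."""
    col0, row0 = max(a[0], b[0]), max(a[1], b[1])
    col1 = min(a[0] + a[2], b[0] + b[2])
    row1 = min(a[1] + a[3], b[1] + b[3])
    if col1 <= col0 or row1 <= row0:
        return None
    return (col0, row0, col1 - col0, row1 - row0)


# A 1000 x 600 raster with 256 x 256 blocks, two blocks per tile side.
windows = list(iter_tile_windows(1000, 600, 256, 256, blocks_per_tile=2))
```

Raising `blocks_per_tile` trades memory for fewer, larger reads. Because each tile only needs the sources whose footprint `intersect`s it, tiles can be processed independently of one another.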
Disadvantages:
- Code is super new and therefore super untested.
- Appears to trade speed for memory efficiency
In-memory merge method (master) (15s):

Tiled merge (48s):

While the runtime of the tiled merge does seem to be significantly slower, a quick glance at a profile reveals that almost 80% of the time is spent doing IO, reading and writing from/to a compressed GTiff.
TODO: handle resolution changes. As it stands, the code only works if all inputs are at the same resolution.
@sgillies Can I get your feedback? Is this the proper direction to expand merge?
There are two things happening here:
- Tiled merge functionality. Some preliminary testing shows that by threading the IO, a good portion of the IO costs can be mitigated. Not intended to replace in-memory merge, but complement it for out-of-core merges.
- Port the window bounds and intersection from gdal_merge. I found that gdal_merge uses a different rounding method for windows which appears to resolve #2205 (at least in the cases I have seen so far). Perhaps it might be useful to add this type of rounding to rasterio in the event that the current rounding fails.
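As an illustration of the rounding idea, here is a minimal sketch of computing a pixel window from georeferenced bounds in the style of gdal_merge. The function name `gdal_merge_style_window` is hypothetical and not part of rasterio or GDAL, and the `0.1` and `0.5` offsets are an assumption about what gdal_merge does that should be checked against the GDAL source. The sketch also assumes a north-up geotransform with no rotation.

```python
import math


def gdal_merge_style_window(bounds, geotransform):
    """Convert georeferenced bounds to a pixel window.

    bounds: (ulx, uly, lrx, lry) in map units.
    geotransform: GDAL-style 6-tuple (x0, xres, 0, y0, 0, yres), yres < 0.
    Returns (xoff, yoff, xsize, ysize) in pixels.
    """
    ulx, uly, lrx, lry = bounds
    x0, xres, _, y0, _, yres = geotransform
    # Offsets: truncate after a small nudge, so a value such as 1.9999999
    # (floating-point noise just below 2) still lands on pixel 2.
    xoff = int((ulx - x0) / xres + 0.1)
    yoff = int((uly - y0) / yres + 0.1)
    # Sizes: round the far edge to the nearest pixel, then subtract the offset.
    xsize = int((lrx - x0) / xres + 0.5) - xoff
    ysize = int((lry - y0) / yres + 0.5) - yoff
    return xoff, yoff, xsize, ysize


gt = (0.0, 10.0, 0.0, 100.0, 0.0, -10.0)  # 10-unit pixels, origin at (0, 100)
exact = gdal_merge_style_window((20.0, 80.0, 50.0, 50.0), gt)
noisy = gdal_merge_style_window((19.999999999, 80.0, 50.000000001, 50.0), gt)
plain_floor = math.floor((19.999999999 - gt[0]) / gt[1])
```

With bounds that are slightly off the pixel grid, `noisy` matches `exact`, whereas a plain floor of the left edge (`plain_floor`) lands one pixel early. That kind of off-by-one is what can leave a line of missing data between adjacent sources.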
This PR is a proof-of-concept for those two things. If anything is useful here, it should be added in its own PR.