
[WIP] Tiled merge

Open groutr opened this issue 4 years ago • 3 comments

An experiment to see whether merge can be made more memory efficient using windowed reads and writes. Testing with the dataset in #2174 seems to produce good results when visually inspected (i.e., no lines of missing data). I've only tested manually at this point, but thought I would open a draft PR for early feedback.

Advantages:

  • Tries to keep only one or two tiles in memory at a time. Memory usage of tiles is tunable (see arrblocks). This probably allows arbitrarily sized merges.
  • ~~Using windowed reads/writes seems to side-step the rounding issues in #2205~~
  • Each block is independent. Perhaps parallel merges are in the future?
  • Window reads can be aligned to the block sizes of the input datasets allowing for efficient reads. See the arrblocksxy function.
  • Merge loop seems to be shorter and more concise.

Disadvantages:

  • Code is super new and therefore super untested.
  • Appears to trade speed for memory efficiency.
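
To make the tiling idea above concrete, here is a rough sketch of how block-aligned tile windows could be generated. This is not the PR's `arrblocks` or `arrblocksxy` code; the function name `tile_windows`, its parameters, and the `(col_off, row_off, width, height)` tuple layout are all made up for illustration. It assumes the output grid starts on a block boundary of the input.

```python
from typing import Iterator, Tuple

# Hypothetical window layout for this sketch: (col_off, row_off, width, height)
Window = Tuple[int, int, int, int]


def tile_windows(
    width: int,
    height: int,
    block_w: int,
    block_h: int,
    blocks_per_tile: int = 1,
) -> Iterator[Window]:
    """Yield windows covering a width x height raster.

    Tile edges fall on multiples of the block size, so each read lines up
    with the dataset's internal blocks. `blocks_per_tile` is the memory knob:
    larger values mean fewer, bigger tiles held in memory at once.
    """
    tile_w = block_w * blocks_per_tile
    tile_h = block_h * blocks_per_tile
    for row_off in range(0, height, tile_h):
        for col_off in range(0, width, tile_w):
            # Clip the last tile in each row/column to the raster extent.
            yield (
                col_off,
                row_off,
                min(tile_w, width - col_off),
                min(tile_h, height - row_off),
            )


windows = list(tile_windows(1000, 600, 256, 256))
```

Each window is independent of the others, so a merge loop only needs to hold the tile it is currently filling, and in principle the windows could be handed to separate workers.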

groutr avatar Jun 25 '21 04:06 groutr

In-memory merge method (master) (15s): merge_old

Tiled merge (48s): merge_new_tiled

The tiled merge is significantly slower, but a quick look at a profile shows that almost 80% of the time is spent on IO: reading from and writing to a compressed GTiff.

groutr avatar Jun 25 '21 17:06 groutr

TODO: handle resolution changes. As it stands, the code only works if everything is at the same resolution.

groutr avatar Jun 25 '21 20:06 groutr

@sgillies Can I get your feedback? Is this the proper direction to expand merge?

There are two things happening here:

  1. Tiled merge functionality. Some preliminary testing shows that by threading the IO, a good portion of the IO costs can be mitigated. This is not intended to replace the in-memory merge, but to complement it for out-of-core merges.
  2. A port of the window bounds and intersection logic from gdal_merge. I found that gdal_merge uses a different rounding method for windows, which appears to resolve #2205 (at least in the cases I have seen so far). It might be useful to add this type of rounding to rasterio as a fallback for cases where the current rounding fails.
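
As a rough illustration of why the rounding method matters for item 2, the sketch below compares two ways of turning georeferenced bounds into a pixel window along one axis. It is not copied from gdal_merge or from this PR: the function names are hypothetical, and the `0.1` and `0.5` tolerances are assumptions chosen to show the round-to-nearest idea. It also assumes non-negative pixel coordinates, since `int()` truncates toward zero.

```python
import math


def window_floor_ceil(left, right, origin, res):
    """Expand outward: floor the start, ceil the stop."""
    col_start = math.floor((left - origin) / res)
    col_stop = math.ceil((right - origin) / res)
    return col_start, col_stop - col_start


def window_nearest(left, right, origin, res, eps=0.1):
    """Snap to the nearest pixel edge, tolerating small float error.

    `eps` and the 0.5 below are illustrative tolerances (assumptions).
    """
    col_start = int((left - origin) / res + eps)
    col_stop = int((right - origin) / res + 0.5)
    return col_start, col_stop - col_start


# Bounds that sit exactly on pixel edges 3 and 7 of a 0.1-resolution grid.
# In floating point, 0.3 / 0.1 evaluates to slightly less than 3.
expanded = window_floor_ceil(0.3, 0.7, origin=0.0, res=0.1)
snapped = window_nearest(0.3, 0.7, origin=0.0, res=0.1)
```

Here `expanded` picks up one extra column on the left because the floating-point quotient lands just below the true pixel edge, while `snapped` gives the intended 4-column window starting at column 3.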

This PR is a proof-of-concept for those two things. If anything is useful here, it should be added in its own PR.

groutr avatar Sep 14 '21 16:09 groutr