
Offer a "low memory mode" and multiprocessing of array chunks

Open vlandau opened this issue 5 years ago • 11 comments

Input grids are often so large that allocating the working arrays requires an enormous amount of memory. Each moving window solve is independent, so a low-memory mode may be relatively simple to implement.


Two options come to mind:

1. Split the landscape into chunks, and solve the chunks in serial

The landscape could be split into adjacent, non-overlapping rectangular or square chunks. Each chunk (and its corresponding inputs) would need to be buffered by radius (an Omniscape argument) so that pixels at the edge of the unbuffered chunk aren't missing any data in their moving windows. Results for each chunk could then be stored and mosaicked, taking the max value. Done correctly, this would produce exactly the same map as Omniscape v0.4.2 (see the first sketch below).

2. Lazy reading and writing for each moving window solve

A given moving window solve only needs the data directly pertinent to it, so there is no need to store the entire raster inputs in memory. Just load what is needed for each moving window (GeoData.jl has functionality to do this), solve it, lazily write the result to an output raster, then garbage collect the data and move on to the next solve (see the second sketch below).

Thread safety could be a little tricky. Each thread could write to its own output component, and the components could then be summed, but the summing would also need to be done lazily/by chunk; otherwise you'd end up loading all the components into memory, which would take as much memory as the current method. Another consideration is computing the coordinates and source strengths of the target pixels (the ones on which the moving windows are centered).

The biggest memory hog in Omniscape is the allocation of the array(s) that store current values. They have the same number of rows and columns as the resistance surface, plus a third dimension equal to the number of parallel workers. If computing normalized current flow in addition to cumulative current, there are two of them! Because of this, even if the source strength layer has to be loaded fully into memory, the memory savings would still be substantial.
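Below is a minimal sketch of the chunking arithmetic for option 1. None of this is the Omniscape API: `chunk_size` is a hypothetical parameter for the side length of each unbuffered chunk, and `radius` mirrors the Omniscape argument of that name.

```julia
# Sketch only: compute core (unbuffered) and buffered extents for each chunk.
function chunk_extents(nrows::Int, ncols::Int; chunk_size::Int = 512, radius::Int = 100)
    extents = NTuple{4,UnitRange{Int}}[]
    for i in 1:chunk_size:nrows, j in 1:chunk_size:ncols
        core_rows = i:min(i + chunk_size - 1, nrows)   # unbuffered chunk
        core_cols = j:min(j + chunk_size - 1, ncols)
        # Buffer by `radius` so pixels at the chunk edge have complete windows
        buf_rows = max(1, first(core_rows) - radius):min(nrows, last(core_rows) + radius)
        buf_cols = max(1, first(core_cols) - radius):min(ncols, last(core_cols) + radius)
        push!(extents, (core_rows, core_cols, buf_rows, buf_cols))
    end
    return extents  # solve each buffered chunk; keep results for the core extent
end
```

And a rough sketch of option 2's per-window loop, again with hypothetical names: `solve_window` and `write_window!` are stubbed stand-ins for the moving-window solve and a lazy write, and `resistance` would be a disk-backed GeoData.jl array in practice, so that indexing it reads only the window from disk. For scale, the current-storing array this avoids holds nrows × ncols × nthreads Float64s; a 50,000 × 50,000 grid with 8 workers, for example, is already 50,000 · 50,000 · 8 · 8 bytes ≈ 160 GB.

```julia
# Sketch only: process one moving window at a time, keeping only that window
# in memory. The stubs below stand in for real solve/write implementations.
solve_window(window) = zeros(size(window))      # stub for the per-window solve
write_window!(current, rows, cols) = nothing    # stub for a lazy write to disk

function solve_all_windows(resistance, targets, radius)
    nrows, ncols = size(resistance)
    for (r, c) in targets  # target pixels, on which the windows are centered
        rows = max(1, r - radius):min(nrows, r + radius)
        cols = max(1, c - radius):min(ncols, c + radius)
        window = resistance[rows, cols]          # only this window in memory
        write_window!(solve_window(window), rows, cols)
        # `window` goes out of scope here and can be garbage collected
    end
end
```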


I tend to like option 2; it's the slickest, and if done in a thread-safe way, I think it would be more straightforward to implement such that errors from edge cases are less likely. The IO overhead might be significant, though -- not sure.

vlandau avatar Oct 02 '20 18:10 vlandau

This would be a huge improvement for the scalability of Omniscape. I feel like solution 1 would be infinitely scalable, which isn't the case for solution 2, no?

glaroc avatar Feb 16 '21 16:02 glaroc

Yeah, on a second read after leaving this for a while, I'm preferring option 1 above... I think it would be faster and also easy to test for correctness.

vlandau avatar Feb 17 '21 16:02 vlandau

But thinking about it more, option 2, if implemented well, could be better suited to HPC, where each thread would use very little memory. Ultimately, this might be the best solution.

glaroc avatar Feb 17 '21 16:02 glaroc

Yeah, though I think option 1 has the potential to be much faster (lazy reads/writes plus the associated locks/unlocks for thread safety across that many operations could add massive overhead).

Also, if option 2 used N output GeoTIFFs written to separately for thread safety, where N is the number of threads, then each of those N rasters would span the full landscape, which could introduce additional memory bottlenecks when adding them together.
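One hedged way to do that summation without the bottleneck is to accumulate by row blocks, so that only one block per component is in memory at a time. A sketch, with `read_rows` as a stubbed stand-in for a lazy ArchGDAL/GeoData.jl read:

```julia
# Sketch only: sum N per-thread component rasters block-by-block. The running
# `total` is the only full-extent array; it could itself be flushed to disk
# block-by-block to stay fully out-of-core.
read_rows(path, rows, ncols) = zeros(length(rows), ncols)  # stub for a lazy read

function sum_components(paths, nrows, ncols; block = 1024)
    total = zeros(nrows, ncols)        # one full-extent array rather than N
    for i in 1:block:nrows
        rows = i:min(i + block - 1, nrows)
        for p in paths
            total[rows, :] .+= read_rows(p, rows, ncols)
        end
    end
    return total
end
```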

vlandau avatar Feb 17 '21 17:02 vlandau

I tested option 1 by splitting the input resistance map into overlapping tiles with Bash + GDAL, running Omniscape on each tile, and finally re-mosaicking the tiles in Python using rasterio.merge(method="max"). It works really well, and you don't see any stitch marks in the final output. I looked at the Omniscape code, and doing this within Omniscape would require some significant refactoring. It also seems quite difficult to do the final merge in GDAL alone while taking the max pixel value, and I don't know whether there is an equivalent merge function in Julia.

glaroc avatar Mar 11 '21 18:03 glaroc

Awesome! Glad it worked well. I think there might be a way to mosaic the TIFFs using GDAL.jl or ArchGDAL.jl? But actually, I use gdal_merge.py from the command line when merging rasters, and maybe that's not available in Julia...

We could possibly do it with GeoData.jl. There may be a way to create an empty GDALarray, then write to it lazily using GeoData.jl. The key is lazy writing, to ensure the most efficient use of RAM.
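The fold itself is simple. Here is an in-memory sketch of the max-mosaic; in practice `out` would be that empty, disk-backed GDALarray written lazily, but the per-tile logic is the same. `offsets` (assumed precomputed) gives each tile's upper-left (row, col) index in the full extent, and `out` should start at -Inf or the NoData value so untouched pixels are identifiable.

```julia
# Sketch only: fold each overlapping tile into `out`, keeping the max where
# tiles overlap.
function mosaic_max!(out::AbstractMatrix, tiles, offsets)
    for (tile, (r0, c0)) in zip(tiles, offsets)
        rows = r0:(r0 + size(tile, 1) - 1)
        cols = c0:(c0 + size(tile, 2) - 1)
        @views out[rows, cols] .= max.(out[rows, cols], tile)
    end
    return out
end
```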

cc @rafaqz, who might have some ideas/thoughts. I think GeoData.jl (or maybe ArchGDAL?) might have what we need.

I see the workflow as having these steps (rough sketch after the list):

  1. Load in the raster inputs as GeoData.GDALarrays so they're not fully loaded into memory
  2. Identify tile extents (such that they overlap enough, and based on user-provided params), with each "extent" being an Array{Number, 1} of length 4 (North, South, East, West coords), so we'd have an Array{Array{Number, 1}, 1}
  3. Loop through the Array{Array{Number, 1}, 1} and load that extent of each input raster into memory, and for each:
    • Run Omniscape
    • Write that Omniscape output to disk
    • Clear outputs from memory/garbage collect
  4. Create an empty GDALarray and iteratively add each Omniscape output to it lazily
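
A rough, self-contained skeleton of those four steps. Every name here is a hypothetical stand-in (stubs included so it runs), and simple index ranges stand in for the coordinate extents of step 2:

```julia
# Sketch only: tile the landscape, solve each tile, write it out, then merge.
run_omniscape(chunk) = zeros(size(chunk))   # stub for the moving-window solves
write_tile(path, A) = nothing               # stub for writing one tile to disk

resistance = rand(2000, 2000)   # step 1 would be a lazy GeoData.GDALarray instead
tile = 500                      # tile side length (overlap/buffering omitted here)
for i in 1:tile:size(resistance, 1), j in 1:tile:size(resistance, 2)
    rows = i:min(i + tile - 1, size(resistance, 1))    # step 2: tile extents
    cols = j:min(j + tile - 1, size(resistance, 2))
    chunk = resistance[rows, cols]                     # step 3: one tile in memory
    write_tile("tile_$(i)_$(j).tif", run_omniscape(chunk))
end
# step 4: fold the written tiles into one output, e.g. with a max-mosaic
```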

I'd like to start working on these components when I have time -- it might be a while until I have a chunk of time to devote to it, but I think we're starting to get a fairly clear plan of action at this point.

vlandau avatar Mar 11 '21 19:03 vlandau

Just a note that gdal_merge.py doesn't allow taking the maximum value of overlapping pixels. Without that, there will be some serious stitch marks.

glaroc avatar Mar 11 '21 19:03 glaroc

Also, the tiling of the landscape could end up giving us a solid foundation from which to build out functionality to enable hierarchical parallelism.

vlandau avatar Mar 11 '21 19:03 vlandau

GeoData.jl has the open method that you use with a do block to write to the open ArchGDAL.jl/DiskArrays.jl dataset underneath. I haven't really tested it for writing, more just for reading larger-than-memory files, but this should be doable and easy in Julia. I haven't personally been pushed to make sure of it, because my datasets are only ever that big along the time dimension. I would be happy to help out -- I think making this easy should be a goal of the Julia geo ecosystem.
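For reference, a guess at what that pattern might look like. The `write = true` keyword is an assumption (it exists in the later Rasters.jl API) and is unverified for GeoData.jl here:

```julia
# Assumed usage of GeoData.jl's `open` do-block for in-place, lazy writes.
using GeoData

tile_result = rand(256, 256)
out = GDALarray("cum_currmap.tif")   # lazy handle to the output raster
open(out; write = true) do o         # `write = true` is an assumption
    # update one tile in place without reading the whole raster into memory
    o[1:256, 1:256] .= max.(o[1:256, 1:256], tile_result)
end
```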

rafaqz avatar Mar 12 '21 00:03 rafaqz

Thanks @rafaqz! I'll keep you in the loop when I find time to work on this, and may solicit some feedback as I go 🙂

vlandau avatar Mar 12 '21 01:03 vlandau

Option 1 will be the way I go, as it has several benefits, including making multiprocessing easy to do, which will enable hierarchical parallel processing (e.g. see #106).

vlandau avatar Sep 08 '21 20:09 vlandau