georasters icon indicating copy to clipboard operation
georasters copied to clipboard

Improved performace and correctness for raster union

Open marthinwurer opened this issue 5 years ago • 4 comments

Moved some stuff around and made some performance improvements to the merge functionality. There were a lot of extra array copies and instances where larger data types than needed were used.

The main improvements were:

  • selecting the correct numpy dtype for the arrays that were created
  • selecting a default no data value if there was none
  • creating a pre-filled array with the no data value instead of multiplying it with an array of ones
  • making the mask start out as an array of bools.

Testing done: Just on my own code, merging 24 rasters into a 43201x86401 array. It just barely works now instead of crashing because it was not able to allocate the arrays.

marthinwurer avatar Oct 10 '19 16:10 marthinwurer

CI might be broken, it's erroring because it can't find libspatialindex_c, which is something that I noted in #27.

marthinwurer avatar Oct 10 '19 17:10 marthinwurer

Looks so much nicer. Thanks!

Would be good to test and make sure it still works "as expected". It is on the to-do list in #14. If you feel like contributing some tests that's be really great!

I'll try to see where that libspatialindex_c is coming from. I do not think the package uses it explicitly. It may be from one of the other GIS packages, like geopandas or pysal.

ozak avatar Oct 10 '19 21:10 ozak

It looks like rtree from geopandas is where it's coming from. I can mock up some tests with some fake data and see if the outputs are the same. I'm pretty sure types are going to be different because they weren't handled before, but I could do something like a mse threshold between outputs to make sure that nothing wonky is happening.

marthinwurer avatar Oct 10 '19 21:10 marthinwurer

It seems in the CI with py3.5 is failing due to some matplotlib incompatibility with scikits-image. On py36 it is rtree which is needed by geopandas. Using conda install -c conda-forge georasters has been working so far without any of these issues, since all the versions of the packages are coherent. I guess pip is having problems due to the compatibility of packages.

ozak avatar Oct 10 '19 21:10 ozak