Rasters.jl icon indicating copy to clipboard operation
Rasters.jl copied to clipboard

optimize cellsize for lon-lat projections

Open tiemvanderdeure opened this issue 1 year ago • 28 comments

Solves https://github.com/rafaqz/Rasters.jl/issues/764 and makes cellsize calculation much faster for lon-lat projections.

I'm considering getting rid of the ArchGDAL requirement for these projections as well.

Any ideas for further improvements @rafaqz @alex-s-gardner?

tiemvanderdeure avatar Sep 27 '24 09:09 tiemvanderdeure

How do we get rid of ArchGDAL?

rafaqz avatar Sep 27 '24 09:09 rafaqz

I was thinking we could just make a _crs2transform that just calls ArchGDAL.crs2transform if ArchGDAL is loaded and otherwise tells the user to load ArchGDAL. If the CRS is WGS84 then cellsize would never have to call _crs2transform

tiemvanderdeure avatar Sep 27 '24 10:09 tiemvanderdeure

But how do we deal with Proj/Wkt, a lot of different text strings can be wgs84

rafaqz avatar Sep 27 '24 10:09 rafaqz

But how do we deal with Proj/Wkt, a lot of different text strings can be wgs84

The way we already do it - I don't think this line of code needs ArchGDAL

https://github.com/rafaqz/Rasters.jl/blob/8d13c1887f90444d24e3c19ecbff42c8d5f0de68/ext/RastersArchGDALExt/cellsize.jl#L57

tiemvanderdeure avatar Sep 27 '24 10:09 tiemvanderdeure

Yeah that's ArchGDAL doing that conversion ;)

rafaqz avatar Sep 27 '24 11:09 rafaqz

Yeah that's ArchGDAL doing that conversion ;)

I was thrown off by the GeoFormatTypes wrapper, but makes sense. Let's just leave it as it is, then.

tiemvanderdeure avatar Sep 27 '24 12:09 tiemvanderdeure

Yes ArchGDAL actually pirates Base.convert on GeoFormat, so the confusion is understandable. We could put it in a GeoFormatTypesArchGDAL extension now, but that wasn't possible at the time. We could also switch to using Proj.jl directly.

rafaqz avatar Sep 27 '24 12:09 rafaqz

Can we take this opportunity to rename cellsize to cellarea and to return area in units of meters #747, also adding a units kwarg?

alex-s-gardner avatar Sep 27 '24 16:09 alex-s-gardner

There is room for significant speed gains for small small grid spacing:

#For lat/lon aligned rasters with small grid spacing, this will be much faster and just as accurate:

# load packages
import Rasters: EPSG
using Rasters
using Rasters.Lookups
using DimensionalData


# define meters to lat/lon function
#"""
    meters2lonlat_distance(distance_meters, latitude_degrees)

Returns the decimal degree distance along latitude and longitude lines given a distance in 
meters and a latitude in decimal degrees.

# Example usage:
#```julia-repl
julia> distance_meters = 1000.0;  
julia> latitude_degrees = 45.0;  

julia> lat, lon = Altim.meters2lonlat_distance(distance_meters, latitude_degrees)
(0.008997741566866717, 0.012718328120254203)
#```
#"""
function meters2lonlat_distance(distance_meters, latitude_degrees)
    # Radius of the Earth in meters
    earth_radius = 6371000

    # Calculate the angular distance in radians
    angular_distance = distance_meters / earth_radius

    # Calculate the longitude distance using the Haversine formula
    longitude_distance = angular_distance * (180.0 / π) / cosd(latitude_degrees)

    latitude_distance = distance_meters / 111139

    return latitude_distance, longitude_distance
end

# build an example raster
dX = 0.1
dY = -0.1
lon = X(Projected(166.:dX:168.; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dX), crs=EPSG(4326)))
lat = Y(Projected(-78.0:dY:-80.; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dY ), crs=EPSG(4326)))
ras = Raster(rand(lon, lat))

# given a lat/lon raster with small grid spacing caclculate area
dlon = dims(ras, :X)
dlat = dims(ras, :Y)
lonlat_per_meter = meters2lonlat_distance.(Ref(1), dlat)

dist_lon = step(dlon) ./ getindex.(lonlat_per_meter, 1)
dist_lat = step(dlat) ./ getindex.(lonlat_per_meter, 2)
area = ones(dlon) * DimArray(dist_lon .* dist_lat, dlat)'

alex-s-gardner avatar Sep 27 '24 16:09 alex-s-gardner

Can we take this opportunity to rename cellsize to cellarea and to return area in units of meters #747, also adding a units kwarg?

Let's wait for a complete Unitful extension to add the units keyword to everything all at once. (in a few months when my PhD is done 😅)

I'm happy to rename but we should @deprecate the current name so it still works for a while.

rafaqz avatar Sep 27 '24 17:09 rafaqz

I'm happy to rename but we should @deprecate the current name so it still works for a while.

given the incorrectness of cellsize it might actually be better to break it

alex-s-gardner avatar Sep 27 '24 18:09 alex-s-gardner

There is room for significant speed gains for small small grid spacing:

Have you compared to the implementation in this PR? For me it is giving similar speeds.

I can see the point of your implementation, but using the haversine formula also has downsides, since it doesn't account for the curvature of the earth. I know this isn't a big deal in most cases, but the error increases with the size of gridcells. So after (dis)aggregating a raster, the total area returned by cellarea would be different, which is not ideal.

Maybe we should implement it for other projections than lon-lat, but then we might need to do some more geometry.

tiemvanderdeure avatar Sep 27 '24 19:09 tiemvanderdeure

Can we take this opportunity to rename cellsize to cellarea and to return area in units of meters #747, also adding a units kwarg?

Let's wait for a complete Unitful extension to add the units keyword to everything all at once. (in a few months when my PhD is done 😅)

Maybe we should already start returning the result in metres though, to avoid another breaking change down the line. If we add a units keyword later then it would default to m^2.

tiemvanderdeure avatar Sep 28 '24 09:09 tiemvanderdeure

Yeah result in metres is fine, just not the units kw

rafaqz avatar Sep 28 '24 10:09 rafaqz

Have you compared to the implementation in this PR? For me it is giving similar speeds.

Your implementation knocks the socks off of my proposal (2x faster) and is more accurate so ignore my recommendation. Excited to see this progressing... fast, useful and intuitive... couldn't ask for anything more

alex-s-gardner avatar Sep 28 '24 23:09 alex-s-gardner

Is this good to go?

rafaqz avatar Oct 01 '24 20:10 rafaqz

I think so! Can you or @asinghvi17 just nod at the logic in the line of code that reprojects to mappedcrs? I don't think we need an option to turn that behaviour off, right? I guess users vouch for the accuracy of that transformation by providing a mappedcrs.

tiemvanderdeure avatar Oct 02 '24 07:10 tiemvanderdeure

Now I think we're good to go!

tiemvanderdeure avatar Oct 02 '24 14:10 tiemvanderdeure

Feel free to ignore the cosd/sind stuff if there's no time, I just discovered that they use an extended-precision deg2rad that might be useful.

asinghvi17 avatar Oct 02 '24 14:10 asinghvi17

Thanks for reviewing @asinghvi17! I didn't know about cosd and sind so that was really helpful.

I think this is really solid now! The only thing I could think of that we could still add is an option to disable the transformation and just return the cartesian area for planar projections. Should I add it?

tiemvanderdeure avatar Oct 02 '24 15:10 tiemvanderdeure

I think this is really solid now! The only thing I could think of that we could still add is an option to disable the transformation and just return the cartesian area for planar projections. Should I add it?

I think this would be super useful. If the user understands their projection and just want map ptojected area in map units they could simply pass a kwarg like area_in_crs or similar

alex-s-gardner avatar Oct 02 '24 15:10 alex-s-gardner

The only thing I could think of that we could still add is an option to disable the transformation and just return the cartesian area for planar projections.

I implemented this now and it works (even without ArchGDAL). I don't know if I like area_in_crs so much, though. terra uses transform, but that's also not super clear, is it?

tiemvanderdeure avatar Oct 02 '24 17:10 tiemvanderdeure

Ideally we would say cellarea(Linear(), raster) or cellarea(Spherical(), raster) I guess, that will have to wait for a bunch of reactors in GeometryOps so that Rasters can depend on a "GeometryOpsCore" that exports these types.

Then if one wants as much precision as possible it's even possible to do cellarea(Geodesic(some_datum), raster)

asinghvi17 avatar Oct 02 '24 17:10 asinghvi17

Ideally we would say cellarea(Linear(), raster) or cellarea(Spherical(), raster)

It seems like now would be the time to do this as this is a breaking change. We could make it easy and just remove kwarg in favor of Linear() and Spherical() types that could be internally defined... then we can expand in future to be more flexible and to eventually integrate GeometryOps... but at a later date?

alex-s-gardner avatar Oct 02 '24 17:10 alex-s-gardner

It would take about 3 days to get out, but we could technically define a GeometryOpsCore.jl package today and get it registered, Rasters could then depend on that (it should have way less dependencies / load time).

I wouldn't want to define them internally because someone using this version of Rasters and a future version of GeometryOps will get quite a bit of incompatibility.

asinghvi17 avatar Oct 02 '24 17:10 asinghvi17

We have also been talking about singletons like that (in person here, sorry to be exclusive!).

But it makes sense to share them with GeometryOps.jl

rafaqz avatar Oct 02 '24 19:10 rafaqz

Codecov Report

Attention: Patch coverage is 95.58824% with 3 lines in your changes missing coverage. Please review.

Project coverage is 82.61%. Comparing base (a15ebb1) to head (ae793e6). Report is 56 commits behind head on main.

Files with missing lines Patch % Lines
ext/RastersArchGDALExt/cellarea.jl 96.29% 2 Missing :warning:
src/extensions.jl 92.85% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #768      +/-   ##
==========================================
+ Coverage   82.32%   82.61%   +0.28%     
==========================================
  Files          60       62       +2     
  Lines        4357     4566     +209     
==========================================
+ Hits         3587     3772     +185     
- Misses        770      794      +24     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Oct 04 '24 14:10 codecov[bot]

Happy to report that this solves #764 in web mercator as well! hello

asinghvi17 avatar Oct 05 '24 16:10 asinghvi17

Full changelog:

  • Switch cellarea and reproject to use Proj instead of ArchGDAL, increasing speed and decreasing the number of allocations.
  • Cellarea changes:
    • Rename cellsize to cellarea
    • Return output in meters by default (can be overridden by the radius in the manifold, so Spherical(radius_of_earth_in_km) gives you are in km, for example. This could be made more user friendly in the future.
    • Allow "algorithm choice" using GeometryOpsCore Manifold types (planar / spherical)
    • Use Eriksson's formula for spherical triangle area to compute the area of a grid cell using Cartesian (x, y, z) coordinates for spherical cellarea, increasing precision and providing correct cell area on at least 5mx5m grid cells (could be smaller, haven't tested)
    • Switch to using sind and cosd rather than deg2rad, since they use extended-precision deg2rad internally
    • Use a direct method to get cell area for lat-long (geographic) projections, greatly simplifying the process
  • Reproject changes:
    • Switch from transforming vectors of points out-of-place with ArchGDAL to transforming a single array of x or y values in-place with Proj's proj_trans_generic. This drastically reduces allocations and memory usage, and improves runtime by 6x.
    • Use Proj's geographic / degree-unit checks to figure out if the CRS is long-lat. This currently checks explicitly for the "degree" unit name, but this could be expanded in the future.

asinghvi17 avatar Oct 08 '24 20:10 asinghvi17

The test failures are unrelated and already fixed in https://github.com/rafaqz/Rasters.jl/pull/780. There is still room for improvement performance-wise but I think we should merge this now. We can make a different PR for further improvements.

tiemvanderdeure avatar Oct 09 '24 10:10 tiemvanderdeure