Rasters.jl
Rasters.jl copied to clipboard
optimize cellsize for lon-lat projections
Solves https://github.com/rafaqz/Rasters.jl/issues/764 and makes cellsize calculation much faster for lon-lat projections.
I'm considering getting rid of the ArchGDAL requirement for these projections as well.
Any ideas for further improvements @rafaqz @alex-s-gardner?
How do we get rid of ArchGDAL?
I was thinking we could just make a _crs2transform that just calls ArchGDAL.crs2transform if ArchGDAL is loaded and otherwise tells the user to load ArchGDAL. If the CRS is WGS84 then cellsize would never have to call _crs2transform
But how do we deal with Proj/Wkt, a lot of different text strings can be wgs84
But how do we deal with Proj/Wkt, a lot of different text strings can be wgs84
The way we already do it - I don't think this line of code needs ArchGDAL
https://github.com/rafaqz/Rasters.jl/blob/8d13c1887f90444d24e3c19ecbff42c8d5f0de68/ext/RastersArchGDALExt/cellsize.jl#L57
Yeah that's ArchGDAL doing that conversion ;)
Yeah that's ArchGDAL doing that conversion ;)
I was thrown off by the GeoFormatTypes wrapper, but makes sense. Let's just leave it as it is, then.
Yes ArchGDAL actually pirates Base.convert on GeoFormat, so the confusion is understandable. We could put it in a GeoFormatTypesArchGDAL extension now, but that wasn't possible at the time. We could also switch to using Proj.jl directly.
Can we take this opportunity to rename cellsize to cellarea and to return area in units of meters #747, also adding a units kwarg?
There is room for significant speed gains for small small grid spacing:
#For lat/lon aligned rasters with small grid spacing, this will be much faster and just as accurate:
# load packages
import Rasters: EPSG
using Rasters
using Rasters.Lookups
using DimensionalData
# define meters to lat/lon function
#"""
meters2lonlat_distance(distance_meters, latitude_degrees)
Returns the decimal degree distance along latitude and longitude lines given a distance in
meters and a latitude in decimal degrees.
# Example usage:
#```julia-repl
julia> distance_meters = 1000.0;
julia> latitude_degrees = 45.0;
julia> lat, lon = Altim.meters2lonlat_distance(distance_meters, latitude_degrees)
(0.008997741566866717, 0.012718328120254203)
#```
#"""
function meters2lonlat_distance(distance_meters, latitude_degrees)
# Radius of the Earth in meters
earth_radius = 6371000
# Calculate the angular distance in radians
angular_distance = distance_meters / earth_radius
# Calculate the longitude distance using the Haversine formula
longitude_distance = angular_distance * (180.0 / π) / cosd(latitude_degrees)
latitude_distance = distance_meters / 111139
return latitude_distance, longitude_distance
end
# build an example raster
dX = 0.1
dY = -0.1
lon = X(Projected(166.:dX:168.; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dX), crs=EPSG(4326)))
lat = Y(Projected(-78.0:dY:-80.; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dY ), crs=EPSG(4326)))
ras = Raster(rand(lon, lat))
# given a lat/lon raster with small grid spacing caclculate area
dlon = dims(ras, :X)
dlat = dims(ras, :Y)
lonlat_per_meter = meters2lonlat_distance.(Ref(1), dlat)
dist_lon = step(dlon) ./ getindex.(lonlat_per_meter, 1)
dist_lat = step(dlat) ./ getindex.(lonlat_per_meter, 2)
area = ones(dlon) * DimArray(dist_lon .* dist_lat, dlat)'
Can we take this opportunity to rename cellsize to cellarea and to return area in units of meters #747, also adding a units kwarg?
Let's wait for a complete Unitful extension to add the units keyword to everything all at once. (in a few months when my PhD is done 😅)
I'm happy to rename but we should @deprecate the current name so it still works for a while.
I'm happy to rename but we should
@deprecatethe current name so it still works for a while.
given the incorrectness of cellsize it might actually be better to break it
There is room for significant speed gains for small small grid spacing:
Have you compared to the implementation in this PR? For me it is giving similar speeds.
I can see the point of your implementation, but using the haversine formula also has downsides, since it doesn't account for the curvature of the earth. I know this isn't a big deal in most cases, but the error increases with the size of gridcells. So after (dis)aggregating a raster, the total area returned by cellarea would be different, which is not ideal.
Maybe we should implement it for other projections than lon-lat, but then we might need to do some more geometry.
Can we take this opportunity to rename cellsize to cellarea and to return area in units of meters #747, also adding a units kwarg?
Let's wait for a complete Unitful extension to add the
unitskeyword to everything all at once. (in a few months when my PhD is done 😅)
Maybe we should already start returning the result in metres though, to avoid another breaking change down the line. If we add a units keyword later then it would default to m^2.
Yeah result in metres is fine, just not the units kw
Have you compared to the implementation in this PR? For me it is giving similar speeds.
Your implementation knocks the socks off of my proposal (2x faster) and is more accurate so ignore my recommendation. Excited to see this progressing... fast, useful and intuitive... couldn't ask for anything more
Is this good to go?
I think so! Can you or @asinghvi17 just nod at the logic in the line of code that reprojects to mappedcrs? I don't think we need an option to turn that behaviour off, right? I guess users vouch for the accuracy of that transformation by providing a mappedcrs.
Now I think we're good to go!
Feel free to ignore the cosd/sind stuff if there's no time, I just discovered that they use an extended-precision deg2rad that might be useful.
Thanks for reviewing @asinghvi17! I didn't know about cosd and sind so that was really helpful.
I think this is really solid now! The only thing I could think of that we could still add is an option to disable the transformation and just return the cartesian area for planar projections. Should I add it?
I think this is really solid now! The only thing I could think of that we could still add is an option to disable the transformation and just return the cartesian area for planar projections. Should I add it?
I think this would be super useful. If the user understands their projection and just want map ptojected area in map units they could simply pass a kwarg like area_in_crs or similar
The only thing I could think of that we could still add is an option to disable the transformation and just return the cartesian area for planar projections.
I implemented this now and it works (even without ArchGDAL). I don't know if I like area_in_crs so much, though. terra uses transform, but that's also not super clear, is it?
Ideally we would say cellarea(Linear(), raster) or cellarea(Spherical(), raster) I guess, that will have to wait for a bunch of reactors in GeometryOps so that Rasters can depend on a "GeometryOpsCore" that exports these types.
Then if one wants as much precision as possible it's even possible to do cellarea(Geodesic(some_datum), raster)
Ideally we would say
cellarea(Linear(), raster)orcellarea(Spherical(), raster)
It seems like now would be the time to do this as this is a breaking change. We could make it easy and just remove kwarg in favor of Linear() and Spherical() types that could be internally defined... then we can expand in future to be more flexible and to eventually integrate GeometryOps... but at a later date?
It would take about 3 days to get out, but we could technically define a GeometryOpsCore.jl package today and get it registered, Rasters could then depend on that (it should have way less dependencies / load time).
I wouldn't want to define them internally because someone using this version of Rasters and a future version of GeometryOps will get quite a bit of incompatibility.
We have also been talking about singletons like that (in person here, sorry to be exclusive!).
But it makes sense to share them with GeometryOps.jl
Codecov Report
Attention: Patch coverage is 95.58824% with 3 lines in your changes missing coverage. Please review.
Project coverage is 82.61%. Comparing base (
a15ebb1) to head (ae793e6). Report is 56 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ext/RastersArchGDALExt/cellarea.jl | 96.29% | 2 Missing :warning: |
| src/extensions.jl | 92.85% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #768 +/- ##
==========================================
+ Coverage 82.32% 82.61% +0.28%
==========================================
Files 60 62 +2
Lines 4357 4566 +209
==========================================
+ Hits 3587 3772 +185
- Misses 770 794 +24
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Happy to report that this solves #764 in web mercator as well!
Full changelog:
- Switch
cellareaandreprojectto use Proj instead of ArchGDAL, increasing speed and decreasing the number of allocations. - Cellarea changes:
- Rename
cellsizetocellarea - Return output in meters by default (can be overridden by the radius in the manifold, so
Spherical(radius_of_earth_in_km)gives you are in km, for example. This could be made more user friendly in the future. - Allow "algorithm choice" using GeometryOpsCore Manifold types (planar / spherical)
- Use Eriksson's formula for spherical triangle area to compute the area of a grid cell using Cartesian (x, y, z) coordinates for spherical cellarea, increasing precision and providing correct cell area on at least 5mx5m grid cells (could be smaller, haven't tested)
- Switch to using
sindandcosdrather thandeg2rad, since they use extended-precisiondeg2radinternally - Use a direct method to get cell area for lat-long (geographic) projections, greatly simplifying the process
- Rename
- Reproject changes:
- Switch from transforming vectors of points out-of-place with ArchGDAL to transforming a single array of x or y values in-place with Proj's
proj_trans_generic. This drastically reduces allocations and memory usage, and improves runtime by 6x. - Use Proj's geographic / degree-unit checks to figure out if the CRS is long-lat. This currently checks explicitly for the "degree" unit name, but this could be expanded in the future.
- Switch from transforming vectors of points out-of-place with ArchGDAL to transforming a single array of x or y values in-place with Proj's
The test failures are unrelated and already fixed in https://github.com/rafaqz/Rasters.jl/pull/780. There is still room for improvement performance-wise but I think we should merge this now. We can make a different PR for further improvements.