Rasters.jl
Rasters.jl copied to clipboard
cellsize returns erroneous results for grids with fine spacing [no reprojeciton]
This makes sense as gridsize is increasing linearly with latitude
using Rasters
using Rasters.Lookups
import GeoFormatTypes as GFT
using Plots
dX = 0.1
dY = -0.1
lon = X(Projected(166.:dX:168.; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dX), crs=EPSG(4326)))
lat = Y(Projected(-78.0:dY:-80.; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dY ), crs=EPSG(4326)))
ras = Raster(rand(lon, lat))
cell_area = Rasters.cellsize(ras)
plot(cell_area[1, :])
if we decrease the step size in X we start to see some jutter
dX = 0.0006
dY = -0.1
lon = X(Projected(166.0:dX:168.0; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dX), crs=EPSG(4326)))
lat = Y(Projected(-78.0:dY:-80.0; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dY), crs=EPSG(4326)))
ras = Raster(rand(lon, lat))
cell_area = Rasters.cellsize(ras)
plot(cell_area[1, :])
and things really fall apart when we decrease step size in X and Y
dX = 0.0006
dY = -0.0003
lon = X(Projected(166.0:dX:167.0; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dX), crs=EPSG(4326)))
lat = Y(Projected(-78.0:dY:-79.0; sampling=Intervals(Start()), order=ForwardOrdered(), span=Regular(dY), crs=EPSG(4326)))
ras = Raster(rand(lon, lat))
cell_area = Rasters.cellsize(ras)
plot(cell_area[1, :])
@alex-s-gardner can you confirm this is on v0.11.8 of Rasters.jl?
Yes, it is v0.11.8
~I wonder if we need to densify and subsample points on the grid? I would expect that to have an effect as we increase dx and dy though, not as we decrease...~
~https://github.com/rafaqz/Rasters.jl/blob/8d13c1887f90444d24e3c19ecbff42c8d5f0de68/ext/RastersArchGDALExt/cellsize.jl#L71-L77~
nevermind, I just saw that the crs there is in long-lat. It might be something with the circular area calculation then?
Seems like a precision issue to me... is Float64 geometry being downgraded anywhere?
No, but it is being converted to radians and then multiplied by pi
Any ideas @tiemvanderdeure ?
Thanks for pointing this out. I don't know exactly but it seems like something is being rounded somewhere?
Anyway, it's been on my list for a little while to make a specialised implementation for lon-lat grids that calculates the size of each latitudinal band in a grid and then divides it up between the cells. That should be much faster and avoids this type of error.
The current implementation just treats each cell as a linearring, which makes it super generic but slow and (apparently) introduces some precision issues.
specialised implementation for lon-lat grids that calculates the size of each latitudinal band in a grid and then divides it up between the cells
Could that also apply to any CRS that can map to lat/long via mapcrs?
Yes we should be able to do the conversion for any grid aligned crs
specialised implementation for lon-lat grids that calculates the size of each latitudinal band in a grid and then divides it up between the cells
Could that also apply to any CRS that can map to lat/long via
mapcrs?
It depends on this line, which (I think) should return true for any lat/long crs: https://github.com/rafaqz/Rasters.jl/blob/8d13c1887f90444d24e3c19ecbff42c8d5f0de68/ext/RastersArchGDALExt/cellsize.jl#L57
But there may be a better way to test this?
For mappedcrs we test if changing the projection of two points with the same X and different Y gets points with the same X. But then the user has specified the conversion explicitly and we are just checking it's not really wrong, so its not the same situation.
Okay, I've never used mappedcrs and don't think I've ever seen it in the wild, so just trying to wrap my head around this.
For mappedcrs we test if changing the projection of two points with the same X and different Y gets points with the same X.
Are there ever cases where that holds for two projections with a different CoordSys?
I know that there are some projections which use lat/lon but are shifted/rotated compared to EPSG:4326, but I think all of those have Earth Projection 1. My thinking was that, since we're assuming the world is spherical anyway, we don't need to do any reprojecting in those cases.
I don't really know CoordSys well. But I imagine yes? Anything grid aligned can be converted to another with independent transformations of X and Y. So e.g. a cylindrical equal area projection can be converted to lat/lon by independently transforming it's xs and ys. In lat/lon X will have regular steps, Y will be irregular. Rasters.reproject should try to do this for you.
But it means you can use the same math you are using for lat/lon after the transformation of the lookups.
I don't really know CoordSys well.
I don't either and I can't really find much about it, so if you know a better check I'd be happy.
So basically we can just check if the mappedcrs is lon-lat and if it reproject the lookups and calculate the cellsize for those. That makes sense to me.
Yeah. There is probably a more correct way... We need Julia crs parsers that give you more useful information more easily.
So e.g. a cylindrical equal area projection can be converted to lat/lon by independently transforming it's xs and ys.
Do you have an example of a raster like this, so I can add a test? I can't find any in the docs.
You can resample to it?
I just came across https://github.com/google/s2geometry/issues/190, it seems pretty insightful - maybe we should use one of the approaches there, or convert early to completely spherical (x, y, z) Cartesian points?
With the PR, this works correctly for lat/long. But if I resample the last example raster to web mercator, I get this:
I just came across google/s2geometry#190, it seems pretty insightful - maybe we should use one of the approaches there, or convert early to completely spherical (x, y, z) Cartesian points?
But even in that thread they mention their fallback is Girard's theorem, which is exactly what we use.
In this blogpost: https://www.johndcook.com/blog/2021/11/29/area-of-spherical-triangle/, Erikson's formula was basically equal to the estimate using just planar geometry for the area between some cities in Texas: "The triangle is so flat that numerical error has a bigger effect than the curvature of the earth." And here we're talking about the error that becomes relevant when gridcells are well below 1 km^2.
We actually already test for an equal-area projection with grid spacing of 100x100m (the test is if all grid cells are within 1% of 0.1 km^2).
My vote is for assuming flat space whenever grid cells get much smaller than that and just using the haversine formula.
With the PR, this works correctly for lat/long. But if I resample the last example raster to web mercator, I get this:
Could you copy the code?
Okay thinking about this some more, I'm not sure if the Haversine formula is numerically stable at this grid spacing either. It also involves a bunch of square roots and sin and cos
Did some more testing and I can report back that
- yes, this is a numeric stability issue
- yes, Eriksson's formula performs better in this particular example
I did this (_linearring_area is the current implementation):
xb = (166.0, 166.0006)
yb = (-78.0, -78.0003)
ring = [
(xb[1], yb[1]),
(xb[2], yb[1]),
(xb[2], yb[2]),
(xb[1], yb[2]),
(xb[1], yb[1])
]
ring_bf = map(p -> BigFloat.(p), ring)
eriksson(GI.LinearRing(ring))
_linearring_area(GI.LinearRing(ring); radius = 1)
eriksson(GI.LinearRing(ring_bf))
_linearring_area(GI.LinearRing(ring_bf); radius = 1)
which returns
julia> eriksson(GI.LinearRing(ring))
1.1399895306967292e-11
julia> _linearring_area(GI.LinearRing(ring); radius = 1)
2.2859936166241823e-11
julia> eriksson(GI.LinearRing(ring_bf))
1.139989369278327047449715369542151261981068071545916771493908664977035933833957e-11
julia> _linearring_area(GI.LinearRing(ring_bf); radius = 1)
1.140013862214309994513259890860796762002232266534835233057189872420615878310853e-11
So: when we use big floats both methods return the same answer, and this answer corresponds to what Eriksson's formula returns using Float64.
I implemented Eriksson's formula like this. It has all these dot products and cross products so I think it's going to allocate no matter what.
function eriksson(ring)
points = GI.getpoint(ring)
cartesian_points = lonlat_to_cartesian.(points)
area = 0.0
area += eriksson_triangle(cartesian_points[1], cartesian_points[2], cartesian_points[3])
area += eriksson_triangle(cartesian_points[3], cartesian_points[4], cartesian_points[1])
return area
end
function eriksson_triangle(a, b, c)
t = abs(dot(a, cross(b, c)))
t /= 1 + dot(b,c) + dot(c, a) + dot(a, b)
# alternative that could be more precise but allocates more
# t = abs(dot(a, (cross(b .- a, c .- a))) / dot(b .+ a, c .+ a))
2*atan(t)
end
lonlat_to_cartesian(args) = lonlat_to_cartesian(args...)
function lonlat_to_cartesian(lon, lat)
x = cosd(lat) * cosd(lon)
y = cosd(lat) * sind(lon)
z = sind(lat)
return [x, y, z]
end
Just some observations from testing in this way:
- calculation with Girard's theorem starts misbehaving when the height is lower than 0.001
- it is more unstable for high/low latitudes
So the options are:
- always use eriksson
- implement both and let users decide
- implement both and automatically switch when the area is less than let's say 1e-6 square degrees.
I implemented Eriksson's formula like this. It has all these dot products and cross products so I think it's going to allocate no matter what.
I usually use static vectors or tuples to get around that, you could probably try that, but StaticArrays is a pretty heavy dependency, and tuples don't have all the nice LinearAlgebra dispatches you'd need.
Although you could define a SphericalPoint type here that encodes those specific dispatches in a way where size is known at compile-time...that sounds like a lot of work :D
Yes or we can define a _dot and _cross that work on tuples of size 3 :)
Here's a super basic implementation that should get you what you need:
using LinearAlgebra
struct SphericalPoint{T <: Real}
data::NTuple{T, 3}
end
SphericalPoint(x, y, z) = SphericalPoint((x, y, z))
# define the 4 basic mathematical operators elementwise on the data tuple
Base.:+(p::SphericalPoint, q::SphericalPoint) = SphericalPoint(p.data .+ q.data)
Base.:-(p::SphericalPoint, q::SphericalPoint) = SphericalPoint(p.data .- q.data)
Base.:*(p::SphericalPoint, q::SphericalPoint) = SphericalPoint(p.data .* q.data)
Base.:/(p::SphericalPoint, q::SphericalPoint) = SphericalPoint(p.data ./ q.data)
# Define sum on a SphericalPoint to sum across its data
Base.sum(p::SphericalPoint) = sum(p.data)
# define dot and cross products
LinearAlgebra.dot(p::SphericalPoint, q::SphericalPoint) = sum(p * q)
function LinearAlgebra.cross(a::SphericalPoint, b::SphericalPoint)
a1, a2, a3 = a
b1, b2, b3 = b
SphericalPoint((a2*b3-a3*b2, a3*b1-a1*b3, a1*b2-a2*b1))
end
okay let me benchmark this!
EDIT: not bad!
julia> @btime eriksson($ring)
1.045 μs (1 allocation: 176 bytes)
1.1399895306958715e-11
julia> @btime _linearring_area($ring; radius = 1)
2.050 μs (0 allocations: 0 bytes)
2.2859936166241823e-11
EDIT2: even better after I got rid of the final allocation
julia> @btime eriksson($ring)
678.571 ns (0 allocations: 0 bytes)
1.1399895306958715e-11
Hmm, curious why there's still an allocation though
Hmm, curious why there's still an allocation though
because of this line, which isn't so hard to get rid of.
cartesian_points = lonlat_to_cartesian.(points)
Also just reading back in https://github.com/google/s2geometry/issues/190, and the reason they cite for needing to keep Girard's Theorem around is for points that are antipodal, which I don't think is ever going to happen with a Raster, so I think we can safely go for the option to just always use Eriksson's formula