grass
grass copied to clipboard
libgis: use GeographicLib from PROJ
- use GeographicLib for geodesic distance and area calculations (see #1235)
CentOS 7 is out with this PR because it gets proj-4.8.0 from EPEL which is too old...
The big question is, if we accept the fact that this PR does not work with stock CentOS 7, which the GRASS community needs to answer.
@metzm Markus, you asked me to review but even I am a somewhat experienced GIS user I unfortunately understand very little about coding.
Maybe @kbevers is willing to take a look at this PR?
Looks about right to me but I am in no way an expert in the geodesic code so take my blessings with a grain of salt
I have tested with testfile linked in QGIS ml:
ellipsoidal_area.geojson.txt
(added .txt
file extension as Github doesn't support geojson).
With this PR (proj=7.2.1) 14737935340.099 sq.m
With G 7.8.5 (proj=7.1.1) 14718097678.906 sq.m
QGIS 3.10.11 (proj=5.2.0) 14718097678.949 sq.m
For nc_spm_08
/ boundary_county@PERMANENT
feature: NAME:WAKE
PR: 2219365235.93 sq.m
7.8.5: 2219365235.93 sq.m
The results seem to me to be correct, with no regression. I didn't encounter any other issue either.
I've tested and can confirm that in current trunk, results between GRASS GIS and the GeographicLib implementation in R (geosphere) are identical (to the sq m):
v.in.ogr ellipsoidal_area.geojson out=ell_test
v.to.db -p ell_test op=area --q
1|14737935340.0989
library(geosphere)
library(rgdal)
poly<-readOGR("ellipsoidal_area.geojson")
areaPolygon(poly)
14737935340
Results in GRASS GIS also seem stable when generalizing and more generalized versions do concur with the geosphere output:
v.generalize ell_test method=douglas out=ell_test_00000001 thresh=0.00000001 --o --q
v.to.db -p ell_test_00000001 op=area --q
1|14737935340.0999
v.generalize ell_test method=douglas out=ell_test_0000001 thresh=0.0000001 --o --q
v.to.db -p ell_test_0000001 op=area --q
1|14737935354.4453
library(geosphere)
library(rgdal)
poly<-readOGR("ell_test_0000001.json")
areaPolygon(poly)
[1] 14737935354
And testing with the data described in this old trac ticket the results are fairly stable:
v.in.ogr GRASS_area_problem/training_single.shp out=poly --o
v.generalize poly method=douglas out=poly_gen_000000001 thresh=0.000000001 --o --q
v.generalize poly method=douglas out=poly_gen_000000003 thresh=0.000000003 --o --q
v.generalize poly method=douglas out=poly_gen_00000001 thresh=0.00000001 --o --q
v.to.db -p poly op=area --q
1|0.126277758915019
v.to.db -p poly_gen_000000001 op=area --q
1|0.126328429
v.to.db -p poly_gen_000000003 op=area --q
1|0.12635081776554964583
v.to.db -p poly_gen_00000001 op=area --q
1|0.126351168928104
But results are not identical to geosphere results:
v.out.ogr poly format=GeoJSON out=poly.json
v.out.ogr poly_gen_000000001 format=GeoJSON out=poly_gen_000000001.json
v.out.ogr poly_gen_000000003 format=GeoJSON out=poly_gen_000000003.json
v.out.ogr poly format=GeoJSON out=poly.json
library(geosphere)
library(rgdal)
poly<-readOGR("poly.json")
areaPolygon(poly)
[1] 0.126354
poly<-readOGR("poly_gen_000000001.json")
areaPolygon(poly)
[1] 0.1263502
poly<-readOGR("poly_gen_000000003.json")
areaPolygon(poly)
[1] 0.1263632
poly<-readOGR("poly_gen_00000001.json")
areaPolygon(poly)
[1] 0.1263439
Any idea why ?
Testing the example of mlennert, one reason for different results with R geosphere is that a and f where not given, however, using the same a and f as in GRASS I still don't get the same results, and my results with GRASS differ from the results of mlennert with GRASS:
mlennert GRASS master + PROJ ?
metzm GRASS master + PROJ 7.2.1
R geosphere without a, f
R geosphere with a = 6.37814e+06, f = 0.00335281
poly
0.126277758915019 mlennert
0.126272547475285 metzm
0.126354 R geosphere without a, f
0.126373 R geosphere with a, f
poly_gen_000000001
0.126328429 mlennert
0.126329002640606 metzm
0.1263502 R geosphere without a, f
0.1263837 R geosphere with a, f
poly_gen_000000003
0.12635081776554964583 mlennert
0.126352616683089 metzm
0.1263632 R geosphere without a, f
0.1264062 R geosphere with a, f
poly_gen_00000001
0.126351168928104 mlennert
0.126362257568278 metzm
0.1263439 R geosphere without a, f
0.1263677 R geosphere with a, f
If mlennert also used PROJ 7.2.1, there must be some numerical instability in the GeographicLib as included in PROJ, otherwise differences between mlennert and metzm might be due to different PROJ versions.
Altogether, the observed differences can not even be explained with single precision floating point limitations. GeographicLib can no longer be used as reference because GeographicLib itself produces different results on different systems.
My results with the R geosphere package are identical to those of mlennert, but according to the geosphere documentation, a and f need to be specified.
For my tests with R geosphere, the precision of a and f was too low. Using more precise values a = 6378137, f = 0.00335281066474746
, the results of R geosphere with and without a and f are identical.
There is also the Planimeter tool of GeographicLib. Thus there are three different implementations of the algorithm of Charles Karney: Planimeter, geodesic.[c|h] in PROJ and R geosphere. Unfortunately with four different results:
poly
0.126272547475285 GRASS + PROJ metzm
0.126277758915019 GRASS + PROJ mlennert
0.126354 R geosphere
0.12638 Planimeter
Differences begin to occur already at the fourth significant decimal digit, latest at the sixth significant decimal digit. This is IMHO too high.
There is also the Planimeter tool of GeographicLib.
Have you also used the "Planimeter" tool with the "-E" option?
Anyway, I think geodesic/ellipsoidal area calculation are not actual useful for polygons of very small extension (like the "training_single" layer which is about 0.1 square meters).
@agiudiceandrea @kbevers
There is also the Planimeter tool of GeographicLib.
Have you also used the "Planimeter" tool with the "-E" option?
Same result. Does not answer the question why Planimeter and PROJ geodesic provide different results.
Anyway, I think geodesic/ellipsoidal area calculation are not actual useful for polygons of very small extension (like the "training_single" layer which is about 0.1 square meters).
But PROJ geodesic.h states that "for the WGS84 ellipsoid, the errors are less than 15 nanometers". Thus I expect accurate and consistent (!) results for our example which is with > 0.1 spm huge relative to the given errors.
I could create a new issue in PROJ for the inconsistency between GeographicLib and geodesic.[c|h] in PROJ.
I think it's time to call in the cavalry. @cffk, above are showcased inconsistencies between various setups using geodesic calculations from GeographicLib - any insight as to why?
OK, I'll take a look. The intention is that PROJ's geodesic calculations (C code) should mirror very accurately what you get from GeographicLib (C++ code).
But PROJ geodesic.h states that "for the WGS84 ellipsoid, the errors are less than 15 nanometers". Thus I expect accurate and consistent (!) results for our example which is with > 0.1 spm huge relative to the given errors.
I would also expect consistent results.
Anyway, the Planimeter online tool description at https://geographiclib.sourceforge.io/cgi-bin/Planimeter states that: "The result for the perimeter is accurate to about 15 nm per vertex. The result for the area is accurate to about 0.1 m2 per vertex."
The same is stated in C. F. F. Karney, Algorithms for geodesics, J. Geodesy 87, 43-55 (2013): "The round-off errors in the direct and inverse methods are less than 15 nanometers and the error in the computation of the area S12 is about 0.1 m2."
If mlennert also used PROJ 7.2.1, there must be some numerical instability in the GeographicLib as included in PROJ, otherwise differences between mlennert and metzm might be due to different PROJ versions.
GRASS ll_wgs85/test_proj:~ > g.version -rbe
GRASS 7.9.dev (2021)
./configure --prefix=/usr/lib --sysconfdir=/etc --sharedstatedir=/var --enable-socket --enable-shared --enable-largefile --with-openmp --with-postgres --with-pthread --with-cxx --with-x --with-gdal --with-freetype --with-motif --with-readline --with-nls --with-odbc --with-sqlite --with-freetype-includes=/usr/include/freetype2 --with-tcltk-includes=/usr/include/tcl --with-postgres-includes=/usr/include/postgresql --with-proj-share=/usr/share/proj --with-python=/usr/bin/python-config --with-cairo --with-geos --with-blas --with-lapack --with-zstd --with-pdal=/usr/bin/pdal-config --with-netcdf=/usr/bin/nc-config
libgis revision: 8b4c4986e
libgis date: 2021-02-08T07:13:27+00:00
PROJ: 7.2.1
GDAL/OGR: 3.2.1
GEOS: 3.9.0
SQLite: 3.34.1
GRASS ll_wgs85/test_proj:~ > v.to.db -p poly op=area --q
1|0.126277758915019
I'm on Debian testing. Is there any env variable I need to set or something ? Is there a way to test that I'm really using the GeographicLib ?
But PROJ geodesic.h states that "for the WGS84 ellipsoid, the errors are less than 15 nanometers". Thus I expect accurate and consistent (!) results for our example which is with > 0.1 spm huge relative to the given errors.
I would also expect consistent results.
Anyway, the Planimeter online tool description at https://geographiclib.sourceforge.io/cgi-bin/Planimeter states that: "The result for the perimeter is accurate to about 15 nm per vertex. The result for the area is accurate to about 0.1 m2 per vertex."
0.1 m2 per vertex seems quite a lot actually, especially for small polygons. Is this completely independent of the size ?
IIUC, the small test polygon used above has 45 vertices, so that would mean an expected error of 4.5m2 ?
I'm on Debian testing. Is there any env variable I need to set or something ? Is there a way to test that I'm really using the GeographicLib ?
If you use this PR, geodesic.[c|h] (the C version of GeographicLib) from PROJ is automatically used.
I'm on Debian testing. Is there any env variable I need to set or something ? Is there a way to test that I'm really using the GeographicLib ?
If you use this PR, geodesic.[c|h] (the C version of GeographicLib) from PROJ is automatically used.
Judging from libgis revision: 8b4c4986e it is not this PR
The "exact" area of the 44-sided training_single polygon is
-0.126347391348170 m^2
if the coordinates in the KML file are treated as exact decimal numbers. This calculation is performed by the MPFR-enabled version of GeographicLib for the WGS84 ellipsoid. The plain double version of Planimeter and a simple planimeter implementation using PROJ both give
-0.1263768025 m^2
so the error is 30 mm^2 or about 1 mm^2 per vertex. The error in the "GRASS + PROJ metzm" calculation is 75 mm^2 in the other direction. I don't think you should read too much into the discrepancies between the various PROJ based calculations. These are probably attributable to differences in the compiler
My estimate of the error of 0.1 m^2 per vertex tries to capture the worst case for arbitrary polygons. This reflects the underlying method, namely applying the trapezium rule to the polygon with the equator as the baseline. The width of the trapezium is computed by a geodesic calculation with error 10e-9 m; the height of the trapezium is at most 10e7 m. So the error per side is at most
10e-9 m x 10e7 m = 0.1 m^2
Almost certainly, the calculation of the area benefits from a cancellation of errors. It might be worth trying to tighten up the bounds on the errors; e.g., using approximate squares of edges between 10 cm and 100 km. I can generate a set of test data along these lines together with accurate areas.
As to the question of whether 100 mm^2 = (1 cm)^2 (let's say) is too big an error for this polygon...
It would be difficult to wring much more accuracy out of a method using global coordinates (latitude + longitude) and double precision arithmetic. It would require, for example, reformulating the geodesic calculations to give high relative accuracy for short lines. (Currently, they just give high absolute accuracy, which, at 10e-9 m, is good enough for most applications.) For what it's worth
A / 2^52 = 0.11 m^2
where A is the area of the ellipsoid.
The "exact" area of the 44-sided training_single polygon is
-0.126347391348170 m^2
For reference:
-
using training_single.shp in GRASS_area_problem.zip from https://trac.osgeo.org/grass/ticket/3356
-
GRASS 7.8.5 [1] v.to.db -p training_single_shp op=area --q 1|0.126347391488286
-
QGIS 3.16.3 [2] $area = 0.12634739148828641
-
-
using training_single.kml from https://trac.osgeo.org/grass/ticket/3356
-
GRASS 7.8.5 [1] v.to.db -p training_single_kml op=area --q
1|0.126347391454861 -
QGIS 3.16.3 [2] $area = 0.12634739145486146
-
So, the value of the area of this particular polygon, calculated with the current GRASS/QGIS implementation of the geodesic area calculation algorithm, differs by about 1E-10 square meters from the "exact" value.
[1] GRASS version: 7.8.5 - Code revision: 2b6ab2893 - Windows 7 64 bit (OSGeo4W 64 bit) [2] QGIS 3.16.3 - Code revision: 94ac9f21b8 - Windows 7 64 bit (OSGeo4W 64 bit)
How about the polygon from the QGIS issue https://github.com/qgis/QGIS/issues/40888? For that the difference in area between current QGIS and GeographicLib was 0.654%.
POLYGON ((
20.13293641 59.95688345,
26.94617837 60.47397663,
29.74782155 62.56499443,
27.45254202 68.70650340,
23.75771765 68.24937206,
25.42698984 65.27444593,
21.51545237 63.10353609,
21.40562760 61.12318104,
19.41123592 60.40477513,
20.13293641 59.95688345))
This tiny test polygon with a size of about 0.1263 sqm seems to hit the limits of floating point calculations and trigonometric functions. For larger areas, the Planimeter tool of GeographicLib and the corresponding C version in PROJ provide identical results.
Let's assume that GeographicLib provides the most accurate results for geodesic distance and area calculations available in the osgeo world. The different results of the Planimeter tool of GeographicLib and the corresponding C version in PROJ might be explained by different compilers and/or compiler flags. In my case, using -fno-fast-math with PROJ changes the size of the tiny test polygon from 0.126272547475285 to 0.12634739123375 which is very close to 0.126347391348170, the "exact" value calculated by @cffk. Planimeter's 0.12638 from the package manager of my Linux distro is thus a bit off.
Compilers and compiler flags seem to make a difference, explaining any inconsistencies for extreme cases between he Planimeter tool of GeographicLib and the corresponding C version in PROJ.
AFAIK, QGIS uses the same (or very similar) algorithm to calculate area sizes as the GRASS native algorithm and is thus not suitable for comparison of accurate results, only for comparison of consistency.
How about the polygon from the QGIS issue qgis/QGIS#40888? For that the difference in area between current QGIS and GeographicLib was 0.654%.
POLYGON (( 20.13293641 59.95688345, 26.94617837 60.47397663, 29.74782155 62.56499443, 27.45254202 68.70650340, 23.75771765 68.24937206, 25.42698984 65.27444593, 21.51545237 63.10353609, 21.40562760 61.12318104, 19.41123592 60.40477513, 20.13293641 59.95688345))
The "exact" area here is (assuming coordinates are lon,lat and WGS84):
251199344354.42985178 m^2
One possible source of confusion is that the online planimeter tool is compiled to use long double
so you get 11 bits more accuracy compared to the Planimeter tool you get with most distributions of GeographicLib.
So, the value of the area of this particular polygon, calculated with the current GRASS/QGIS implementation of the geodesic area calculation algorithm, differs by about 1E-10 square meters from the "exact" value.
If you mean that the difference is perhaps too small for having any practical meaning, it may be true with this tiny polygon but have a look at the half-a-Finland polygon that I pasted as a comment. That shows 0.654% difference in areas.
"exact" area (m^2) = 251199344354.42985178
QGIS 3.17 with GRASS method = 249566957499.7546
So, the value of the area of this particular polygon, calculated with the current GRASS/QGIS implementation of the geodesic area calculation algorithm, differs by about 1E-10 square meters from the "exact" value.
If you mean that the difference is perhaps too small for having any practical meaning, it may be true with this tiny polygon but have a look at the half-a-Finland polygon that I pasted as a comment. That shows 0.654% difference in areas.
"exact" area (m^2) = 251199344354.42985178 QGIS 3.17 with GRASS method = 249566957499.7546
Using this PR (with PROJ 7.2.1) the same polygon gives the result 251199344354.431 which is pretty close to the "exact" area.
"exact" area (m^2) = 251199344354.42985178 QGIS 3.17 with GRASS method = 249566957499.7546
Using this PR (with PROJ 7.2.1) the same polygon gives the result 251199344354.431 which is pretty close to the "exact" area.
Close enough for me.
Now that I got my PROJ compiler flags right, I get in all reported cases results very close to these "exact" results of GeographicLib. For me, the inconsistencies are resolved.
The relatively big difference between the "old" grass/qgis and the new calculation (this PR) for the Finland polygon, may perhaps display the limitations of the old method.
Testing a polygon (~ Uganda) along the equator:
gives ~0.012% difference:
qgis (=old grass) = 192668499676.993
this PR = 192692521685.248
I'm on Debian testing. Is there any env variable I need to set or something ? Is there a way to test that I'm really using the GeographicLib ?
If you use this PR, geodesic.[c|h] (the C version of GeographicLib) from PROJ is automatically used.
Judging from libgis revision: 8b4c498 it is not this PR
Although the issue seems to be solved by more informed people than me (and I really don't have much time for GRASS dev at the moment), here for the record. First of all, in the original track ticket there are two versions of the polygon, one in shapefile format, the other in KML. Results are not the same depending on which one I use.
- I use the 'git pr' alias system explained here:
git pr 1283
GRASS ll_wgs85/test_proj:grass > git status
Sur la branche pr/1283
- I then run my grass make script which goes through a complete distclean before compiling:
make distclean
CFLAGS="-g -Wall -fopenmp -lgomp -Ofast -fno-fast-math
-march=core-avx-i -fno-common -fexceptions $PENTIUM64"
LDFLAGS="-Wl,--no-undefined -fopenmp -lgomp"
CXXFLAGS="-g -Ofast"
./configure --prefix=/usr/lib \
--sysconfdir=/etc \
--sharedstatedir=/var \
--enable-socket \
--enable-shared \
--enable-largefile \
--with-openmp \
--with-postgres \
--with-pthread \
--with-cxx \
--with-x \
--with-gdal \
--with-freetype \
--with-motif \
--with-readline \
--with-nls \
--with-odbc \
--with-sqlite \
--with-freetype-includes=/usr/include/freetype2 \
--with-tcltk-includes=/usr/include/tcl \
--with-postgres-includes=`pg_config --includedir` \
--with-proj-share=/usr/share/proj \
--with-python=/usr/bin/python-config \
--with-cairo \
--with-geos \
--with-blas \
--with-lapack \
--with-zstd \
--with-pdal=/usr/bin/pdal-config \
--with-netcdf=/usr/bin/nc-config \
&& \
make > build.log 2>&1
- When I then launch the resulting grass startup script, I get the following results:
> g.version -rbe
GRASS 7.9.dev (2021)
./configure --prefix=/usr/lib --sysconfdir=/etc --sharedstatedir=/var --enable-socket --enable-shared --enable-largefile --with-openmp --with-postgres --with-pthread --with-cxx --with-x --with-gdal --with-freetype --with-motif --with-readline --with-nls --with-odbc --with-sqlite --with-freetype-includes=/usr/include/freetype2 --with-tcltk-includes=/usr/include/tcl --with-postgres-includes=/usr/include/postgresql --with-proj-share=/usr/share/proj --with-python=/usr/bin/python-config --with-cairo --with-geos --with-blas --with-lapack --with-zstd --with-pdal=/usr/bin/pdal-config --with-netcdf=/usr/bin/nc-config
libgis revision: a214a76c4
libgis date: 2021-01-29T17:25:10+00:00
PROJ: 7.2.1
GDAL/OGR: 3.2.1
GEOS: 3.9.0
SQLite: 3.34.1
> v.to.db -p poly_shapefile op=area --q
1|0.126277758915019
> v.to.db -p poly_kml op=area --q
1|0.126376802518881
Both are within 1/1000 m2 of the apparent "correct" area, so even though I have no idea where the differences between @metzm and me come from (compiler settings ?) I can live with that.
Thinking about it, I prefer to introduce new GPJ_*()
functions in lib/proj
instead of adding PROJ lib as a new dependency to lib/gis
. Thus all library functions that depend on PROJ lib will be (as before) in lib/proj
. A new check needs to be introduced in configure because the minimum required PROJ version would now be PROJ-4.9.x.