grass icon indicating copy to clipboard operation
grass copied to clipboard

libgis: use GeographicLib from PROJ

Open metzm opened this issue 4 years ago • 33 comments

  • use GeographicLib for geodesic distance and area calculations (see #1235)

metzm avatar Jan 29 '21 17:01 metzm

CentOS 7 is out with this PR because it gets proj-4.8.0 from EPEL which is too old...

The big question is, if we accept the fact that this PR does not work with stock CentOS 7, which the GRASS community needs to answer.

metzm avatar Jan 29 '21 18:01 metzm

@metzm Markus, you asked me to review but even I am a somewhat experienced GIS user I unfortunately understand very little about coding.

jratike80 avatar Jan 29 '21 20:01 jratike80

Maybe @kbevers is willing to take a look at this PR?

neteler avatar Jan 30 '21 10:01 neteler

Looks about right to me but I am in no way an expert in the geodesic code so take my blessings with a grain of salt

kbevers avatar Jan 30 '21 12:01 kbevers

I have tested with testfile linked in QGIS ml: ellipsoidal_area.geojson.txt (added .txt file extension as Github doesn't support geojson).

With this PR (proj=7.2.1) 14737935340.099 sq.m

With G 7.8.5 (proj=7.1.1) 14718097678.906 sq.m

QGIS 3.10.11 (proj=5.2.0) 14718097678.949 sq.m

For nc_spm_08 / boundary_county@PERMANENT feature: NAME:WAKE PR: 2219365235.93 sq.m 7.8.5: 2219365235.93 sq.m

The results seem to me to be correct, with no regression. I didn't encounter any other issue either.

nilason avatar Feb 11 '21 09:02 nilason

I've tested and can confirm that in current trunk, results between GRASS GIS and the GeographicLib implementation in R (geosphere) are identical (to the sq m):

v.in.ogr ellipsoidal_area.geojson out=ell_test
v.to.db -p ell_test op=area --q
1|14737935340.0989
library(geosphere)
library(rgdal)
poly<-readOGR("ellipsoidal_area.geojson")
areaPolygon(poly)
14737935340

Results in GRASS GIS also seem stable when generalizing and more generalized versions do concur with the geosphere output:

v.generalize ell_test method=douglas out=ell_test_00000001 thresh=0.00000001 --o --q
v.to.db -p ell_test_00000001 op=area --q
1|14737935340.0999

v.generalize ell_test method=douglas out=ell_test_0000001 thresh=0.0000001 --o --q
v.to.db -p ell_test_0000001 op=area --q
1|14737935354.4453
library(geosphere)
library(rgdal)
poly<-readOGR("ell_test_0000001.json")
areaPolygon(poly)
[1] 14737935354

And testing with the data described in this old trac ticket the results are fairly stable:

v.in.ogr GRASS_area_problem/training_single.shp out=poly --o
v.generalize poly method=douglas out=poly_gen_000000001 thresh=0.000000001 --o --q
v.generalize poly method=douglas out=poly_gen_000000003 thresh=0.000000003 --o --q
v.generalize poly method=douglas out=poly_gen_00000001 thresh=0.00000001 --o --q
v.to.db -p poly op=area --q
1|0.126277758915019
v.to.db -p poly_gen_000000001 op=area --q
1|0.126328429
v.to.db -p poly_gen_000000003 op=area --q
1|0.12635081776554964583
v.to.db -p poly_gen_00000001 op=area --q
1|0.126351168928104

But results are not identical to geosphere results:

v.out.ogr poly format=GeoJSON out=poly.json
v.out.ogr poly_gen_000000001 format=GeoJSON out=poly_gen_000000001.json
v.out.ogr poly_gen_000000003 format=GeoJSON out=poly_gen_000000003.json
v.out.ogr poly format=GeoJSON out=poly.json
library(geosphere)
library(rgdal)
poly<-readOGR("poly.json")
areaPolygon(poly)
[1] 0.126354
poly<-readOGR("poly_gen_000000001.json")
areaPolygon(poly)
[1] 0.1263502
poly<-readOGR("poly_gen_000000003.json")
areaPolygon(poly)
[1] 0.1263632
poly<-readOGR("poly_gen_00000001.json")
areaPolygon(poly)
[1] 0.1263439

Any idea why ?

mlennert avatar Feb 11 '21 15:02 mlennert

Testing the example of mlennert, one reason for different results with R geosphere is that a and f where not given, however, using the same a and f as in GRASS I still don't get the same results, and my results with GRASS differ from the results of mlennert with GRASS:

mlennert GRASS master + PROJ ?
metzm GRASS master + PROJ 7.2.1
R geosphere without a, f
R geosphere with a = 6.37814e+06, f = 0.00335281

poly

0.126277758915019           mlennert
0.126272547475285           metzm
0.126354                    R geosphere without a, f
0.126373                    R geosphere with a, f

poly_gen_000000001

0.126328429                 mlennert
0.126329002640606           metzm
0.1263502                   R geosphere without a, f
0.1263837                   R geosphere with a, f

poly_gen_000000003

0.12635081776554964583      mlennert
0.126352616683089           metzm
0.1263632                   R geosphere without a, f
0.1264062                   R geosphere with a, f

poly_gen_00000001

0.126351168928104           mlennert
0.126362257568278           metzm
0.1263439                   R geosphere without a, f
0.1263677                   R geosphere with a, f

If mlennert also used PROJ 7.2.1, there must be some numerical instability in the GeographicLib as included in PROJ, otherwise differences between mlennert and metzm might be due to different PROJ versions.

Altogether, the observed differences can not even be explained with single precision floating point limitations. GeographicLib can no longer be used as reference because GeographicLib itself produces different results on different systems.

My results with the R geosphere package are identical to those of mlennert, but according to the geosphere documentation, a and f need to be specified.

metzm avatar Feb 13 '21 18:02 metzm

For my tests with R geosphere, the precision of a and f was too low. Using more precise values a = 6378137, f = 0.00335281066474746, the results of R geosphere with and without a and f are identical.

metzm avatar Feb 14 '21 13:02 metzm

There is also the Planimeter tool of GeographicLib. Thus there are three different implementations of the algorithm of Charles Karney: Planimeter, geodesic.[c|h] in PROJ and R geosphere. Unfortunately with four different results:

poly
0.126272547475285           GRASS + PROJ metzm
0.126277758915019           GRASS + PROJ mlennert
0.126354                    R geosphere
0.12638			    Planimeter

Differences begin to occur already at the fourth significant decimal digit, latest at the sixth significant decimal digit. This is IMHO too high.

metzm avatar Feb 14 '21 17:02 metzm

There is also the Planimeter tool of GeographicLib.

Have you also used the "Planimeter" tool with the "-E" option?

Anyway, I think geodesic/ellipsoidal area calculation are not actual useful for polygons of very small extension (like the "training_single" layer which is about 0.1 square meters).

agiudiceandrea avatar Feb 15 '21 21:02 agiudiceandrea

@agiudiceandrea @kbevers

There is also the Planimeter tool of GeographicLib.

Have you also used the "Planimeter" tool with the "-E" option?

Same result. Does not answer the question why Planimeter and PROJ geodesic provide different results.

Anyway, I think geodesic/ellipsoidal area calculation are not actual useful for polygons of very small extension (like the "training_single" layer which is about 0.1 square meters).

But PROJ geodesic.h states that "for the WGS84 ellipsoid, the errors are less than 15 nanometers". Thus I expect accurate and consistent (!) results for our example which is with > 0.1 spm huge relative to the given errors.

I could create a new issue in PROJ for the inconsistency between GeographicLib and geodesic.[c|h] in PROJ.

metzm avatar Feb 15 '21 22:02 metzm

I think it's time to call in the cavalry. @cffk, above are showcased inconsistencies between various setups using geodesic calculations from GeographicLib - any insight as to why?

kbevers avatar Feb 15 '21 22:02 kbevers

OK, I'll take a look. The intention is that PROJ's geodesic calculations (C code) should mirror very accurately what you get from GeographicLib (C++ code).

cffk avatar Feb 15 '21 23:02 cffk

But PROJ geodesic.h states that "for the WGS84 ellipsoid, the errors are less than 15 nanometers". Thus I expect accurate and consistent (!) results for our example which is with > 0.1 spm huge relative to the given errors.

I would also expect consistent results.

Anyway, the Planimeter online tool description at https://geographiclib.sourceforge.io/cgi-bin/Planimeter states that: "The result for the perimeter is accurate to about 15 nm per vertex. The result for the area is accurate to about 0.1 m2 per vertex."

The same is stated in C. F. F. Karney, Algorithms for geodesics, J. Geodesy 87, 43-55 (2013): "The round-off errors in the direct and inverse methods are less than 15 nanometers and the error in the computation of the area S12 is about 0.1 m2."

agiudiceandrea avatar Feb 15 '21 23:02 agiudiceandrea

If mlennert also used PROJ 7.2.1, there must be some numerical instability in the GeographicLib as included in PROJ, otherwise differences between mlennert and metzm might be due to different PROJ versions.

GRASS ll_wgs85/test_proj:~ > g.version -rbe
GRASS 7.9.dev (2021)

 ./configure  --prefix=/usr/lib --sysconfdir=/etc --sharedstatedir=/var --enable-socket --enable-shared --enable-largefile --with-openmp --with-postgres --with-pthread --with-cxx --with-x --with-gdal --with-freetype --with-motif --with-readline --with-nls --with-odbc --with-sqlite --with-freetype-includes=/usr/include/freetype2 --with-tcltk-includes=/usr/include/tcl --with-postgres-includes=/usr/include/postgresql --with-proj-share=/usr/share/proj --with-python=/usr/bin/python-config --with-cairo --with-geos --with-blas --with-lapack --with-zstd --with-pdal=/usr/bin/pdal-config --with-netcdf=/usr/bin/nc-config
libgis revision: 8b4c4986e
libgis date: 2021-02-08T07:13:27+00:00
PROJ: 7.2.1
GDAL/OGR: 3.2.1
GEOS: 3.9.0
SQLite: 3.34.1

GRASS ll_wgs85/test_proj:~ > v.to.db -p poly op=area --q
1|0.126277758915019

I'm on Debian testing. Is there any env variable I need to set or something ? Is there a way to test that I'm really using the GeographicLib ?

mlennert avatar Feb 16 '21 08:02 mlennert

But PROJ geodesic.h states that "for the WGS84 ellipsoid, the errors are less than 15 nanometers". Thus I expect accurate and consistent (!) results for our example which is with > 0.1 spm huge relative to the given errors.

I would also expect consistent results.

Anyway, the Planimeter online tool description at https://geographiclib.sourceforge.io/cgi-bin/Planimeter states that: "The result for the perimeter is accurate to about 15 nm per vertex. The result for the area is accurate to about 0.1 m2 per vertex."

0.1 m2 per vertex seems quite a lot actually, especially for small polygons. Is this completely independent of the size ?

IIUC, the small test polygon used above has 45 vertices, so that would mean an expected error of 4.5m2 ?

mlennert avatar Feb 16 '21 09:02 mlennert

I'm on Debian testing. Is there any env variable I need to set or something ? Is there a way to test that I'm really using the GeographicLib ?

If you use this PR, geodesic.[c|h] (the C version of GeographicLib) from PROJ is automatically used.

metzm avatar Feb 16 '21 17:02 metzm

I'm on Debian testing. Is there any env variable I need to set or something ? Is there a way to test that I'm really using the GeographicLib ?

If you use this PR, geodesic.[c|h] (the C version of GeographicLib) from PROJ is automatically used.

Judging from libgis revision: 8b4c4986e it is not this PR

nilason avatar Feb 16 '21 17:02 nilason

The "exact" area of the 44-sided training_single polygon is

-0.126347391348170 m^2

if the coordinates in the KML file are treated as exact decimal numbers. This calculation is performed by the MPFR-enabled version of GeographicLib for the WGS84 ellipsoid. The plain double version of Planimeter and a simple planimeter implementation using PROJ both give

-0.1263768025 m^2

so the error is 30 mm^2 or about 1 mm^2 per vertex. The error in the "GRASS + PROJ metzm" calculation is 75 mm^2 in the other direction. I don't think you should read too much into the discrepancies between the various PROJ based calculations. These are probably attributable to differences in the compiler

My estimate of the error of 0.1 m^2 per vertex tries to capture the worst case for arbitrary polygons. This reflects the underlying method, namely applying the trapezium rule to the polygon with the equator as the baseline. The width of the trapezium is computed by a geodesic calculation with error 10e-9 m; the height of the trapezium is at most 10e7 m. So the error per side is at most

10e-9 m x 10e7 m = 0.1 m^2

Almost certainly, the calculation of the area benefits from a cancellation of errors. It might be worth trying to tighten up the bounds on the errors; e.g., using approximate squares of edges between 10 cm and 100 km. I can generate a set of test data along these lines together with accurate areas.

As to the question of whether 100 mm^2 = (1 cm)^2 (let's say) is too big an error for this polygon...

It would be difficult to wring much more accuracy out of a method using global coordinates (latitude + longitude) and double precision arithmetic. It would require, for example, reformulating the geodesic calculations to give high relative accuracy for short lines. (Currently, they just give high absolute accuracy, which, at 10e-9 m, is good enough for most applications.) For what it's worth

A / 2^52 = 0.11 m^2

where A is the area of the ellipsoid.

cffk avatar Feb 16 '21 19:02 cffk

The "exact" area of the 44-sided training_single polygon is

-0.126347391348170 m^2

For reference:

  • using training_single.shp in GRASS_area_problem.zip from https://trac.osgeo.org/grass/ticket/3356

    • GRASS 7.8.5 [1] v.to.db -p training_single_shp op=area --q 1|0.126347391488286

    • QGIS 3.16.3 [2] $area = 0.12634739148828641

  • using training_single.kml from https://trac.osgeo.org/grass/ticket/3356

    • GRASS 7.8.5 [1] v.to.db -p training_single_kml op=area --q
      1|0.126347391454861

    • QGIS 3.16.3 [2] $area = 0.12634739145486146

So, the value of the area of this particular polygon, calculated with the current GRASS/QGIS implementation of the geodesic area calculation algorithm, differs by about 1E-10 square meters from the "exact" value.

[1] GRASS version: 7.8.5 - Code revision: 2b6ab2893 - Windows 7 64 bit (OSGeo4W 64 bit) [2] QGIS 3.16.3 - Code revision: 94ac9f21b8 - Windows 7 64 bit (OSGeo4W 64 bit)

agiudiceandrea avatar Feb 16 '21 20:02 agiudiceandrea

How about the polygon from the QGIS issue https://github.com/qgis/QGIS/issues/40888? For that the difference in area between current QGIS and GeographicLib was 0.654%.

POLYGON ((
 20.13293641   59.95688345,
 26.94617837   60.47397663,
 29.74782155   62.56499443,
 27.45254202   68.70650340,
 23.75771765   68.24937206,
 25.42698984   65.27444593,
 21.51545237   63.10353609,
 21.40562760   61.12318104,
 19.41123592   60.40477513,
 20.13293641   59.95688345))

jratike80 avatar Feb 16 '21 21:02 jratike80

This tiny test polygon with a size of about 0.1263 sqm seems to hit the limits of floating point calculations and trigonometric functions. For larger areas, the Planimeter tool of GeographicLib and the corresponding C version in PROJ provide identical results.

Let's assume that GeographicLib provides the most accurate results for geodesic distance and area calculations available in the osgeo world. The different results of the Planimeter tool of GeographicLib and the corresponding C version in PROJ might be explained by different compilers and/or compiler flags. In my case, using -fno-fast-math with PROJ changes the size of the tiny test polygon from 0.126272547475285 to 0.12634739123375 which is very close to 0.126347391348170, the "exact" value calculated by @cffk. Planimeter's 0.12638 from the package manager of my Linux distro is thus a bit off.

Compilers and compiler flags seem to make a difference, explaining any inconsistencies for extreme cases between he Planimeter tool of GeographicLib and the corresponding C version in PROJ.

AFAIK, QGIS uses the same (or very similar) algorithm to calculate area sizes as the GRASS native algorithm and is thus not suitable for comparison of accurate results, only for comparison of consistency.

metzm avatar Feb 16 '21 22:02 metzm

How about the polygon from the QGIS issue qgis/QGIS#40888? For that the difference in area between current QGIS and GeographicLib was 0.654%.

POLYGON ((
 20.13293641   59.95688345,
 26.94617837   60.47397663,
 29.74782155   62.56499443,
 27.45254202   68.70650340,
 23.75771765   68.24937206,
 25.42698984   65.27444593,
 21.51545237   63.10353609,
 21.40562760   61.12318104,
 19.41123592   60.40477513,
 20.13293641   59.95688345))

The "exact" area here is (assuming coordinates are lon,lat and WGS84):

251199344354.42985178 m^2

cffk avatar Feb 16 '21 22:02 cffk

One possible source of confusion is that the online planimeter tool is compiled to use long double so you get 11 bits more accuracy compared to the Planimeter tool you get with most distributions of GeographicLib.

cffk avatar Feb 16 '21 22:02 cffk

So, the value of the area of this particular polygon, calculated with the current GRASS/QGIS implementation of the geodesic area calculation algorithm, differs by about 1E-10 square meters from the "exact" value.

If you mean that the difference is perhaps too small for having any practical meaning, it may be true with this tiny polygon but have a look at the half-a-Finland polygon that I pasted as a comment. That shows 0.654% difference in areas.

"exact" area (m^2)          = 251199344354.42985178 
QGIS 3.17 with GRASS method = 249566957499.7546

jratike80 avatar Feb 17 '21 07:02 jratike80

So, the value of the area of this particular polygon, calculated with the current GRASS/QGIS implementation of the geodesic area calculation algorithm, differs by about 1E-10 square meters from the "exact" value.

If you mean that the difference is perhaps too small for having any practical meaning, it may be true with this tiny polygon but have a look at the half-a-Finland polygon that I pasted as a comment. That shows 0.654% difference in areas.

"exact" area (m^2)          = 251199344354.42985178 
QGIS 3.17 with GRASS method = 249566957499.7546

Using this PR (with PROJ 7.2.1) the same polygon gives the result 251199344354.431 which is pretty close to the "exact" area.

nilason avatar Feb 17 '21 08:02 nilason

"exact" area (m^2)          = 251199344354.42985178 
QGIS 3.17 with GRASS method = 249566957499.7546

Using this PR (with PROJ 7.2.1) the same polygon gives the result 251199344354.431 which is pretty close to the "exact" area.

Close enough for me.

Now that I got my PROJ compiler flags right, I get in all reported cases results very close to these "exact" results of GeographicLib. For me, the inconsistencies are resolved.

metzm avatar Feb 17 '21 10:02 metzm

The relatively big difference between the "old" grass/qgis and the new calculation (this PR) for the Finland polygon, may perhaps display the limitations of the old method.

Testing a polygon (~ Uganda) along the equator:

equat.geojson.txt

gives ~0.012% difference:

qgis (=old grass)   =  192668499676.993
this PR             =  192692521685.248

nilason avatar Feb 17 '21 10:02 nilason

I'm on Debian testing. Is there any env variable I need to set or something ? Is there a way to test that I'm really using the GeographicLib ?

If you use this PR, geodesic.[c|h] (the C version of GeographicLib) from PROJ is automatically used.

Judging from libgis revision: 8b4c498 it is not this PR

Although the issue seems to be solved by more informed people than me (and I really don't have much time for GRASS dev at the moment), here for the record. First of all, in the original track ticket there are two versions of the polygon, one in shapefile format, the other in KML. Results are not the same depending on which one I use.

  • I use the 'git pr' alias system explained here:
git pr 1283 

GRASS ll_wgs85/test_proj:grass > git status
Sur la branche pr/1283

  • I then run my grass make script which goes through a complete distclean before compiling:
make distclean
CFLAGS="-g -Wall -fopenmp -lgomp -Ofast -fno-fast-math
-march=core-avx-i -fno-common -fexceptions $PENTIUM64"
LDFLAGS="-Wl,--no-undefined -fopenmp -lgomp"
CXXFLAGS="-g -Ofast"
./configure   --prefix=/usr/lib \
                    --sysconfdir=/etc \
                    --sharedstatedir=/var \
                    --enable-socket \
                    --enable-shared \
                    --enable-largefile \
                    --with-openmp \
                    --with-postgres \
                    --with-pthread \
                    --with-cxx \
                    --with-x \
                    --with-gdal \
                    --with-freetype \
                    --with-motif \
                    --with-readline \
                    --with-nls \
                    --with-odbc \
                    --with-sqlite \
                    --with-freetype-includes=/usr/include/freetype2 \
                    --with-tcltk-includes=/usr/include/tcl \
                    --with-postgres-includes=`pg_config --includedir` \
                    --with-proj-share=/usr/share/proj \
                    --with-python=/usr/bin/python-config \
                    --with-cairo \
                    --with-geos \
                    --with-blas \
                    --with-lapack \
                    --with-zstd \
            --with-pdal=/usr/bin/pdal-config \
            --with-netcdf=/usr/bin/nc-config \
&& \
make > build.log 2>&1
  • When I then launch the resulting grass startup script, I get the following results:
> g.version -rbe
GRASS 7.9.dev (2021)

 ./configure  --prefix=/usr/lib --sysconfdir=/etc --sharedstatedir=/var --enable-socket --enable-shared --enable-largefile --with-openmp --with-postgres --with-pthread --with-cxx --with-x --with-gdal --with-freetype --with-motif --with-readline --with-nls --with-odbc --with-sqlite --with-freetype-includes=/usr/include/freetype2 --with-tcltk-includes=/usr/include/tcl --with-postgres-includes=/usr/include/postgresql --with-proj-share=/usr/share/proj --with-python=/usr/bin/python-config --with-cairo --with-geos --with-blas --with-lapack --with-zstd --with-pdal=/usr/bin/pdal-config --with-netcdf=/usr/bin/nc-config
libgis revision: a214a76c4
libgis date: 2021-01-29T17:25:10+00:00
PROJ: 7.2.1
GDAL/OGR: 3.2.1
GEOS: 3.9.0
SQLite: 3.34.1

> v.to.db -p poly_shapefile op=area --q
1|0.126277758915019

> v.to.db -p poly_kml op=area --q
1|0.126376802518881

Both are within 1/1000 m2 of the apparent "correct" area, so even though I have no idea where the differences between @metzm and me come from (compiler settings ?) I can live with that.

mlennert avatar Feb 17 '21 11:02 mlennert

Thinking about it, I prefer to introduce new GPJ_*() functions in lib/proj instead of adding PROJ lib as a new dependency to lib/gis. Thus all library functions that depend on PROJ lib will be (as before) in lib/proj. A new check needs to be introduced in configure because the minimum required PROJ version would now be PROJ-4.9.x.

metzm avatar Feb 20 '21 22:02 metzm