extraction-framework

Wrong lat/long for many entries in geo_coordinates*_en.ttl

Open mazieres opened this issue 8 years ago • 4 comments

I've found many errors in the lat/long reported in both geo_coordinates_en.ttl and geo_coordinates_mappingbased_en.ttl (2016-04).

For instance:

  • For Western_Australia, the Wikipedia page reports 26°S 121°E, while the DBpedia resource points at 26.0 121.0 (somewhere near Taiwan...) instead of -26.0 121.0.

  • For Morocco, the Wikipedia page reports 33°32′N 7°35′W for the largest city (Casablanca), while the DBpedia resource points at 33.53333333333333 7.583333333333333 (somewhere in Tunisia...) instead of 33.53333333333333 -7.583333333333333.

It seems that sometimes the conversion from compass direction format to signed degrees format fails.
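For reference, the conversion in question can be sketched as follows. This is only an illustration of the expected arithmetic, not the extraction framework's code; the function name `to_signed_degrees` and its parameters are made up for this example.

```python
def to_signed_degrees(degrees, minutes=0.0, seconds=0.0, direction="N"):
    """Convert a degrees/minutes/seconds value plus a compass direction
    (N, S, E, W) into signed decimal degrees.

    Hypothetical helper for illustration only.
    """
    direction = direction.strip().upper()
    if direction not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown compass direction: {direction!r}")

    # The magnitude is always non-negative; the sign comes from the direction.
    magnitude = abs(degrees) + minutes / 60.0 + seconds / 3600.0

    # South latitudes and west longitudes are negative in signed notation.
    sign = -1.0 if direction in ("S", "W") else 1.0
    return sign * magnitude


# Western Australia: 26°S 121°E
print(to_signed_degrees(26, direction="S"))   # -26.0
print(to_signed_degrees(121, direction="E"))  # 121.0

# Casablanca: 33°32′N 7°35′W
print(to_signed_degrees(33, 32, direction="N"))
print(to_signed_degrees(7, 35, direction="W"))
```

If the direction is dropped or ignored anywhere along the way, the result keeps the right magnitude but loses its sign, which matches the values shown in the dumps.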

$ grep "<http://dbpedia.org/resource/Western_Australia>" geo_coordinates_mappingbased_en.ttl
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "26.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "121.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Western_Australia> <http://www.georss.org/georss/point> "26.0 121.0".
$ grep "<http://dbpedia.org/resource/Morocco>" geo_coordinates_en.ttl
<http://dbpedia.org/resource/Morocco> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .
<http://dbpedia.org/resource/Morocco> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "33.53333333333333"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Morocco> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "7.583333333333333"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Morocco> <http://www.georss.org/georss/point> "33.53333333333333 7.583333333333333".

I can't measure it precisely, but my guess is that a few thousand records are corrupted this way.

mazieres avatar Dec 12 '16 13:12 mazieres

Thanks for the report, @mazieres. This is a duplicate of #106. We are currently working on replacing the DBpedia mapping language with RML, and such configurations should be enabled then.

jimkont avatar Dec 13 '16 16:12 jimkont

@mazieres: #106 explains that the mapping Infobox_Australian_place needs to set default (constant) latDir "S" since the default in the code is "N". Can you check what infobox is used by Morocco, and whether the problem is the same?

@jimkont There is a PR at #106. Wouldn't it be better to merge this PR so we can fix these mappings, rather than wait for a new technology to be adopted?
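To make the default-direction problem concrete, here is a rough sketch of the fallback logic as described in this thread. It is an assumption about the behaviour, not the framework's actual implementation; `resolve_lat_dir`, `signed_latitude`, and `CODE_DEFAULT_LAT_DIR` are hypothetical names.

```python
# Per the discussion above, the hard-coded default hemisphere is "N".
CODE_DEFAULT_LAT_DIR = "N"


def resolve_lat_dir(infobox_value=None, mapping_default=None):
    """Pick the latitude direction: the infobox value first, then a
    per-mapping constant, then the code-level default.

    Hypothetical helper for illustration only.
    """
    if infobox_value:
        return infobox_value.strip().upper()
    if mapping_default:
        return mapping_default
    return CODE_DEFAULT_LAT_DIR


def signed_latitude(degrees, infobox_dir=None, mapping_default=None):
    direction = resolve_lat_dir(infobox_dir, mapping_default)
    return -abs(degrees) if direction == "S" else abs(degrees)


# An infobox that carries no direction parameter, with no mapping default:
print(signed_latitude(26.0))                       # 26.0 (wrong hemisphere)

# The same infobox once the mapping sets a constant latDir "S":
print(signed_latitude(26.0, mapping_default="S"))  # -26.0
```

Under these assumptions, any template that implies a hemisphere without stating it needs its own constant in the mapping; otherwise the code-level "N" wins.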

VladimirAlexiev avatar Jul 14 '17 08:07 VladimirAlexiev

@VladimirAlexiev @mazieres The problem seems to be fixed in the latest DBpedia releases. Here is what we get in the 2020.04.01 release for https://databus.dbpedia.org/dbpedia/generic/geo-coordinates/

<http://dbpedia.org/resource/Western_Australia> <http://www.georss.org/georss/point> "-26.0 121.0" .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "-26.0"^^<http://www.w3.org/2001/XMLSchema#float> .
<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "121.0"^^<http://www.w3.org/2001/XMLSchema#float> .

The values seem to be correct, i.e. -26.0 and 121.0.

Can we close the issue?

Anyways, before closing the issue we need to write a test for this.
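A minimal sketch of what such a check could look like, written in Python for brevity rather than as a test inside the framework itself. The regex only handles simple literal triples like the ones quoted above, and `extract_coordinates` is a hypothetical helper; the sample lines are copied from the 2020.04.01 output.

```python
import re

WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Matches "<subject> <predicate> "literal"..." and ignores triples whose
# object is an IRI (such as the rdf:type line).
TRIPLE_RE = re.compile(r'^<(?P<s>[^>]+)>\s+<(?P<p>[^>]+)>\s+"(?P<o>[^"]*)"')


def extract_coordinates(lines, resource):
    """Return {'lat': ..., 'long': ...} for one resource from N-Triples lines."""
    coords = {}
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if not match or match.group("s") != resource:
            continue
        if match.group("p") == WGS84 + "lat":
            coords["lat"] = float(match.group("o"))
        elif match.group("p") == WGS84 + "long":
            coords["long"] = float(match.group("o"))
    return coords


SAMPLE = [
    '<http://dbpedia.org/resource/Western_Australia> <http://www.georss.org/georss/point> "-26.0 121.0" .',
    '<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .',
    '<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "-26.0"^^<http://www.w3.org/2001/XMLSchema#float> .',
    '<http://dbpedia.org/resource/Western_Australia> <http://www.w3.org/2003/01/geo/wgs84_pos#long> "121.0"^^<http://www.w3.org/2001/XMLSchema#float> .',
]


def test_western_australia_is_in_southern_hemisphere():
    coords = extract_coordinates(
        SAMPLE, "http://dbpedia.org/resource/Western_Australia"
    )
    assert coords == {"lat": -26.0, "long": 121.0}


test_western_australia_is_in_southern_hemisphere()
```

The same pattern could be pointed at a full dump file and a short list of known southern- and western-hemisphere resources, with expected values taken from their Wikipedia pages.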

m1ci avatar May 15 '20 13:05 m1ci

@m1ci have you checked Casablanca?

VladimirAlexiev avatar Sep 06 '20 09:09 VladimirAlexiev