
Problem with forward geocoding results

Open wsukta opened this issue 6 years ago • 3 comments

Hello guys, I use Pelias in Docker and I have a problem with forward geocoding. When I geocode addresses from the same city on the same street, the geocoder returns incomplete information. I will illustrate the situation with two example addresses:

Słoneczna 1 Myślenice [street, number, city]

Słoneczna 4 Myślenice [street, number, city]

When I geocode the first address, the geocoder returns:

{u'geometry': {u'type': u'Point', u'coordinates': [19.947045, 49.836457]}, u'type': u'Feature', u'properties': {u'layer': u'address', u'match_type': u'exact', u'county_gid': u'whosonfirst:county:102079441', u'region_gid': u'whosonfirst:region:85687291', u'county': u'My\xc5lenicki', u'street': u'S\u0142oneczna', u'country_a': u'POL', u'label': u'1 S\u0142oneczna, My\u015blenice, Poland', u'id': u'pl/dane_ref:580c0a82-29bb-4a70-85e0-ba0f7c735e58', u'confidence': 1, u'locality': u'My\u015blenice', u'continent': u'Europe', u'source': u'openaddresses', u'gid': u'openaddresses:address:pl/dane_ref:580c0a82-29bb-4a70-85e0-ba0f7c735e58', u'housenumber': u'1', u'accuracy': u'point', u'country_gid': u'whosonfirst:country:85633723', u'source_id': u'pl/dane_ref:580c0a82-29bb-4a70-85e0-ba0f7c735e58', u'postalcode': u'32-400', u'continent_gid': u'whosonfirst:continent:102191581', u'distance': 4.415, u'name': u'1 S\u0142oneczna', u'locality_gid': u'whosonfirst:locality:101826165', u'country': u'Poland', u'region': u'Lesser Poland Voivodeship', u'region_a': u'MA'}} - This is a raw_result from geopy

1 Słoneczna, Myślenice, Poland - This is the label from raw_result

In this example everything is OK.

Second example : {u'geometry': {u'type': u'Point', u'coordinates': [19.939914, 49.833548]}, u'type': u'Feature', u'properties': {u'layer': u'locality', u'match_type': u'fallback', u'county_gid': u'whosonfirst:county:102079441', u'region_gid': u'whosonfirst:region:85687291', u'county': u'My\xc5lenicki', u'country_a': u'POL', u'label': u'My\u015blenice, Poland', u'continent': u'Europe', u'confidence': 0.6, u'locality': u'My\u015blenice', u'id': u'101826165', u'source': u'whosonfirst', u'gid': u'whosonfirst:locality:101826165', u'accuracy': u'centroid', u'country_gid': u'whosonfirst:country:85633723', u'source_id': u'101826165', u'continent_gid': u'whosonfirst:continent:102191581', u'distance': 5, u'name': u'My\u015blenice', u'locality_gid': u'whosonfirst:locality:101826165', u'country': u'Poland', u'region': u'Lesser Poland Voivodeship', u'region_a': u'MA'}, u'bbox': [19.8978956349, 49.8266998886, 19.9624726392, 49.8429191096]} - This is a raw_result from geopy

Myślenice, Poland - This is a label from raw_result

Something is wrong. Why does the geocoder return the centroid of the city in this example? I have this address in my database from OpenAddresses, so why doesn't the geocoder return the appropriate coordinates?

After that I decided to geocode this address without the city name. The geocoder returned several addresses from different cities, among them the one I was interested in (I recognized it by its postal code):

{u'geometry': {u'type': u'Point', u'coordinates': [19.949678, 49.832716]}, u'type': u'Feature', u'properties': {u'layer': u'address', u'match_type': u'exact', u'county_gid': u'whosonfirst:county:102079441', u'region_gid': u'whosonfirst:region:85687291', u'county': u'My\xc5lenicki', u'street': u'S\u0142oneczna', u'country_a': u'POL', u'label': u'4 S\u0142oneczna, Poland', u'continent': u'Europe', u'confidence': 1, u'id': u'pl/dane_ref:4d9f04f9-6eb3-4a45-867e-a955834650a8', u'source': u'openaddresses', u'gid': u'openaddresses:address:pl/dane_ref:4d9f04f9-6eb3-4a45-867e-a955834650a8', u'housenumber': u'4', u'accuracy': u'point', u'country_gid': u'whosonfirst:country:85633723', u'source_id': u'pl/dane_ref:4d9f04f9-6eb3-4a45-867e-a955834650a8', u'postalcode': u'32-400', u'continent_gid': u'whosonfirst:continent:102191581', u'distance': 4.373, u'name': u'4 S\u0142oneczna', u'country': u'Poland', u'region': u'Lesser Poland Voivodeship', u'region_a': u'MA'}}

4 Słoneczna, Poland - This is the label from raw_result

The question is: why does the label in this result lack the city name (WOF locality)? Is it a problem with the WOF data or with the database?

I am confused, because without the city name (locality) in the results it is hard to decide whether a result is correct. What should I do in this situation?

wsukta avatar Dec 24 '18 10:12 wsukta

I'm having trouble reading your examples; the GeoJSON seems to have been mangled by software.

Can you please post it again enclosed in backticks (see GitHub markdown guidelines) or via a pastebin?

missinglink avatar Jan 02 '19 20:01 missinglink

I have reformatted the comment and also put it in a gist at the following link: https://gist.github.com/w0jtis/f9e2b5d4a1e4c5d0cd5063a0be076a44

Hello guys, I use Pelias in Docker and I have a problem with forward geocoding. When I geocode addresses from the same city on the same street, the geocoder returns incomplete information. I will illustrate the situation with two example addresses:

Słoneczna 1 Myślenice [street, number, city]

Słoneczna 4 Myślenice [street, number, city]

When I geocode the first address, the geocoder returns:

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [19.947045, 49.836457]
  },
  "properties": {
    "layer": "address",
    "match_type": "exact",
    "county_gid": "whosonfirst:county:102079441",
    "region_gid": "whosonfirst:region:85687291",
    "county": "My\xc5lenicki",
    "street": "S\u0142oneczna",
    "country_a": "POL",
    "label": "1 S\u0142oneczna, My\u015blenice, Poland",
    "id": "pl/dane_ref:580c0a82-29bb-4a70-85e0-ba0f7c735e58",
    "confidence": 1,
    "locality": "My\u015blenice",
    "continent": "Europe",
    "source": "openaddresses",
    "gid": "openaddresses:address:pl/dane_ref:580c0a82-29bb-4a70-85e0-ba0f7c735e58",
    "housenumber": "1",
    "accuracy": "point",
    "country_gid": "whosonfirst:country:85633723",
    "source_id": "pl/dane_ref:580c0a82-29bb-4a70-85e0-ba0f7c735e58",
    "postalcode": "32-400",
    "continent_gid": "whosonfirst:continent:102191581",
    "distance": 4.415,
    "name": "1 S\u0142oneczna",
    "locality_gid": "whosonfirst:locality:101826165",
    "country": "Poland",
    "region": "Lesser Poland Voivodeship",
    "region_a": "MA"
  }
}

1 Słoneczna, Myślenice, Poland - This is the label from raw_result

In this example everything is OK.

Second example:

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [19.939914, 49.833548]
  },
  "properties": {
    "layer": "locality",
    "match_type": "fallback",
    "county_gid": "whosonfirst:county:102079441",
    "region_gid": "whosonfirst:region:85687291",
    "county": "My\xc5lenicki",
    "country_a": "POL",
    "label": "My\u015blenice, Poland",
    "continent": "Europe",
    "confidence": 0.6,
    "locality": "My\u015blenice",
    "id": "101826165",
    "source": "whosonfirst",
    "gid": "whosonfirst:locality:101826165",
    "accuracy": "centroid",
    "country_gid": "whosonfirst:country:85633723",
    "source_id": "101826165",
    "continent_gid": "whosonfirst:continent:102191581",
    "distance": 5,
    "name": "My\u015blenice",
    "locality_gid": "whosonfirst:locality:101826165",
    "country": "Poland",
    "region": "Lesser Poland Voivodeship",
    "region_a": "MA"
  },
  "bbox": [19.8978956349, 49.8266998886, 19.9624726392, 49.8429191096]
}

Myślenice, Poland - This is a label from raw_result

Something is wrong. Why does the geocoder return the centroid of the city in this example? I have this address in my database from OpenAddresses, so why doesn't the geocoder return the appropriate coordinates?

After that I decided to geocode this address without the city name, so I entered the following address: "Słoneczna 4". The geocoder returned several addresses from different cities, among them the one I was interested in (I recognized it by its postal code):

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [19.949678, 49.832716]
  },
  "properties": {
    "layer": "address",
    "match_type": "exact",
    "county_gid": "whosonfirst:county:102079441",
    "region_gid": "whosonfirst:region:85687291",
    "county": "My\xc5lenicki",
    "street": "S\u0142oneczna",
    "country_a": "POL",
    "label": "4 S\u0142oneczna, Poland",
    "continent": "Europe",
    "confidence": 1,
    "id": "pl/dane_ref:4d9f04f9-6eb3-4a45-867e-a955834650a8",
    "source": "openaddresses",
    "gid": "openaddresses:address:pl/dane_ref:4d9f04f9-6eb3-4a45-867e-a955834650a8",
    "housenumber": "4",
    "accuracy": "point",
    "country_gid": "whosonfirst:country:85633723",
    "source_id": "pl/dane_ref:4d9f04f9-6eb3-4a45-867e-a955834650a8",
    "postalcode": "32-400",
    "continent_gid": "whosonfirst:continent:102191581",
    "distance": 4.373,
    "name": "4 S\u0142oneczna",
    "country": "Poland",
    "region": "Lesser Poland Voivodeship",
    "region_a": "MA"
  }
}

4 Słoneczna, Poland - This is a label from raw_result

The result is the same whether I use Pelias through geopy or by building the query in a browser.

The question is: why does the label in this result lack the city name (WOF locality)? Is it a problem with the WOF data or with the database? I am confused, because without the city name (locality) in the results it is hard to decide whether a result is correct. What should I do in this situation?

wsukta avatar Jan 03 '19 20:01 wsukta

Hi @w0jtis, thanks for pasting the results again. Python seems to make geojson very ugly and hard to read :(
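For anyone hitting the same readability problem: the `u'...'` noise comes from printing the Python dict directly. A minimal sketch of one way around it, assuming the result is available as a plain dict (geopy exposes it as `location.raw`); the `raw_result` below is a trimmed stand-in built from the first example, not a full response:

```python
import json

# Trimmed stand-in for the dict that geopy exposes as `location.raw`.
raw_result = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [19.947045, 49.836457]},
    "properties": {
        "layer": "address",
        "label": "1 S\u0142oneczna, My\u015blenice, Poland",
    },
}

# json.dumps emits real JSON (double quotes, no u'' prefixes).
# ensure_ascii=False keeps characters such as "ł" and "ś" readable
# instead of escaping them as \uXXXX sequences.
pretty = json.dumps(raw_result, indent=2, ensure_ascii=False, sort_keys=True)
print(pretty)
```

The output can be pasted straight into a fenced block in a GitHub comment.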

This is happening because of the boundary shape we have for Myślenice. The data was originally sourced from Quattroshapes, which is known to have issues in some areas.

You'll find that the coordinates for 1 Słoneczna lie within the boundary we have for Myślenice, however the coordinates for 4 Słoneczna lie outside of the boundary shape.

For this reason, the 4 Słoneczna address is not being associated with the locality of Myślenice.
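The inside/outside distinction can be illustrated with a plain ray-casting point-in-polygon test. This is only a sketch: `toy_boundary` is a made-up rectangle chosen so that the two points behave as described above. It is not the real Quattroshapes or Who's On First geometry, and it is not how Pelias itself performs the lookup.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) falls inside the ring.

    `ring` is a list of (lon, lat) vertices; the last vertex is joined
    back to the first automatically.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            # Longitude at which that edge crosses the line.
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


# HYPOTHETICAL boundary, invented for illustration only.
toy_boundary = [
    (19.930, 49.834),
    (19.960, 49.834),
    (19.960, 49.842),
    (19.930, 49.842),
]

address_1 = (19.947045, 49.836457)  # 1 Słoneczna, coordinates from this thread
address_4 = (19.949678, 49.832716)  # 4 Słoneczna, coordinates from this thread

print(point_in_polygon(*address_1, toy_boundary))  # inside the toy shape
print(point_in_polygon(*address_4, toy_boundary))  # outside the toy shape
```

Note that both addresses fall inside the `bbox` returned for the locality, so a bounding-box check alone would not reveal the problem; it is the polygon itself that matters.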

When a matching address cannot be found, the system will try to 'fall-back' to a street, and if it can't find that, then it will 'fall-back' again to showing the locality you asked for.
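On the client side, a fallback can be spotted from fields that already appear in the responses in this thread (`match_type` and `layer`). A minimal sketch, assuming you asked for an address; the helper name `is_fallback` and the trimmed features are illustrative and not part of Pelias or geopy:

```python
def is_fallback(feature, expected_layer="address"):
    """Return True if the result looks like a fallback rather than a direct hit.

    A result is treated as a fallback when Pelias labels it as one
    (match_type == "fallback") or when it comes from a different layer
    than the one that was expected.
    """
    props = feature.get("properties", {})
    if props.get("match_type") == "fallback":
        return True
    return props.get("layer") != expected_layer


# Trimmed versions of the two results posted in this thread.
first_result = {"properties": {"layer": "address", "match_type": "exact"}}
second_result = {"properties": {"layer": "locality", "match_type": "fallback"}}

for feature in (first_result, second_result):
    print(feature["properties"]["layer"], is_fallback(feature))
```

Results flagged this way carry the locality centroid rather than the address coordinates, so they should probably be reviewed or discarded instead of being stored as address matches.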

There are two solutions to this problem:

  1. Fix the data. Could you please suggest a source of open data for Poland that we could use instead? See https://www.whosonfirst.org/docs/licenses/ for more info on which licences are accepted.

  2. Modify the code.

    1. We are looking at a few options at import-time to handle situations like this where the data is not very good. It would be possible to have some code that detects 'nearby localities' and uses those. Care needs to be taken when writing this code so that we don't associate the wrong locality; we feel that it's better to have no locality than the wrong one.

    2. We are also looking at ways to make the matching more lenient at search-time. In a case like this, where we do not find the address within the locality you specified, we would then search nearby places to see whether something there matches.
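To make the 'nearby localities' idea concrete, here is a rough sketch of one possible heuristic: pick the closest locality centroid, but only if it lies within a distance cutoff, and otherwise assign nothing. This is an assumption about how such code could look, not the approach Pelias takes. The function names, the cutoff values, and the approximate Kraków centroid are all illustrative; only the Myślenice centroid and the address coordinates come from this thread.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_locality(lon, lat, localities, max_km):
    """Name of the closest locality centroid within max_km, else None.

    Returning None when nothing is close enough reflects the preference
    for no locality over a wrong one.
    """
    best_name, best_km = None, max_km
    for name, (loc_lon, loc_lat) in localities.items():
        distance = haversine_km(lon, lat, loc_lon, loc_lat)
        if distance <= best_km:
            best_name, best_km = name, distance
    return best_name


localities = {
    "Myślenice": (19.939914, 49.833548),  # centroid from the fallback result
    "Kraków": (19.945, 50.065),           # approximate, for illustration only
}

# 4 Słoneczna, the address that lost its locality.
print(nearest_locality(19.949678, 49.832716, localities, max_km=2.0))
```

Distance to a centroid is a crude signal: near a shared border the nearest centroid can belong to the neighbouring town, which is exactly the wrong-locality risk mentioned above, so the cutoff would need to be conservative.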

I hope that helps to explain what's going on. Please let me know if you can find any better data for Poland!

missinglink avatar Jan 04 '19 07:01 missinglink