ngsi-timeseries-api icon indicating copy to clipboard operation
ngsi-timeseries-api copied to clipboard

Broken OSM city shape

Open c0c0n3 opened this issue 4 years ago • 11 comments

Describe the bug

Geocoding of city locations is currently broken.

To Reproduce

Run test_entity_add_city_shape in the geocoding.tests.test_geocoding module. If you debug it, you'll see that the call to OSM

info = geocoder.osm(key, maxRows=10)

returns an info object with a current_result.osm_type == 'node' whereas we expect current_result.osm_type == 'relation'.

Expected behavior

QuantumLeap should extract the city polygon from OSM's result, create a location attribute for it, and add the location to the entity being processed in the notification.

Environment

  • Nominatim server: nominatim.openstreetmap.org
  • Server version: N/A, test was working before 10 Sep 2020 but broke on the 10th and 11th, perhaps they changed the API?
  • QuantumLeap query: https://nominatim.openstreetmap.org/search?q=Ciudad+de+M%C3%A9xico+%2C+MX&format=jsonv2&addressdetails=1&limit=10

Additional context

PR #357 disabled test_entity_add_city_shape to get a clean build. We'll keep it disabled until we fix this issue, then re-enable it after the fix.

c0c0n3 avatar Sep 11 '20 08:09 c0c0n3

this rather a change in the database of osm, if you try for example Milan: https://nominatim.openstreetmap.org/search?q=Milano&format=jsonv2&addressdetails=1&limit=10

chicco785 avatar Sep 11 '20 08:09 chicco785

either way, it's broken :-) perhaps we shouldn't consider the osm_type since it isn't part of the API. I'm not even sure what their API actually is, so I'd have to dig deeper before fixing this. But to me, operation inputs and outputs are an integral part of an API. So we should see if API outputs are clearly documented somewhere, then stick to that...

c0c0n3 avatar Sep 11 '20 09:09 c0c0n3

Stale issue message

github-actions[bot] avatar Mar 10 '21 02:03 github-actions[bot]

Hi @chicco785 @c0c0n3 we are reproducing this issue in our environment,on debugging we found that for the first iteration in https://github.com/orchestracities/ngsi-timeseries-api/blob/66c57d33a550b431432adfb40f20e8a4ec0d7730/src/geocoding/geocoding.py#L153 osm_type = 'node' but in the second iteration we got osm_type = 'relation'. We are assuming that this issue is for the first iteration only.

Please confirm our understanding.

Ravisaketi avatar May 06 '22 07:05 Ravisaketi

Hi @Necravisaketi!

Thanks for helping with this :-)

Not quite sure what you mean by "first" and "second" iteration. Two runs of the same test case? If so which one? Or do you mean something else?

On another note, I must admit I'm just as clueless as you are. I don't actually know the details of the OSM API and if the meaning of 'osm_type' has changed over time, it's related to data changes as @chicco785 pointed out, or something else entirely. The first step here would be to get familiar with the API, dig into the docs and possibly ask the question on an OSM forum?

c0c0n3 avatar May 06 '22 07:05 c0c0n3

Hi @c0c0n3 on debugging we found that for the first iteration in https://github.com/orchestracities/ngsi-timeseries-api/blob/66c57d33a550b431432adfb40f20e8a4ec0d7730/src/geocoding/geocoding.py#L152 https://github.com/orchestracities/ngsi-timeseries-api/blob/66c57d33a550b431432adfb40f20e8a4ec0d7730/src/geocoding/geocoding.py#L153 the output for the first iteration of i.raw['osm_type'] is 'node' if i.raw['osm_type'] == 'relation': (Pdb) pp i.raw['osm_type'] 'node'

the output for the second iteration of i.raw['osm_type'] is 'relation' if i.raw['osm_type'] == 'relation': (Pdb) pp i.raw['osm_type'] 'relation'

Ravisaketi avatar May 06 '22 14:05 Ravisaketi

oh right, thanks so much for clarifying, I missed that. But like I said we should probably get hold of the API spec to see what the returned data structure is supposed to look like. Once we know that we can plan a fix. It's entirely possible we'll have to rewrite part of that logic. Also, can you copy-paste in here an example of the info dictionary you get?

c0c0n3 avatar May 06 '22 15:05 c0c0n3

Hi @c0c0n3 i got this list of locations in the info

[Location(Ciudad de México, 06060, México, (19.4326296, -99.1331785, 0.0)), Location(Ciudad de México, México, (19.3207722, -99.15151506199419, 0.0)), Location(Ciudad de México, Maradunas, Lomas de Barrillas, Coatzacoalcos, Veracruz, 96535, México, (18.1510321, -94.5181526, 0.0)), Location(Pozoleria "La Troje", Iztacalco, Calle Plan de Ayala, Colonia Nueva Santa Anita, Iztacalco, Ciudad de México, 08210, México, (19.3997047, -99.1248161, 0.0)), Location(CNDH Héctor Fix Zamudio, Ciudad de México, 1922, Boulevard Adolfo López Mateos, Colonia Los Alpes, Álvaro Obregón, Ciudad de México, 01049, México, (19.3575358, -99.1951691, 0.0))]

Ravisaketi avatar May 09 '22 12:05 Ravisaketi

Hi @Necravisaketi, thanks for this but I'm struggling to see how this list of locations relates to the info map? e.g. where's the osm_type key-value pair?

c0c0n3 avatar May 13 '22 12:05 c0c0n3

Hi @c0c0n3 we got this in the first iteration i.raw where i is in info

{'boundingbox': ['19.2726296', '19.5926296', '-99.2931785', '-98.9731785'],
 'class': 'place',
 'display_name': 'Ciudad de México, 06060, México',
 'geojson': {'coordinates': [-99.1331785, 19.4326296], 'type': 'Point'},
 'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_place_city.p.20.png',
 'importance': 1.149923932441765,
 'lat': '19.4326296',
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. '
            'https://osm.org/copyright',
 'lon': '-99.1331785',
 'osm_id': 62270270,
 'osm_type': 'node',
 'place_id': 286135844,
 'type': 'city'}

in the second iteration i.raw is

{'boundingbox': ['19.0487187', '19.5927572', '-99.3649242', '-98.9403028'],
 'class': 'boundary',
 'display_name': 'Ciudad de México, México',
 'geojson': {'coordinates': [[[-99.3649242, 19.2777693],
                              [-99.3619219, 19.2757419],
                              [-99.3561284, 19.2749894],
                              [-99.3482137, 19.2739609],
                              .
                              . 
                              .
                              .
                              .
                              etc
'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_boundary_administrative.p.20.png',
 'importance': 1.0538840457137142,
 'lat': '19.3207722',
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. '
            'https://osm.org/copyright',
 'lon': '-99.15151506199419',
 'osm_id': 1376330,
 'osm_type': 'relation',
 'place_id': 285199565,
 'type': 'administrative'}

Ravisaketi avatar May 16 '22 07:05 Ravisaketi

Hi @Necravisaketi thank you so much for helping out with this, much appreciated!

To sum up, there's at least two JSON objects in the info array returned by OSM:

info = [
    { 'osm_id': 62270270, 'osm_type': 'node', ...},
    { 'osm_id': 1376330, 'osm_type': 'relation', ...}
]

Thanks again for that info, but I reckon to get to the bottom of this, we've got to dig a bit more. Here's some pseudo-code to capture the logic of the flow we're interested in.

add_locations: [e1, e2, ...]
    return [add_location(e1), add_location(e2), ...]

add_location(e)
    key, osm_type = get_address_key_and_type(e)
    info = asyncio.run(geocode(key))

    if osm_type == TYPE_POINT:
        loc = _extract_point(info)
    elif osm_type == TYPE_WAY:
        for i in info:  # street
            if i.raw['osm_type'] == 'way':
                loc = i.raw['geojson']
                break
    elif osm_type == TYPE_RELATION:
        for i in info:  # state or country
            if i.raw['osm_type'] == 'relation':
                loc = i.raw['geojson']
                break

    _do_add_location(e, loc)

To understand what's happening, we need to know what are the input entities, i.e. the content of the array passed to add_locations and for each entity in there what's the corresponding key and osm_type the get_address_key_and_type call returns.

c0c0n3 avatar May 20 '22 08:05 c0c0n3