ngsi-timeseries-api
ngsi-timeseries-api copied to clipboard
Broken OSM city shape
Describe the bug
Geocoding of city locations is currently broken.
To Reproduce
Run test_entity_add_city_shape
in the geocoding.tests.test_geocoding
module. If you debug it, you'll see that the call to OSM
info = geocoder.osm(key, maxRows=10)
returns an info
object with a current_result.osm_type == 'node'
whereas we expect current_result.osm_type == 'relation'
.
Expected behavior
QuantumLeap should extract the city polygon from OSM's result, create a location
attribute for it, and add the location
to the entity being processed in the notification.
Environment
- Nominatim server:
nominatim.openstreetmap.org
- Server version: N/A, test was working before 10 Sep 2020 but broke on the 10th and 11th, perhaps they changed the API?
- QuantumLeap query: https://nominatim.openstreetmap.org/search?q=Ciudad+de+M%C3%A9xico+%2C+MX&format=jsonv2&addressdetails=1&limit=10
Additional context
PR #357 disabled test_entity_add_city_shape
to get a clean build. We'll keep it disabled until we fix this issue, then re-enable it after the fix.
this rather a change in the database of osm, if you try for example Milan: https://nominatim.openstreetmap.org/search?q=Milano&format=jsonv2&addressdetails=1&limit=10
either way, it's broken :-)
perhaps we shouldn't consider the osm_type
since it isn't part of the API. I'm not even sure what their API actually is, so I'd have to dig deeper before fixing this. But to me, operation inputs and outputs are an integral part of an API. So we should see if API outputs are clearly documented somewhere, then stick to that...
Stale issue message
Hi @chicco785 @c0c0n3 we are reproducing this issue in our environment,on debugging we found that for the first iteration in https://github.com/orchestracities/ngsi-timeseries-api/blob/66c57d33a550b431432adfb40f20e8a4ec0d7730/src/geocoding/geocoding.py#L153 osm_type = 'node' but in the second iteration we got osm_type = 'relation'. We are assuming that this issue is for the first iteration only.
Please confirm our understanding.
Hi @Necravisaketi!
Thanks for helping with this :-)
Not quite sure what you mean by "first" and "second" iteration. Two runs of the same test case? If so which one? Or do you mean something else?
On another note, I must admit I'm just as clueless as you are. I don't actually know the details of the OSM API and if the meaning of 'osm_type' has changed over time, it's related to data changes as @chicco785 pointed out, or something else entirely. The first step here would be to get familiar with the API, dig into the docs and possibly ask the question on an OSM forum?
Hi @c0c0n3 on debugging we found that for the first iteration in https://github.com/orchestracities/ngsi-timeseries-api/blob/66c57d33a550b431432adfb40f20e8a4ec0d7730/src/geocoding/geocoding.py#L152 https://github.com/orchestracities/ngsi-timeseries-api/blob/66c57d33a550b431432adfb40f20e8a4ec0d7730/src/geocoding/geocoding.py#L153 the output for the first iteration of i.raw['osm_type'] is 'node' if i.raw['osm_type'] == 'relation': (Pdb) pp i.raw['osm_type'] 'node'
the output for the second iteration of i.raw['osm_type'] is 'relation' if i.raw['osm_type'] == 'relation': (Pdb) pp i.raw['osm_type'] 'relation'
oh right, thanks so much for clarifying, I missed that. But like I said we should probably get hold of the API spec to see what the returned data structure is supposed to look like. Once we know that we can plan a fix. It's entirely possible we'll have to rewrite part of that logic. Also, can you copy-paste in here an example of the info
dictionary you get?
Hi @c0c0n3 i got this list of locations in the info
[Location(Ciudad de México, 06060, México, (19.4326296, -99.1331785, 0.0)), Location(Ciudad de México, México, (19.3207722, -99.15151506199419, 0.0)), Location(Ciudad de México, Maradunas, Lomas de Barrillas, Coatzacoalcos, Veracruz, 96535, México, (18.1510321, -94.5181526, 0.0)), Location(Pozoleria "La Troje", Iztacalco, Calle Plan de Ayala, Colonia Nueva Santa Anita, Iztacalco, Ciudad de México, 08210, México, (19.3997047, -99.1248161, 0.0)), Location(CNDH Héctor Fix Zamudio, Ciudad de México, 1922, Boulevard Adolfo López Mateos, Colonia Los Alpes, Álvaro Obregón, Ciudad de México, 01049, México, (19.3575358, -99.1951691, 0.0))]
Hi @Necravisaketi, thanks for this but I'm struggling to see how this list of locations relates to the info
map? e.g. where's the osm_type
key-value pair?
Hi @c0c0n3
we got this in the first iteration i.raw
where i
is in info
{'boundingbox': ['19.2726296', '19.5926296', '-99.2931785', '-98.9731785'],
'class': 'place',
'display_name': 'Ciudad de México, 06060, México',
'geojson': {'coordinates': [-99.1331785, 19.4326296], 'type': 'Point'},
'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_place_city.p.20.png',
'importance': 1.149923932441765,
'lat': '19.4326296',
'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. '
'https://osm.org/copyright',
'lon': '-99.1331785',
'osm_id': 62270270,
'osm_type': 'node',
'place_id': 286135844,
'type': 'city'}
in the second iteration i.raw
is
{'boundingbox': ['19.0487187', '19.5927572', '-99.3649242', '-98.9403028'],
'class': 'boundary',
'display_name': 'Ciudad de México, México',
'geojson': {'coordinates': [[[-99.3649242, 19.2777693],
[-99.3619219, 19.2757419],
[-99.3561284, 19.2749894],
[-99.3482137, 19.2739609],
.
.
.
.
.
etc
'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_boundary_administrative.p.20.png',
'importance': 1.0538840457137142,
'lat': '19.3207722',
'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. '
'https://osm.org/copyright',
'lon': '-99.15151506199419',
'osm_id': 1376330,
'osm_type': 'relation',
'place_id': 285199565,
'type': 'administrative'}
Hi @Necravisaketi thank you so much for helping out with this, much appreciated!
To sum up, there's at least two JSON objects in the info
array returned by OSM:
info = [
{ 'osm_id': 62270270, 'osm_type': 'node', ...},
{ 'osm_id': 1376330, 'osm_type': 'relation', ...}
]
Thanks again for that info, but I reckon to get to the bottom of this, we've got to dig a bit more. Here's some pseudo-code to capture the logic of the flow we're interested in.
add_locations: [e1, e2, ...]
return [add_location(e1), add_location(e2), ...]
add_location(e)
key, osm_type = get_address_key_and_type(e)
info = asyncio.run(geocode(key))
if osm_type == TYPE_POINT:
loc = _extract_point(info)
elif osm_type == TYPE_WAY:
for i in info: # street
if i.raw['osm_type'] == 'way':
loc = i.raw['geojson']
break
elif osm_type == TYPE_RELATION:
for i in info: # state or country
if i.raw['osm_type'] == 'relation':
loc = i.raw['geojson']
break
_do_add_location(e, loc)
To understand what's happening, we need to know what are the input entities, i.e. the content of the array passed to add_locations
and for each entity in there what's the corresponding key
and osm_type
the get_address_key_and_type
call returns.