Duplicated results - WOF - DiffPlace.js
Some searches in France (such as "Lognes", "Sucy en Brie" or "Boissy St Léger") seem to lead to duplicated results.
These queries return, among others, respectively:
- /v1/search?text=Lognes --> "whosonfirst:locality:101749917", "whosonfirst:localadmin:404408589"
- /v1/search?text=Sucy en Brie --> "whosonfirst:locality:101751083", "whosonfirst:localadmin:404395717"
- /v1/search?text=Boissy St Léger --> "whosonfirst:locality:101749051", "whosonfirst:localadmin:404352493"
So there appear to be two problems: the WOF data contains duplicate "locality" and "localadmin" records, and the duplicate check (in "middleware/dedup.js") does not catch them.
To fix this, we could either change the WOF import to prevent "locality" and "localadmin" duplicates, or add another test to the "isDifferent()" function in "helper/diffPlaces.js".
What do you think?
Hi @jbgriesner, Thanks for providing some very nice test cases. I believe we should solve this in the API deduplication middleware.
If I had to design it right now, I would have it look at multiple WOF records: if one is a locality, the other is a localadmin, their names are the same, and the localadmin is the parent of the locality, we should consider them duplicates.
Which one to prefer is an interesting question. My intuition is that it should default to the locality. If needed, we could come up with something more complex.
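To make the proposed rule concrete, here is a minimal sketch of how it could look. It is not the actual Pelias middleware: the record shape (`source`, `layer`, `id`, `name`, `parent.localadmin_id`) and the function names `isLocalityLocaladminPair` and `dedupe` are assumptions made for illustration only.

```javascript
// Hypothetical record shape (an assumption, not the real Pelias document):
// { source, layer, id, name, parent: { localadmin_id: [ ... ] } }

// True when one record is a WOF locality, the other is a WOF localadmin,
// both have the same name, and the localadmin is a parent of the locality.
function isLocalityLocaladminPair(a, b) {
  const locality = [a, b].find((r) => r.layer === 'locality');
  const localadmin = [a, b].find((r) => r.layer === 'localadmin');
  if (!locality || !localadmin) return false;
  if (locality.source !== 'whosonfirst' || localadmin.source !== 'whosonfirst') return false;
  if (locality.name !== localadmin.name) return false;

  const parentIds = (locality.parent && locality.parent.localadmin_id) || [];
  return parentIds.map(String).includes(String(localadmin.id));
}

// Keeps the first occurrence of each place, but when a locality/localadmin
// pair is detected, the locality wins regardless of result order.
function dedupe(results) {
  const kept = [];
  for (const candidate of results) {
    const index = kept.findIndex((r) => isLocalityLocaladminPair(r, candidate));
    if (index === -1) {
      kept.push(candidate);
    } else if (candidate.layer === 'locality') {
      kept[index] = candidate; // replace the localadmin with the locality
    }
    // otherwise the candidate is the localadmin and is dropped
  }
  return kept;
}

// Usage: ids taken from the "Lognes" query above; the parent link is assumed.
const lognes = [
  { source: 'whosonfirst', layer: 'localadmin', id: '404408589', name: 'Lognes', parent: {} },
  { source: 'whosonfirst', layer: 'locality', id: '101749917', name: 'Lognes',
    parent: { localadmin_id: ['404408589'] } },
];
const deduped = dedupe(lognes);
```

Requiring the parent relationship, rather than matching on name alone, avoids merging two unrelated places that happen to share a name.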
I'm currently in the process of refactoring the dedupe middleware in https://github.com/pelias/api/pull/1222
However, I suspect this issue will be improved by the work the WOF team is currently doing in https://github.com/whosonfirst-data/whosonfirst-data/pull/1343
Deduplicating between the localadmin and locality layers is a UX question. In a lot of cases, these two concepts are different from a legal and administrative point of view but synonymous from a casual user's perspective.
We would need to choose if we want to be technically correct or user-friendly :)
Here's another example of administrative area duplication:
/v1/autocomplete?boundary.country=aus&text=gungahlin
Basically, we get a WOF neighbourhood, locality, and localadmin with the same name, plus a Geonames record of the same name. The Geonames record shows up as a venue, but it is probably an admin area that is incorrectly classified by our importer.
All of these examples have now been fixed by https://github.com/pelias/api/pull/1230, except for http://pelias.github.io/compare/#/v1/search%3Ftext=Boissy%20St%20L%C3%A9ger, which appears to be failing because of differing diacritics. We can probably both fix that in the WOF data and add code to ignore diacritics when deduping.
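As a rough sketch of what ignoring diacritics during name comparison could look like: the helper names `foldDiacritics` and `sameNameIgnoringDiacritics` are hypothetical and are not part of the existing dedupe code. The approach relies on the standard `String.prototype.normalize` method to decompose accented characters before stripping the combining marks.

```javascript
// Decompose accented characters (NFD), then remove the combining marks
// in the U+0300..U+036F block, so that "é" becomes "e".
function foldDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Compare two place names, ignoring diacritics and letter case.
function sameNameIgnoringDiacritics(a, b) {
  return foldDiacritics(a).toLowerCase() === foldDiacritics(b).toLowerCase();
}

// Usage: illustrative spellings, not values copied from the WOF records.
const accentMatch = sameNameIgnoringDiacritics('Boissy-Saint-Léger', 'Boissy-Saint-Leger');
const differentPlace = sameNameIgnoringDiacritics('Lognes', 'Sucy-en-Brie');
```

This only handles characters that decompose into a base letter plus combining marks; ligatures such as "œ" are left unchanged and would need separate handling.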
Brussels also has several duplicates: https://pelias.github.io/compare/#/v1/autocomplete?layers=locality&text=Bruss&debug=0
Some are in fact part of other localadmins, such as Dilbeek.
It seems like a WOF issue though?