Duplicated results - WOF - DiffPlace.js

Open jbgriesner opened this issue 6 years ago • 5 comments

Some searches in France (such as "Lognes", "Sucy en Brie" or "Boissy St Léger") seem to lead to duplicated results.

These queries return, among others, respectively:

So there appears to be a problem with WOF data duplicated between the "locality" and "localadmin" layers, and also with the duplicate check (in "middleware/dedup.js").

To fix this, it seems we could either change the WOF import to prevent "locality" and "localadmin" duplicates, or add another test to the "isDifferent()" function in "helper/diffPlaces.js".

What do you think?

jbgriesner avatar Dec 06 '17 10:12 jbgriesner

Hi @jbgriesner, Thanks for providing some very nice test cases. I believe we should solve this in the API deduplication middleware.

If I had to design it right now, I would say it should operate by looking at multiple WOF records: if one is a locality, the other is a localadmin, their names are the same, and the localadmin is the parent of the locality, we should consider them duplicates.

Which one to prefer is an interesting question. My intuition is it should default to the locality. If needed we could come up with something more complex.
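For illustration, the rule described above could be sketched roughly as follows. This is only a sketch under stated assumptions: the record shape (`id`, `layer`, `name`, `parentLocaladminId`) and both function names are hypothetical and are not taken from the Pelias code or from "helper/diffPlaces.js".

```javascript
// Hypothetical record shape: { id, layer, name, parentLocaladminId }.
// These field names are assumptions for this sketch, not Pelias fields.

// Returns true when one record is a locality, the other is a localadmin,
// their names match, and the localadmin is the parent of the locality.
function isLocalityLocaladminDuplicate(a, b) {
  const locality = [a, b].find((r) => r.layer === 'locality');
  const localadmin = [a, b].find((r) => r.layer === 'localadmin');
  if (!locality || !localadmin) {
    return false;
  }
  return (
    locality.name === localadmin.name &&
    locality.parentLocaladminId === localadmin.id
  );
}

// Keeps the locality and drops the localadmin whenever the pair
// is considered a duplicate by the rule above.
function preferLocality(results) {
  return results.filter(
    (candidate) =>
      !(
        candidate.layer === 'localadmin' &&
        results.some((other) => isLocalityLocaladminDuplicate(candidate, other))
      )
  );
}
```

With a "Lognes" localadmin and a "Lognes" locality whose parent is that localadmin, `preferLocality` would keep only the locality. A localadmin with no same-named child locality is left untouched.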

orangejulius avatar Jan 14 '18 04:01 orangejulius

I'm currently in the process of refactoring the dedupe middleware in https://github.com/pelias/api/pull/1222

However, I suspect this issue will be improved by the work the WOF team is currently doing in https://github.com/whosonfirst-data/whosonfirst-data/pull/1343

Deduplicating between the localadmin and locality layers is a UX question. In a lot of cases, these two concepts are different from a legal and administrative point of view but synonymous from a casual user's perspective.

We would need to choose if we want to be technically correct or user-friendly :)

missinglink avatar Oct 30 '18 16:10 missinglink

Here's another example of administrative area duplication:

/v1/autocomplete?boundary.country=aus&text=gungahlin

Basically we get a WOF neighbourhood, locality, and localadmin with the same name, plus a Geonames record of the same name. The Geonames record shows as a venue, but is probably an admin area that's incorrectly classified by our importer.

orangejulius avatar Oct 30 '18 16:10 orangejulius

All of these examples have now been fixed after https://github.com/pelias/api/pull/1230, except for http://pelias.github.io/compare/#/v1/search%3Ftext=Boissy%20St%20L%C3%A9ger which appears to be failing because of differing diacriticals. We can probably both fix that in WOF data and add code to ignore diacriticals when deduping.
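As a rough idea of what ignoring diacriticals when comparing names could look like, here is a minimal sketch using Unicode NFD normalization. The function names are hypothetical, and this is not the code from the linked pull request or from the dedupe middleware.

```javascript
// Decompose accented characters (NFD), then remove the combining marks
// in the U+0300..U+036F range, so "é" becomes "e".
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Hypothetical helper: compares two place names after stripping diacritics.
function sameNameIgnoringDiacritics(a, b) {
  return stripDiacritics(a) === stripDiacritics(b);
}
```

With this comparison, "Boissy St Léger" and "Boissy St Leger" would be treated as the same name. Characters that are not built from a base letter plus a combining mark (for example "ø") are not affected by this approach.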

orangejulius avatar Feb 12 '19 04:02 orangejulius

Brussels also has several duplicates: https://pelias.github.io/compare/#/v1/autocomplete?layers=locality&text=Bruss&debug=0

Some are in fact part of other localadmins, like Dilbeek.

It seems like a WOF issue though?

bboure avatar Jun 22 '20 17:06 bboure