Nominatim
How to add streetname variations about persons so they can be found
A recurring problem in Italy is street names that contain the names of famous people (which is most street names, actually). The street names tend to include the whole name (given names and surname), but in everyday life people often leave out the given names. There are different approaches to this. One possibility is to tag the abbreviated variants as alt_name etc. on every way, but that seems to add a lot of extra work, data and redundancy that might be avoided. Is there a way we could provide lists of common names (that are used for street names) and their shorter forms? E.g. Via Alessandra Macinghi Strozzi could also be Via Macinghi Strozzi or maybe even Via Strozzi, plus abbreviated forms like Via A. Macinghi Strozzi. There are also some exceptions: "Via Rosa Raimondi Garibaldi" will not be "Via Garibaldi" because of a much more famous, different Garibaldi, so a simple list of (optional) given names will probably not be sufficient, although it would catch the vast majority of cases. (Married women will always have their maiden name first and their husband's name second, while married men will generally not have their wife's name added; I think we might treat these double names as a single compound.)
Real abbreviations like going from Via Alessandra Macinghi Strozzi to Via A. Macinghi Strozzi is something Nominatim should learn to do at some point but for the short names that leave out some of the words it is generally a good idea to add them as short_name or alt_name.
While it looks obvious to somebody who speaks Italian how to abbreviate the names, it is not at all obvious to a computer. Nominatim already does a simple version of matching by trying to match up word by word, but that does not always yield the ideal result. It could start consulting Wikipedia to get a list of names of famous persons, but even that is prone to error, notably in the example of Mrs Garibaldi that you cite. What can and cannot be abbreviated also varies greatly from region to region. In Germany, for example, this kind of abbreviation would be much less common.
So the best approach really is to note down the short forms in use for that particular street. They can also come in handy for rendering, where errors in guessing the correct short name are even more problematic.
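To make the "real abbreviation" case above concrete (Via Alessandra Macinghi Strozzi to Via A. Macinghi Strozzi), here is a minimal sketch. It is not how Nominatim works; the function name and the assumptions that the first word is the street type and that the caller knows how many given names follow are purely illustrative.

```python
def abbreviate_given_names(street_name, given_name_count=1):
    """Reduce the given names that follow the street type to initials.

    Assumptions (hypothetical, for illustration only): the first word is
    the street type and the next `given_name_count` words are given names.
    """
    words = street_name.split()
    if not words:
        return street_name
    street_type, rest = words[0], words[1:]
    shortened = [
        word[0] + "." if index < given_name_count else word
        for index, word in enumerate(rest)
    ]
    return " ".join([street_type] + shortened)


print(abbreviate_given_names("Via Alessandra Macinghi Strozzi"))
```

The hard part, as noted above, is knowing which words are given names at all; the sketch sidesteps that by taking the count as a parameter.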
Yes, I was not referring to abbreviations with a full stop like A. for Alessandra. What I was asking was whether it could make sense to go from "Albert Einstein" to "Einstein" or from "Guglielmo Marconi" to "Marconi", e.g. here: https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations#Italiano_-_Italian i.e. extend this list with a whole bunch of persons common in street names and how they are usually shortened, or make a new, similar list. This way we would do it only once for every person rather than thousands of times in the short_name tag (but it would likely be less reliable and surely less flexible than the short_name version, and it would be limited to persons and not cover other names that might get shortened). But the current wiki table might not be a very suitable format for such lists once they become very long, and maybe Nominatim or Name finder don't expect this list to be very long either and might run into problems. Also, while some persons occur in almost every village, many others (I guess a majority, but am not sure) are "local heroes" and will only be relevant for a small region, i.e. the short_name approach might work better for them.
A crowd-sourced street name alias database would certainly be an interesting project. It is not something I would collect in our wiki, though. Even the abbreviation page you cite doesn't scale particularly well. Wikidata might be a better match here. It is certainly possible that Nominatim makes use of such data once it is available.
2017-03-26 14:47 GMT+02:00 Sarah Hoffmann [email protected]:
A crowd-sourced street name alias database would certainly be an interesting project. It is not something I would collect in our wiki, though. Even the abbreviation page you cite doesn't scale particularly well. Wikidata might be a better match here. It is certainly possible that Nominatim makes use of such data once it is available.
Using Wikidata for this sounds interesting. I am not sure what a suitable implementation would be, or even whom to ask. Maybe I will give the talk mailing list a try.
There are also some other optional words (besides first names) like "del", "della", "delle", "degli", "dello", "dell'", "di", "lo", "la", "per", "dei", "da", "d'", ... and on talk-it, "Via", "Viale", "Corso", "Largo", "Vicolo", "Salita", "Fondamenta", "Piazza", "Piazzale" and others were mentioned. If no match is found, the search string should be matched against a simplified version without these words, because people tend to search with shortened versions on mobile. One particularity of Italian (and, for example, also of French, Spanish or the Slavic languages), as opposed to English or German, is that the type of road (square, road, avenue, ...) comes at the start rather than the end. Matching from the beginning of the string therefore works worse, because in these languages the important part is at the end of the string.
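The fallback described above could be sketched roughly like this. The word lists are the ones quoted in this comment; the function name and the idea of comparing simplified token lists are assumptions made for illustration, not Nominatim behaviour.

```python
# Optional words and street types as listed in this thread (not exhaustive).
OPTIONAL_WORDS = {
    "del", "della", "delle", "degli", "dello", "dell'", "di",
    "lo", "la", "per", "dei", "da", "d'",
}
STREET_TYPES = {
    "via", "viale", "corso", "largo", "vicolo", "salita",
    "fondamenta", "piazza", "piazzale",
}


def simplify(name):
    """Lower-case the name and drop optional words and street types."""
    text = name.lower().replace("\u2019", "'").replace(",", " ")
    # Detach elided forms such as "dell'" or "d'" from the following word.
    tokens = text.replace("'", "' ").split()
    return [t for t in tokens if t not in OPTIONAL_WORDS and t not in STREET_TYPES]


with_article = simplify("Via del Santuario Regina degli Apostoli")
without_article = simplify("Via Santuario Regina degli Apostoli")
print(with_article == without_article)
```

Both spellings reduce to the same token list, so a second lookup on the simplified form would treat them as the same street. Dropping short words such as "la" or "di" unconditionally will also produce false matches, so this is only reasonable as a fallback when the exact search finds nothing.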
In short, we would want to create a kind of database with
- words that are optional/less important for street name matching, consisting of
  1a. first names of (generally famous) people
  1b. less important words like prepositions, articles, ...
  1c. titles of persons (like king, duke, earl, father, eminence, ...)
  1d. words indicating the street/square typology, like square, road, street, avenue, path, alley, ...
For 1a. and 1c. it would maybe make sense to link to the actual person in Wikidata? This is already done (in ~500 instances currently): https://taginfo.openstreetmap.org/tags/name%3Aetymology%3Awikidata=Q483709 https://www.wikidata.org/wiki/Q483709
This would resolve the distinction between given names and family name (we would likely have to translate/transliterate these to Italian as well if the person is not Italian?).
For 1b. and 1d. the lists will generally be shorter, so maybe an approach like the one for abbreviations is suitable?
For 1d. there are (mostly?) already Wikidata objects, e.g. here: https://www.wikidata.org/wiki/Q174782 We could add the missing ones and make a list in the wiki with all these Wikidata objects, so Nominatim could compile the list of name components for all the languages from Wikidata? Or should we add a new property in Wikidata that indicates that the word/object is used as part of street names to describe the kind of spatial configuration? Or should we use a new tag on every OSM object to indicate the spatial typology part in the name (street, square, ...), e.g. wikidata:space-type=Q174782 ?
About 1b. An example case: "Via Santuario Regina degli Apostoli, Roma" is currently not found, while "Via del Santuario Regina degli Apostoli, Roma" is. Both should find the same street. I have looked at wikidata to find e.g. articles, but it proved astonishingly unsuitable: https://www.wikidata.org/wiki/Q103184 (defines an article) Looking for instances of these in Italian, I only find 20 results of which 0 applicable: https://query.wikidata.org/#%23Article%20%28grammar%29%0ASELECT%20%3Fitem%20%3FitemLabel%0AWHERE%0A%7B%0A%09%3Fitem%20wdt%3AP31%20wd%3AQ103184%20.%0A%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22it%22%20%7D%0A%7D
I have started fixing them, but after removing a handful of "instance of article (grammar)" properties from objects that weren't grammatical articles (but Wikipedia articles), the system didn't let me go on, saying it was for "anti abuse". --> IMHO the Wikidata approach isn't suitable for 1b. (or we would have to create new objects in a way that people won't confuse them, e.g. a new, very explicit class: "small words that are optional in street names").
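For readers who do not want to decode the query link above by hand, this small snippet prints the SPARQL text contained in it. The encoded string is copied from the URL fragment; nothing else is assumed.

```python
from urllib.parse import unquote

# Fragment of the query.wikidata.org link above (everything after "#").
ENCODED_QUERY = (
    "%23Article%20%28grammar%29%0A"
    "SELECT%20%3Fitem%20%3FitemLabel%0A"
    "WHERE%0A"
    "%7B%0A"
    "%09%3Fitem%20wdt%3AP31%20wd%3AQ103184%20.%0A"
    "%09SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20"
    "wikibase%3Alanguage%20%22it%22%20%7D%0A"
    "%7D"
)

query = unquote(ENCODED_QUERY)
print(query)
```

The query simply asks for every item that is an "instance of" (P31) Q103184, with Italian labels.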
I don't understand why we make the problem more complicated than it is. Every search engine apart from Nominatim handles Italian street names without problems; it's a matter of matching the search string word for word.
If the street name in the database is composed of "word1 word2 word3", the user should be able to find it with the search string "word1 word3" or other combinations (note that permutations in the order of words could be disregarded, at least in this case; it's ok if "word3 word1" doesn't match).
This, at least in my experience, seems to be the standard operating mode of commercial search engines for street names. I don't understand why Nominatim has chosen a different approach that is so restrictive about search strings for roads and makes the user experience more frustrating.
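The rule proposed in this comment (a query matches if its words appear in the street name in the same order, with gaps allowed) can be written down in a few lines. This is only a sketch of that rule; the function name is made up here and it says nothing about how Nominatim or any commercial engine is actually implemented.

```python
def matches_in_order(query, street_name):
    """True if every query word occurs in street_name, in the same order."""
    name_words = iter(street_name.lower().split())
    # `word in name_words` advances the iterator, so later query words can
    # only match words that come after the previous match.
    return all(word in name_words for word in query.lower().split())


print(matches_in_order("word1 word3", "word1 word2 word3"))
print(matches_in_order("word3 word1", "word1 word2 word3"))
print(matches_in_order("Via Strozzi", "Via Alessandra Macinghi Strozzi"))
```

Note that this rule would also match "Via Garibaldi" against "Via Rosa Raimondi Garibaldi", which is exactly the exception raised at the start of the thread, so some ranking of results is still needed.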
To make things even more complicated, in Slovakia (and most probably in other Slavic languages as well) Einstein street can be either "ulica Alberta Einsteina" or "Einsteinova ulica" (but it won't be "ulica Einsteina"; ulica == street). here is probably the most complicated case.
That would mean different stop words for every language.
Regarding the alias database: in Spain, at least, there are locations with two similar languages, and it is quite common for users to mix languages between the title of the street and the name.
So possible aliases for street titles should be:
Avenida, Av, Avda, Avinguda
Camino, Camin, Camiño
Calle, C/, C, Carrer, Rúa
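A lookup table is probably the simplest way to express such aliases. The sketch below uses the three groups listed above (reading the Catalan form as "Avinguda"); choosing the Spanish word as the canonical key, and the function name, are arbitrary assumptions for the example.

```python
# Alias groups from the list above; the dictionary key is an arbitrary
# canonical form chosen for this example.
STREET_TYPE_ALIASES = {
    "avenida": ["avenida", "av", "avda", "avinguda"],
    "camino": ["camino", "camin", "camiño"],
    "calle": ["calle", "c/", "c", "carrer", "rúa"],
}

# Reverse index: alias -> canonical form.
CANONICAL = {
    alias: canonical
    for canonical, aliases in STREET_TYPE_ALIASES.items()
    for alias in aliases
}


def normalize_street_type(name):
    """Replace a leading street-type alias with its canonical form."""
    words = name.split()
    if not words:
        return name
    key = words[0].lower().rstrip(".")
    words[0] = CANONICAL.get(key, words[0])
    return " ".join(words)


print(normalize_street_type("Carrer de San Bernardino"))
print(normalize_street_type("Avda. de América"))
```

Applying the same normalization to both the stored name and the search string makes mixed-language queries compare equal. Forms written without a space, such as "C/Mayor", would need extra handling.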
Looks like this is a recurrent problem. As @gwilbor mentioned, it would be solved if the search engine matched word by word and ranked results by how many words match. For example, I was trying to find: https://nominatim.openstreetmap.org/details.php?place_id=70017016 Name in the database: "Calle de San Bernardino". If I have to look for it, I will type just "Bernardino" or maybe "San Bernardino". Of course I will leave out all the auxiliary words: "calle" is the qualifier of the street type and "de" is just a preposition. I will only be interested in "calle" if there are two streets with the same name.
This is certainly something we need to fix in order for the maps to be helpful. I think OSM is great, but it is useless to me in my GPS if I cannot find my destination by name.
Similarly, none of the examples below work:
MLK JR WAY
M L K JR WAY
M L KING JR WAY
MARTIN LUTHER KING JR WAY
You must spell it out in full to get a match:
MARTIN LUTHER KING JUNIOR WAY
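One way all four variants above could be accepted is to expand known abbreviations and to let a query token stand for the initials of the next few words. The sketch below does that; the abbreviation table and function names are hypothetical, and this is not a description of what Nominatim does.

```python
# Hypothetical abbreviation table; a real one would be per language.
ABBREVIATIONS = {"jr": "junior"}


def _match(query_tokens, name_words):
    """Match query tokens against name words, allowing runs of initials."""
    if not query_tokens:
        return not name_words
    if not name_words:
        return False
    token = query_tokens[0]
    if token == name_words[0]:
        return _match(query_tokens[1:], name_words[1:])
    # Otherwise read the token as initials of the next len(token) words,
    # so "mlk" or a single "m" can stand for "martin luther king" / "martin".
    span = len(token)
    if span <= len(name_words) and all(
        word[0] == letter for letter, word in zip(token, name_words[:span])
    ):
        return _match(query_tokens[1:], name_words[span:])
    return False


def matches_with_initials(query, street_name):
    tokens = [ABBREVIATIONS.get(t, t) for t in query.lower().replace(".", " ").split()]
    return _match(tokens, street_name.lower().split())


FULL_NAME = "MARTIN LUTHER KING JUNIOR WAY"
for variant in ["MLK JR WAY", "M L K JR WAY", "M L KING JR WAY", "MARTIN LUTHER KING JR WAY"]:
    print(variant, matches_with_initials(variant, FULL_NAME))
```

Reading arbitrary tokens as initials is permissive and will produce false positives on short words, so in practice it would have to be combined with ranking or restricted to upper-case tokens.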
I have the impression this is solved now? I find streets by searching only for "Via lastname, place", which didn't work before. This seems like big news for Italy. Thank you so much.