Nominatim
Optional housenumber has greater weight than optional postcode in search
This seems to happen regularly when the correct address isn't mapped at house-number granularity.
E.g. https://nominatim.openstreetmap.org/ui/search.html?street=Mainstra%C3%9Fe+8&city=Bamberg&country=Deutschland&postalcode=96052 (list of results with the correct street in first position) https://nominatim.openstreetmap.org/ui/search.html?street=Mainstra%C3%9Fe+8&city=Bamberg&country=Deutschland&postalcode=96052&limit=1 (some other Mainstraße 8 with the wrong postal code and city)
Once the house number is mapped, the search works (verifiable with house number 10 instead of 8). I've come across further examples with deviations of 100 km or more.
Wouldn't it be possible to provide a query that retrieves all results from the database, with the API returning only the best result (also with regard to reliability)?
Of course I could just change my code to always query for the whole list and just keep the first entry - but this seems to be a common pitfall a lot of people will discover (or they'll never discover and work with the wrong results instead ;))
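The client-side workaround described above could look roughly like the sketch below. It reuses the structured-search parameters from the example URLs (`street`, `city`, `country`, `postalcode`) against the `/search` endpoint with `format=jsonv2`. The helper names `build_search_url` and `first_result` and the sample result list are made up for illustration, and the HTTP request itself is left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(street, city, country, postalcode):
    """Build a structured search URL that deliberately omits limit=1."""
    params = {
        "street": street,
        "city": city,
        "country": country,
        "postalcode": postalcode,
        "format": "jsonv2",
    }
    return BASE_URL + "?" + urlencode(params)


def first_result(results):
    """Return the top entry of the full result list, or None if it is empty."""
    return results[0] if results else None


url = build_search_url("Mainstraße 8", "Bamberg", "Deutschland", "96052")

# Made-up stand-in for the decoded JSON response (a list of result objects).
sample = [{"display_name": "first"}, {"display_name": "second"}]
best = first_result(sample)
```

The point of the design is only that ranking happens on the full list before truncation, which is what `limit=1` appears not to guarantee in the examples above.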
Originally posted by @PhilHamm in https://github.com/osm-search/Nominatim/issues/882#issuecomment-1188776972
The desired result doesn't show up because there are quite a few Mainstrasse 8 in the 'Landkreis Bamberg'. The initial search on search_name prioritizes these results. We probably need an ordering that produces a match score that takes all factors (is there a housenumber? How close is the postcode? How many exact matches are there?) into account and produces a single number from them.
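One way to picture such a combined score is the sketch below. Everything in it is hypothetical: the `Candidate` fields, the weights and the distance cut-off are invented to show how several signals can be folded into one number, and none of it is taken from Nominatim.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    has_housenumber: bool        # was the requested house number found?
    postcode_distance_km: float  # distance to the requested postcode area
    exact_matches: int           # number of query terms matched exactly


def match_score(c, w_house=1.0, w_postcode=2.0, w_exact=0.5, max_km=50.0):
    """Higher is better. All weights are illustrative assumptions."""
    # Closeness falls linearly from 1 (same area) to 0 (max_km away or more).
    closeness = max(0.0, 1.0 - c.postcode_distance_km / max_km)
    house = 1.0 if c.has_housenumber else 0.0
    return w_house * house + w_postcode * closeness + w_exact * c.exact_matches


candidates = [
    Candidate("house number match, wrong postcode", True, 120.0, 2),
    Candidate("street only, right postcode", False, 0.0, 2),
]
ranked = sorted(candidates, key=match_score, reverse=True)
```

With these made-up weights the street in the right postcode area outranks the house-number match far away. Whether that is the right trade-off is exactly the tuning question raised above.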
Thanks @lonvia for creating the ticket!
In case it helps: here's another example which shows that the first list entry could also result in a different county 125km away from the actual address.
https://nominatim.openstreetmap.org/ui/search.html?street=Heinrich-Hertz-Stra%C3%9Fe+6&city=Schleswig&country=Deutschland&postalcode=24837 https://nominatim.openstreetmap.org/ui/search.html?street=Heinrich-Hertz-Stra%C3%9Fe+6&city=Schleswig&country=Deutschland&postalcode=24837&limit=1
I think I have a similar problem here in New Zealand. I have hacked a solution for search.php by modifying https://github.com/osm-search/Nominatim/blob/master/lib-php/Geocode.php#L776, changing the
break;
to
$aNextResults = $aResults;
This causes it to run through all of the groups and then (I think) do a final search on the results.
I am still investigating this, including trying to work out why some of these locations get into search_name when others don't. It seems to take just one "bad" location in search_name to ruin partial searches that match it.
I've progressed a bit further on understanding why Nominatim search is prioritising some strange results, and more than prioritising, only showing those results... making it seem like those are the only results for the search terms.
I hope this is actually relevant to this issue!
What I've learned (or think I've learned) is that some places / houses end up in search_name because they have some unique keywords, e.g. a house that is in a suburb that the road it's on is not in (perhaps because the road spans multiple suburbs).
Nominatim search finds that entry in search_name in one of the search groups, decides that's enough and ends the search, even though presence in search_name doesn't necessarily correspond to importance.
I've made the following patches to lib-php/Geocode.php to keep the search going, but to add some sort-order weight to those found earlier, in case they are valid name matches:
@@ -773,7 +773,13 @@
}
+ foreach ($aResults as $iIdx => $aResult) {
+ if (!isset($aResult->group)) {
+ $aResult->group = $iGroupLoop;
+ }
+ }
+ $aNextResults = $aResults;
- if (!empty($aResults) || $iGroupLoop > 4 || $iQueryLoop > 30) {
+ if ($iGroupLoop > 4 || $iQueryLoop > 30) {
break;
}
}
} else {
@@ -860,6 +865,8 @@
}
// - number of exact matches from the query
$aResult['foundorder'] -= $aResults[$aResult['place_id']]->iExactMatches;
+ // - sort by the group in which we found the result
+ $aResult['foundorder'] += $aResults[$aResult['place_id']]->group;
// - importance of the class/type
$iClassImportance = ClassTypes\getImportance($aResult);
if (isset($iClassImportance)) {
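To make the change in behaviour easier to follow, here is a toy model of the loop over search groups. It is not Nominatim code: `run_groups`, the group contents and the numbers are invented, the `$iGroupLoop > 4` and `$iQueryLoop > 30` limits are ignored, and the ordering only mimics the two `foundorder` adjustments visible in the patch (exact matches lower the value, a later group raises it).

```python
def run_groups(groups, stop_at_first_hit):
    """groups: one list per search group of (place, exact_matches) tuples."""
    found = {}
    for group_no, hits in enumerate(groups, start=1):
        for place, exact in hits:
            # Remember the first group in which each place was found.
            found.setdefault(place, {"exact": exact, "group": group_no})
        if stop_at_first_hit and found:
            break  # unpatched behaviour: the first non-empty group ends the search

    def found_order(place):
        # Lower sorts first: reward exact matches, penalise later groups.
        return -found[place]["exact"] + found[place]["group"]

    return sorted(found, key=found_order)


# Made-up data: a stray hit in group 1, the expected street in group 2.
groups = [
    [("stray hit", 1)],
    [("expected street", 3)],
]
early = run_groups(groups, stop_at_first_hit=True)
patched = run_groups(groups, stop_at_first_hit=False)
```

In this model the early exit returns only the stray hit, while the patched variant returns both and lets the exact-match count lift the expected street to the top.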
There is a second problem with this: when the search prunes the search groups, it gets rid of those that would find too many matches. But that means that if it has found one or two matches early in the search groups, it ends there and reports those as the results, even though there might actually be hundreds. There's the option to increase NOMINATIM_SEARCH_NAME_ONLY_THRESHOLD, but I've tried adding a heuristic to the group pruning instead: if we've pruned away a search group that looks really promising (has a house number) and haven't kept any other promising ones (with house numbers), then we reject the whole search:
@@ -429,10 +429,16 @@
}
}
+ $acceptedHouseNumberSearches = 0;
+ $rejectedHouseNumberSearches = 0;
+
// Revisit searches, drop bad searches and give penalty to unlikely combinations.
$aGroupedSearches = array();
foreach ($aSearches as $oSearch) {
if (!$oSearch->isValidSearch()) {
+ if ($oSearch->hasHousenumber()) {
+ $rejectedHouseNumberSearches ++;
+ }
continue;
}
@@ -441,9 +447,18 @@
$aGroupedSearches[$iRank] = array();
}
$aGroupedSearches[$iRank][] = $oSearch;
+
+ if ($oSearch->hasHousenumber()) {
+ $acceptedHouseNumberSearches ++;
+ }
}
ksort($aGroupedSearches);
+ // Heuristic to reject all search groups if there are possible house number specific searches that are too broad
+ if ($rejectedHouseNumberSearches > 0 && $acceptedHouseNumberSearches == 0) {
+ return array();
+ }
+
return $aGroupedSearches;
}
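Stripped of Nominatim's internals, the heuristic in the patch above can be written as the following sketch. The `Search` tuple is a stand-in: its `valid` and `has_housenumber` fields play the role of `isValidSearch()` and `hasHousenumber()`, and grouping by rank is reduced to a plain dictionary.

```python
from collections import namedtuple

# Hypothetical stand-in for a search description object.
Search = namedtuple("Search", "rank valid has_housenumber")


def group_searches(searches):
    """Group valid searches by rank, or return {} if the heuristic fires."""
    accepted_hn = 0
    rejected_hn = 0
    grouped = {}
    for s in searches:
        if not s.valid:
            if s.has_housenumber:
                rejected_hn += 1
            continue
        grouped.setdefault(s.rank, []).append(s)
        if s.has_housenumber:
            accepted_hn += 1
    # Reject everything if house-number searches were pruned and none survived.
    if rejected_hn > 0 and accepted_hn == 0:
        return {}
    return dict(sorted(grouped.items()))


all_pruned = group_searches([Search(1, False, True), Search(2, True, False)])
one_kept = group_searches([Search(2, True, True), Search(1, False, True)])
```

Here `all_pruned` is empty because the only house-number search was dropped, while `one_kept` still contains the rank-2 group.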
This has improved the results that I'm seeing for my NZ search examples. The extra results aren't bad in my examples, and are usually good.
We're running into some deficiencies of the indexing / searching I think? I'm not sure if this is helpful for anyone. I don't like running patched code so I'll definitely be available to test any improvements in this area. As you can see from the above, I'm not really equipped to contribute cleverly by myself :-)
@karlvr
1. Please do not add comments to existing issues that may or may not be related to your problem. Open a new discussion.
2. There is no way to understand what you are going on about unless you give some concrete examples of searches that do not work together with the expected results.
I'm marking this and your comments as off-topic because of that.