
Optional housenumber has greater weight than optional postcode in search

Open lonvia opened this issue 2 years ago • 5 comments

This seems to happen regularly when the correct address isn't available at house-number granularity.

E.g.
https://nominatim.openstreetmap.org/ui/search.html?street=Mainstra%C3%9Fe+8&city=Bamberg&country=Deutschland&postalcode=96052 (list of results with the correct street in first position)
https://nominatim.openstreetmap.org/ui/search.html?street=Mainstra%C3%9Fe+8&city=Bamberg&country=Deutschland&postalcode=96052&limit=1 (some other Mainstraße 8 with the wrong postal code and city)

Once the house number is mapped, this works (verifiable with house number 10 instead of 8). I've come across further examples with deviations of 100 km or more.


Wouldn't it be possible for the query to retrieve all results from the database while the API returns only the best one (also with regard to reliability)?

Of course I could just change my code to always query for the whole list and just keep the first entry - but this seems to be a common pitfall a lot of people will discover (or they'll never discover and work with the wrong results instead ;))
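The client-side workaround described above might look roughly like the following. This is a minimal sketch, not an official client: the helper names `build_search_url`, `best_result` and `geocode` are made up for illustration, and `limit=10` is simply an illustrative value larger than 1. The query parameters mirror the ones used in the links above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(street, city, country, postalcode, limit=10):
    """Build a structured search URL that asks for a list, not a single hit."""
    params = {
        "street": street,
        "city": city,
        "country": country,
        "postalcode": postalcode,
        "format": "jsonv2",
        "limit": limit,  # illustrative value; the point is to avoid limit=1
    }
    return SEARCH_URL + "?" + urlencode(params)


def best_result(results):
    """Keep only the first entry of the returned list, or None if it is empty."""
    return results[0] if results else None


def geocode(street, city, country, postalcode, user_agent):
    """Fetch the full result list and reduce it to the first entry client-side."""
    url = build_search_url(street, city, country, postalcode)
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return best_result(json.load(response))
```

A call such as `geocode("Mainstraße 8", "Bamberg", "Deutschland", "96052", "my-app/1.0")` would then take the first entry of the list instead of relying on `limit=1`; the user agent string here is a placeholder.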

Originally posted by @PhilHamm in https://github.com/osm-search/Nominatim/issues/882#issuecomment-1188776972

lonvia avatar Jul 19 '22 09:07 lonvia

The desired result doesn't show up because there are quite a few Mainstrasse 8 in the 'Landkreis Bamberg'. The initial search on search_name prioritizes these results. We probably need an ordering that takes all factors into account (Is there a housenumber? How close is the postcode? How many exact matches are there?) and produces a single match score from them.
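To make the idea of a single match score concrete, here is a rough sketch. It is not Nominatim code: the `Candidate` type, the weights, and the use of a shared postcode prefix as a stand-in for "how close is the postcode" are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    has_housenumber: bool
    postcode: str
    exact_matches: int


def postcode_closeness(query_postcode, candidate_postcode):
    """Share of leading characters that agree (assumed proxy for closeness)."""
    if not query_postcode or not candidate_postcode:
        return 0.0
    shared = 0
    for a, b in zip(query_postcode, candidate_postcode):
        if a != b:
            break
        shared += 1
    return shared / max(len(query_postcode), len(candidate_postcode))


def match_score(candidate, query_postcode,
                w_housenumber=1.0, w_postcode=2.0, w_exact=0.5):
    """Fold all factors into one number; the weights are hypothetical."""
    return (
        w_housenumber * (1.0 if candidate.has_housenumber else 0.0)
        + w_postcode * postcode_closeness(query_postcode, candidate.postcode)
        + w_exact * candidate.exact_matches
    )


def rank(candidates, query_postcode):
    """Order candidates by descending match score."""
    return sorted(candidates,
                  key=lambda c: match_score(c, query_postcode),
                  reverse=True)
```

With weights like these, a street-level match with the right postcode can outrank a house-number match whose postcode is far off, which is the behaviour asked for in this issue. Real weights would need tuning against actual queries.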

lonvia avatar Jul 19 '22 09:07 lonvia

Thanks @lonvia for creating the ticket!

In case it helps: here's another example which shows that the first list entry could also result in a different county 125km away from the actual address.

https://nominatim.openstreetmap.org/ui/search.html?street=Heinrich-Hertz-Stra%C3%9Fe+6&city=Schleswig&country=Deutschland&postalcode=24837
https://nominatim.openstreetmap.org/ui/search.html?street=Heinrich-Hertz-Stra%C3%9Fe+6&city=Schleswig&country=Deutschland&postalcode=24837&limit=1

PhilHamm avatar Jul 19 '22 10:07 PhilHamm

I think I have a similar problem here in New Zealand. I have hacked together a fix for search.php by modifying https://github.com/osm-search/Nominatim/blob/master/lib-php/Geocode.php#L776, changing the

break;

to

$aNextResults = $aResults;

This causes it to run through all of the groups and then (I think) do a final search on the results.

I'm still investigating, including trying to work out why some of these locations get into search_name when others don't. It seems to take just one "bad" location in search_name to ruin partial searches that match it.

karlvr avatar Sep 04 '22 05:09 karlvr

I've progressed a bit further on understanding why Nominatim search is prioritising some strange results, and more than prioritising, only showing those results... making it seem like those are the only results for the search terms.

I hope this is actually relevant to this issue!

What I've learned (or think I've learned) is that some places / houses end up in search_name because they have some unique keywords... e.g. a house that is in a suburb that the road it's on is not in (perhaps because the road spans multiple suburbs).

What Nominatim search does is it finds that entry in search_name in one of the search groups, and decides that's enough and ends the search, even though the presence in search_name doesn't necessarily correspond to importance.

I've made the following patches to lib-php/Geocode.php to keep the search going, but to add some sort order weight to those found earlier, in case they are valid name matches:

@@ -773,7 +773,13 @@
                 }
 
+                foreach ($aResults as $iIdx => $aResult) {
+                    if (!isset($aResult->group)) {
+                        $aResult->group = $iGroupLoop;
+                    }
+                }
+                $aNextResults = $aResults;
-                if (!empty($aResults) || $iGroupLoop > 4 || $iQueryLoop > 30) {
+                if ($iGroupLoop > 4 || $iQueryLoop > 30) {
                     break;
                 }
             }
         } else {
@@ -860,6 +865,8 @@
                 }
                 // - number of exact matches from the query
                 $aResult['foundorder'] -= $aResults[$aResult['place_id']]->iExactMatches;
+                // - sort by the group in which we found the result
+                $aResult['foundorder'] += $aResults[$aResult['place_id']]->group;
                 // - importance of the class/type
                 $iClassImportance = ClassTypes\getImportance($aResult);
                 if (isset($iClassImportance)) {
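Stripped of Nominatim's internals, the idea behind this patch can be sketched as follows. This is only an illustration under simplifying assumptions: groups are plain lists of `(place_id, exact_matches)` pairs, `max_groups` stands in for the loop limits, and the function names are invented.

```python
def collect_all_groups(groups, max_groups=5):
    """Walk every search group instead of stopping at the first one with hits."""
    found = {}
    for group_index, group in enumerate(groups[:max_groups]):
        for place_id, exact_matches in group:
            # Remember only the earliest group in which a place was found.
            if place_id not in found:
                found[place_id] = {
                    "group": group_index,
                    "exact_matches": exact_matches,
                }
    return found


def found_order(entry):
    """Lower sorts first: exact matches lower the value, later groups raise it."""
    return entry["group"] - entry["exact_matches"]


def ordered_place_ids(found):
    """Return place ids sorted by the combined ordering value."""
    return sorted(found, key=lambda place_id: found_order(found[place_id]))
```

For example, `ordered_place_ids(collect_all_groups([[(1, 0)], [(2, 3), (1, 0)]]))` puts place 2 ahead of place 1: it was found in a later group, but its exact matches outweigh the group penalty.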

There is a second problem with this: when the search prunes the search groups, it discards those that would find too many matches. That means that if it has found one or two results early in the search groups, it stops and reports those as the results, even though there might actually be hundreds. There's the option to increase NOMINATIM_SEARCH_NAME_ONLY_THRESHOLD, but I've tried adding a heuristic to the group pruning: if we've pruned away a search that looks really promising (it has a house number) and no other promising searches (with house numbers) are left, we reject the whole search:

@@ -429,10 +429,16 @@
             }
         }
 
+        $acceptedHouseNumberSearches = 0;
+        $rejectedHouseNumberSearches = 0;
+
         // Revisit searches, drop bad searches and give penalty to unlikely combinations.
         $aGroupedSearches = array();
         foreach ($aSearches as $oSearch) {
             if (!$oSearch->isValidSearch()) {
+                if ($oSearch->hasHousenumber()) {
+                    $rejectedHouseNumberSearches ++;
+                }
                 continue;
             }
 
@@ -441,9 +447,18 @@
                 $aGroupedSearches[$iRank] = array();
             }
             $aGroupedSearches[$iRank][] = $oSearch;
+
+            if ($oSearch->hasHousenumber()) {
+                $acceptedHouseNumberSearches ++;
+            }
         }
         ksort($aGroupedSearches);
 
+        // Heuristic to reject all search groups if there are possible house number specific searches that are too broad
+        if ($rejectedHouseNumberSearches > 0 && $acceptedHouseNumberSearches == 0) {
+            return array();
+        }
+
         return $aGroupedSearches;
     }
 

This has improved the results that I'm seeing for my NZ search examples. The extra results aren't bad in my examples, and are usually good.
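For readers who don't want to trace the diff, the pruning heuristic boils down to something like the sketch below. It is a simplified model, not the PHP code: searches are plain dicts with hypothetical keys `rank`, `valid` and `has_housenumber`.

```python
def group_searches(searches):
    """Group valid searches by rank, rejecting everything if only
    house-number searches were dropped and none survived."""
    accepted_housenumber = 0
    rejected_housenumber = 0
    grouped = {}
    for search in searches:
        if not search["valid"]:
            if search["has_housenumber"]:
                rejected_housenumber += 1
            continue
        grouped.setdefault(search["rank"], []).append(search)
        if search["has_housenumber"]:
            accepted_housenumber += 1
    # Heuristic: a promising house-number search was pruned and no other
    # house-number search is left, so the remaining groups are not trusted.
    if rejected_housenumber > 0 and accepted_housenumber == 0:
        return {}
    return dict(sorted(grouped.items()))
```

The trade-off is that such a query returns nothing at all rather than a handful of possibly misleading matches.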

We're running into some deficiencies of the indexing / searching I think? I'm not sure if this is helpful for anyone. I don't like running patched code so I'll definitely be available to test any improvements in this area. As you can see from the above, I'm not really equipped to contribute cleverly by myself :-)

karlvr avatar Sep 05 '22 06:09 karlvr

@karlvr 1. Please do not add comments to existing issues that may or may not be related to your problem. Open a new discussion. 2. There is no way to understand what you are going on about unless you give some concrete examples of searches that do not work together with the expected results.

I'm marking this and your comments as off-topic because of that.

lonvia avatar Sep 05 '22 07:09 lonvia