
add support for structured queries (opensearch only)

Open tobiass-sdl opened this issue 1 year ago • 4 comments

This adds a new endpoint for structured queries:

http://localhost:2322/structured?city=berlin

Supported parameters are

"lang", "limit", "lon", "lat", "osm_tag", "location_bias_scale", "bbox", "debug", "zoom", "layer", "countrycode", "state", "county", "city", "postcode", "district", "housenumber", "street"

Result format is equal to /api.
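As a rough illustration of how a client might assemble such a request, here is a minimal Python sketch. The helper name `build_structured_url` and the client-side parameter check are assumptions made for this example; they are not part of photon. Only the host, the route and the parameter names come from the description above.

```python
from urllib.parse import urlencode

# Parameter names as listed in the PR description.
SUPPORTED_PARAMS = {
    "lang", "limit", "lon", "lat", "osm_tag", "location_bias_scale", "bbox",
    "debug", "zoom", "layer", "countrycode", "state", "county", "city",
    "postcode", "district", "housenumber", "street",
}


def build_structured_url(params, base="http://localhost:2322"):
    """Hypothetical helper: build a /structured URL from a dict of parameters."""
    unknown = sorted(set(params) - SUPPORTED_PARAMS)
    if unknown:
        # Reject typos early instead of sending them to the server.
        raise ValueError(f"unsupported parameters: {unknown}")
    return f"{base}/structured?{urlencode(params)}"


print(build_structured_url({"city": "berlin"}))
# prints http://localhost:2322/structured?city=berlin
```

Checking the names on the client side is only a convenience; how photon itself reacts to unknown parameters is not covered in this thread.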

For parameters that are also available for /api, the meaning is the same. countrycode has to be the ISO 3166-1 alpha-2 code (e.g. "DE" instead of "Germany") and must match exactly, because fuzzy matching does not make sense for two-letter codes. In my experience, an address that is fine except for the country code is rare compared to nonsense hits in other countries.
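Since the country code is matched exactly, a caller may want to catch values like "Germany" before sending the request. The following is a minimal sketch of such a check; the function name `is_iso2_country_code` is hypothetical, and the check only verifies the two-letter shape, not whether the code is actually assigned. Whether photon treats upper and lower case alike is not stated here, so the sketch leaves the case untouched.

```python
import re

# Two ASCII letters, nothing else. This checks the shape only,
# not membership in the official ISO 3166-1 list.
_ISO2_SHAPE = re.compile(r"[A-Za-z]{2}")


def is_iso2_country_code(value):
    """Hypothetical client-side check for the countrycode parameter."""
    return _ISO2_SHAPE.fullmatch(value) is not None


print(is_iso2_country_code("DE"))       # prints True
print(is_iso2_country_code("Germany"))  # prints False
```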

To use structured queries, the new option "-structured" has to be passed when importing from Nominatim. Expect a 10-20% higher file size. When photon runs on an existing index folder, it automatically detects whether the data was imported with structured query support. For data imported without "-structured", /structured?city=berlin returns a technical error because the route is not mapped. Structured queries are only supported by the OpenSearch-based photon.

Known issues

  • state information is used with low priority. This can cause issues with cities that exist in several states (e.g. "Springfield" in the US). The reason is that states are not normalized: some documents have abbreviations like "NY", others spell out "New York". To fix this, either an alias list is needed or the names are normalized on import (maybe even in Nominatim).
  • no fine tuning of the scores yet - would require a large test set with reliable expected results and some automated way for guessing "good" scores.
  • /structured?city=some_hamlet_or_isolated_dwelling does not always work
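The alias-list idea from the first known issue could look roughly like the sketch below. This is not something the PR implements; the table `STATE_ALIASES` and the function `normalize_state` are hypothetical, and a real list would have to cover far more than the two sample entries.

```python
# Hypothetical alias table: abbreviation -> spelled-out state name.
STATE_ALIASES = {
    "NY": "New York",
    "CA": "California",
}


def normalize_state(value):
    """Map a known abbreviation to its full name, else return the input trimmed."""
    cleaned = value.strip()
    return STATE_ALIASES.get(cleaned.upper(), cleaned)


print(normalize_state("NY"))        # prints New York
print(normalize_state("New York"))  # prints New York
```

Applying the same mapping to both the imported documents and the incoming query would make "NY" and "New York" compare equal, which is the precondition for giving the state a higher priority.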

tobiass-sdl avatar Jun 06 '24 16:06 tobiass-sdl

  • [ ] adapt readme / documentation

tobiass-sdl avatar Jun 06 '24 16:06 tobiass-sdl

Sorry for creating the conflict. I found #816 while reviewing this PR. Please just rebase on master at some point. I don't mind force pushes to PR branches.

lonvia avatar Jun 11 '24 13:06 lonvia

Luckily I got rid of the bus stops without an overly complex layer filter. There was already a filter that removes results with house numbers if the request does not contain one. Bus stops and a few other locations passed that filter because they have no house number. Therefore fixing the filter from "no house number" to "no house number and not a house" should be sufficient. I totally agree with ignoring abbreviations for now - that's similar to the issue with state="NY" vs state="New York". No fuzziness for lenient=false was a bug.

tobiass-sdl avatar Jun 24 '24 15:06 tobiass-sdl

Therefore fixing the filter from "no house number" to "no house number and not a house" should be sufficient.

You may even want to put a global filter on the query: "when type is house then it must have a house number". And also add a global filter for "type != other". This should consistently filter out all objects that are not "address-like".
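The filter rules discussed in the last two comments can be written out as plain predicates. The sketch below is only an illustration in Python over dictionaries: in photon these rules would be expressed as filters on the OpenSearch query, and the field names `type` and `housenumber` as well as both function names are assumptions made for this example.

```python
def is_address_like(doc):
    """Global filter: drop type 'other' and houses without a house number."""
    doc_type = doc.get("type")
    if doc_type == "other":
        return False
    if doc_type == "house" and not doc.get("housenumber"):
        return False
    return True


def keep_for_request(doc, request_has_housenumber):
    """Per-request filter on top of the global one."""
    if not is_address_like(doc):
        return False
    if not request_has_housenumber:
        # "no house number and not a house"
        return not doc.get("housenumber") and doc.get("type") != "house"
    return True


bus_stop = {"type": "house", "name": "Bus stop"}  # house-type object, no number
street = {"type": "street", "name": "Main Street"}
address = {"type": "house", "housenumber": "12"}

print(keep_for_request(bus_stop, False))  # prints False
print(keep_for_request(street, False))    # prints True
print(keep_for_request(address, False))   # prints False
print(keep_for_request(address, True))    # prints True
```

With the global rule in place, the bus stop is rejected regardless of whether the request carries a house number, which is the consistency the suggestion is after.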

lonvia avatar Jun 26 '24 09:06 lonvia

Looks good as a first version. We can fine-tune in follow-up PRs, if necessary.

lonvia avatar Jul 09 '24 13:07 lonvia

Oh dear, now I forgot about documentation. Do you mind adding a little bit of documentation for the new call in a separate PR?

I would suggest doing this in an extra file docs/structured.md and just linking to it from the README.md (with the warning that it is optional). If the call is described directly in the main README, we will end up with repeated questions about why it doesn't work on photon.komoot.io.

lonvia avatar Jul 09 '24 13:07 lonvia

Yes, I can add some documentation. I'm off next week, so I'm not sure whether I'll get to it before the 20th.

tobiass-sdl avatar Jul 09 '24 14:07 tobiass-sdl