add support for structured queries (opensearch only)
This adds a new endpoint, for example:
http://localhost:2322/structured?city=berlin
Supported parameters are:
"lang", "limit", "lon", "lat", "osm_tag", "location_bias_scale", "bbox", "debug", "zoom", "layer", "countrycode", "state", "county", "city", "postcode", "district", "housenumber", "street"
The result format is the same as for /api.
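A minimal client-side sketch of building such a request, assuming a local photon instance on port 2322 as in the example URL. The helper name `structured_url` is hypothetical, and rejecting unknown parameter names is done here on the client only; it says nothing about how photon itself treats them.

```python
from urllib.parse import urlencode

# Parameter names copied from the list above.
SUPPORTED_PARAMS = frozenset({
    "lang", "limit", "lon", "lat", "osm_tag", "location_bias_scale",
    "bbox", "debug", "zoom", "layer", "countrycode", "state", "county",
    "city", "postcode", "district", "housenumber", "street",
})


def structured_url(base_url: str, **params: str) -> str:
    """Build a /structured request URL, rejecting unknown parameter names."""
    unknown = sorted(set(params) - SUPPORTED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported parameters: {', '.join(unknown)}")
    return f"{base_url.rstrip('/')}/structured?{urlencode(params)}"


if __name__ == "__main__":
    # prints http://localhost:2322/structured?city=berlin&countrycode=DE
    print(structured_url("http://localhost:2322", city="berlin", countrycode="DE"))
```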
Parameters that are also available for /api have the same meaning. countrycode has to be the two-letter ISO code (e.g. DE instead of "Germany") and must match exactly; fuzzy matching does not make sense for two-letter codes. In my experience, an address that is fine except for the country code is rare compared to nonsense hits in other countries.
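Since the country code is matched exactly, a caller may want to catch obviously wrong values such as "Germany" before sending a request. The following is only a shape check under that assumption; the function name is hypothetical, and it does not verify that the code is actually assigned to a country.

```python
import re

# Exactly two ASCII letters; this is a shape check, not a lookup.
_TWO_LETTERS = re.compile(r"[A-Za-z]{2}")


def looks_like_two_letter_code(countrycode: str) -> bool:
    """Return True if the value has the shape of a two-letter country code."""
    return _TWO_LETTERS.fullmatch(countrycode) is not None
```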
To use structured queries, the new option "-structured" has to be used when importing from Nominatim. Expect the files to be 10-20% larger. When photon runs on an existing index folder, it automatically detects whether the data was imported with structured query support. For data imported without "-structured", /structured?city=berlin returns a technical error because the route is not mapped. Structured queries are only supported for OpenSearch-based photon.
Known issues
- State information is used with low priority. This can cause issues with cities that exist in several states (e.g. "Springfield" in the US). The reason is that states are not normalized: some documents have abbreviations like "NY", others spell out "New York". To fix this, either an alias list is needed or the names have to be normalized on import (maybe even in Nominatim).
- No fine-tuning of the scores yet. This would require a large test set with reliable expected results and some automated way of guessing "good" scores.
- /structured?city=some_hamlet_or_isolated_dwelling does not always work
- [ ] adapt readme / documentation
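The alias-list idea from the known issues could look roughly like the sketch below. It is not part of this PR; the table contents and the name `normalize_state` are purely illustrative, and a real list would have to cover every state and be applied to both the indexed documents and the query.

```python
# Hypothetical alias table with two sample entries only.
STATE_ALIASES = {
    "ny": "New York",
    "il": "Illinois",
}


def normalize_state(value: str) -> str:
    """Map a known abbreviation to its spelled-out name, else return the trimmed input."""
    cleaned = value.strip()
    return STATE_ALIASES.get(cleaned.lower(), cleaned)
```

With this, state="NY" and state="New York" would both end up as "New York" before matching.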
Sorry for creating the conflict. I found #816 while reviewing this PR. Please just rebase on master at some point. I don't mind force pushes to PR branches.
Luckily I got rid of the bus stops without an overly complex layer filter. There was already a filter to exclude results with house numbers if the request does not contain one. Bus stops and a few other locations passed that filter because they have no house number. Therefore fixing the filter from "no house number" to "no house number and not a house" should be sufficient. I totally agree with ignoring abbreviations for now - that's similar to the issue with state="NY" vs state="New York". No fuzziness for lenient=false was a bug.
> Therefore fixing the filter from "no house number" to "no house number and not a house" should be sufficient.
You may even want to put a global filter on the query: "when type is house, then it must have a house number". And also add a global filter for "type != other". This should consistently filter out all objects that are not "address-like".
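To make the combined filter logic concrete, here is a small sketch written as plain predicates rather than as an actual OpenSearch query. The `Doc` class, its field names and the function names are assumptions for illustration; only the rules themselves come from the discussion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    """Hypothetical stand-in for an indexed document."""
    type: str                         # e.g. "house", "street", "city", "other"
    housenumber: Optional[str] = None


def is_address_like(doc: Doc) -> bool:
    """Global filters: type != other, and a house must have a house number."""
    if doc.type == "other":
        return False
    if doc.type == "house" and not doc.housenumber:
        return False
    return True


def keep_for_request(doc: Doc, request_has_housenumber: bool) -> bool:
    """Apply the global filters plus the per-request house number rule."""
    if not is_address_like(doc):
        return False
    if not request_has_housenumber:
        # "no house number and not a house"
        return not doc.housenumber and doc.type != "house"
    return True
```

A bus stop indexed as a house without a house number is rejected by `is_address_like` already, so it no longer slips through when the request has no house number.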
Looks good as a first version. We can fine-tune in follow-up PRs, if necessary.
Oh dear, now I forgot about documentation. Do you mind adding a little bit of documentation for the new call in a separate PR?
I would suggest doing this in an extra file docs/structured.md and just linking to it from the README.md (with the warning that it is optional). If the call is described directly in the main README, we will end up with repeated questions about why it doesn't work on photon.komoot.io.
Yes, I can add some documentation. I'm off next week, so I'm not sure whether I will get to it before the 20th.