Autocomplete fails for Polish addresses depending on word order
Bug description
I've encountered an issue with Polish addresses where autocomplete results depend on the order of words in the query. This does not seem to be an issue for addresses in the US.
Steps to reproduce
- Search with query: "wiejska 1 warsaw" (street, city) → returns results
- Search with query: "warsaw wiejska 1" (city, street) → does not return results
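For anyone reproducing this, the two requests can be built as below. This is a minimal sketch: the base URL is an assumption (a locally running Pelias API) and should be replaced with your own deployment; only the `/v1/autocomplete` endpoint and the `text` parameter come from the report.

```python
from urllib.parse import urlencode

# Assumed base URL of a locally running Pelias API; adjust to your deployment.
BASE_URL = "http://localhost:4000"


def autocomplete_url(text: str, base_url: str = BASE_URL) -> str:
    """Build a /v1/autocomplete request URL for the given query text."""
    return f"{base_url}/v1/autocomplete?{urlencode({'text': text})}"


# The two word orders from the report: street-first, then city-first.
QUERIES = ["wiejska 1 warsaw", "warsaw wiejska 1"]
URLS = [autocomplete_url(q) for q in QUERIES]
```

Fetching each URL in `URLS` and comparing the number of returned features should show the difference between the two orders.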
The "wiejska 1 warsaw" query seems to work only because the parser does not actually extract any fields:
{
"text": "wiejska 1 warsaw",
"parser": "pelias",
"parsed_text": {
"subject": "wiejska 1 warsaw"
}
}
The "warsaw wiejska 1" query fails because the parser attempts to extract fields (incorrectly, I assume):
{
"text": "warsaw wiejska 1",
"parser": "pelias",
"parsed_text": {
"subject": "warsaw",
"locality": "warsaw",
"admin": "wiejska 1"
}
}
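The difference between the two outputs above can be checked programmatically. The helper below is a hypothetical sketch (the name `parser_split_query` is not part of Pelias); it only assumes a dict with the `text` and `parsed_text` keys shown in the snippets.

```python
def parser_split_query(query: dict) -> bool:
    """Return True if the parser extracted fields instead of keeping
    the whole input as a single subject."""
    parsed = query.get("parsed_text", {})
    # Any key other than "subject" means the parser labelled part of the input.
    extra_fields = set(parsed) - {"subject"}
    return bool(extra_fields) or parsed.get("subject") != query.get("text")


# The two query objects quoted above.
street_first = {
    "text": "wiejska 1 warsaw",
    "parser": "pelias",
    "parsed_text": {"subject": "wiejska 1 warsaw"},
}
city_first = {
    "text": "warsaw wiejska 1",
    "parser": "pelias",
    "parsed_text": {"subject": "warsaw", "locality": "warsaw", "admin": "wiejska 1"},
}
```

Here `parser_split_query(street_first)` is `False` and `parser_split_query(city_first)` is `True`, which matches which query returns results.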
Changing the language does not seem to have any effect.
As a workaround, is there a way to disable parsing and force the API to treat the entire query as a single subject?
AFAIK this is not possible without a deep dive into the code, but I'm not that familiar with the API code.
If street, city is the normal order for addresses in Poland, it may be worth opening an issue in https://github.com/pelias/parser to adapt the parser to Polish notation.
By the way, /v1/search with the street, city order works as expected:
"text": "wiejska 1 warsaw",
"size": 10,
"private": false,
"lang": {
"name": "German",
"iso6391": "de",
"iso6393": "deu",
"via": "header",
"defaulted": false
},
"querySize": 20,
"parser": "libpostal",
"parsed_text": {
"street": "wiejska",
"housenumber": "1",
"city": "warsaw"
}
Technically the correct order for Polish addresses is indeed street + city, but for a feature like autocomplete, it would be nice if it accepted any order since it’s user input. As for the search - interesting. In your example, it seems a different parser was used and it worked correctly. Is the parser configurable for autocomplete?
According to https://github.com/pelias/documentation/blob/435103d44051755ad56858e5f98fb5c669ac4b13/services.md#libpostal, libpostal is not suitable for autocomplete (e.g. incomplete queries), so the team developed their own parser.
Hi, I've transferred this issue over to the pelias/parser repo as it seems to be specific to /v1/autocomplete parsing, although it's worth noting that libpostal also doesn't do a great job of it on the /v1/search endpoint.
Generally speaking we expect tokens to be specified in decreasing granularity order (with the exception being that housenumber may come before or after the street name, as is common in Europe).
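To make the ordering rule concrete, here is an illustrative sketch. It is not the parser's actual implementation; the label names and the rank table are assumptions chosen for the example.

```python
# Hypothetical coarseness ranks: lower means more granular.
# housenumber and street share a rank because either may come first.
RANK = {"housenumber": 0, "street": 0, "locality": 1, "region": 2, "country": 3}


def is_decreasing_granularity(labels: list) -> bool:
    """Return True if labels run from most granular to least granular."""
    ranks = [RANK[label] for label in labels]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

Under this rule `["street", "housenumber", "locality"]` ("wiejska 1 warsaw") is in the expected order, while `["locality", "street", "housenumber"]` ("warsaw wiejska 1") is not.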
While it would be ideal to support this, it introduces challenges with other types of queries where the order of tokens expresses preference. I'll leave it open in case someone wants to try and tackle it, although I suspect it will be difficult.
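Until the parser handles this, one possible client-side workaround is to retry with rotated token orders when a query returns nothing. The sketch below is hypothetical and not part of Pelias: `fetch` stands in for whatever function calls /v1/autocomplete and returns a list of results. The original order is always tried first, so queries where token order expresses preference are unaffected unless they return no results.

```python
from typing import Callable, Iterator, List


def rotations(text: str) -> Iterator[str]:
    """Yield the query with its tokens rotated, original order first."""
    tokens = text.split()
    for i in range(len(tokens)):
        yield " ".join(tokens[i:] + tokens[:i])


def autocomplete_with_fallback(text: str, fetch: Callable[[str], List]) -> List:
    """Return results for the first token rotation that yields any."""
    for candidate in rotations(text):
        results = fetch(candidate)
        if results:
            return results
    return []
```

For "warsaw wiejska 1" the second attempt is "wiejska 1 warsaw", the order reported to work. The cost is up to one extra request per token, so this may be too slow for per-keystroke autocomplete without debouncing.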