Autocomplete fails for Polish addresses depending on word order

Open stanczykj opened this issue 3 months ago • 5 comments

Bug description

I've encountered an issue with Polish addresses where autocomplete results depend on the order of words in the query. This does not seem to be an issue for addresses in the US.

Steps to reproduce

  • Search with query: "wiejska 1 warsaw" (street, city) → returns results
  • Search with query: "warsaw wiejska 1" (city, street) → does not return results
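
The two queries can be compared side by side with a short script. This is a minimal sketch: the base URL `http://localhost:4000` is an assumption (substitute the address of your own Pelias instance), and `build_autocomplete_url` and `count_features` are hypothetical helper names. Only the `/v1/autocomplete` endpoint and its `text` parameter come from Pelias itself.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Pelias API reachable at this address; replace with your own.
BASE_URL = "http://localhost:4000"


def build_autocomplete_url(text, base_url=BASE_URL):
    """Return the /v1/autocomplete URL for a free-text query."""
    query = urllib.parse.urlencode({"text": text})
    return f"{base_url}/v1/autocomplete?{query}"


def count_features(text, base_url=BASE_URL):
    """Send the query and return how many GeoJSON features came back."""
    with urllib.request.urlopen(build_autocomplete_url(text, base_url)) as resp:
        body = json.load(resp)
    return len(body.get("features", []))
```

Calling `count_features("wiejska 1 warsaw")` and `count_features("warsaw wiejska 1")` against a running instance should show the difference described in the steps above.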

Github references

No response

Environment

No response

Log output


Data references

No response

Project or organization

No response

stanczykj avatar Sep 09 '25 10:09 stanczykj

The "wiejska 1 warsaw" query seems to work only because the parser does not actually extract any fields:

{
    "text": "wiejska 1 warsaw",
    "parser": "pelias",
    "parsed_text": {
        "subject": "wiejska 1 warsaw"
    }
}

The "warsaw wiejska 1" query fails because the parser attempts to extract fields (incorrectly, I assume):

{
    "text": "warsaw wiejska 1",
    "parser": "pelias",
    "parsed_text": {
        "subject": "warsaw",
        "locality": "warsaw",
        "admin": "wiejska 1"
    }
}
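
A small helper makes the difference between the two payloads explicit. This is only an illustrative sketch: `extracted_fields` is a hypothetical name, and the dictionaries are copied from the two responses above.

```python
def extracted_fields(debug_query):
    """Return the parsed_text keys other than 'subject', sorted by name."""
    parsed = debug_query.get("parsed_text", {})
    return sorted(key for key in parsed if key != "subject")


# Payloads copied from the two autocomplete responses quoted in this comment.
street_first = {
    "text": "wiejska 1 warsaw",
    "parser": "pelias",
    "parsed_text": {"subject": "wiejska 1 warsaw"},
}
city_first = {
    "text": "warsaw wiejska 1",
    "parser": "pelias",
    "parsed_text": {"subject": "warsaw", "locality": "warsaw", "admin": "wiejska 1"},
}

print(extracted_fields(street_first))  # []
print(extracted_fields(city_first))    # ['admin', 'locality']
```

An empty list means the whole query was kept as a single subject; a non-empty list means the parser split the query into fields.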

Changing the language does not seem to have any effect.

As a workaround, is there a way to disable parsing and force the API to treat the entire query as a single subject?

stanczykj avatar Sep 09 '25 10:09 stanczykj

Afaik this is not possible without a deep dive into the code, but I'm not that familiar with the API code.

If street, city is the normal order for addresses in Poland, it may be worth opening an issue in https://github.com/pelias/parser to adapt the parser for Polish notation.

btw: /v1/search with the street, city order works as expected:

      "text": "wiejska 1 warsaw",
      "size": 10,
      "private": false,
      "lang": {
        "name": "German",
        "iso6391": "de",
        "iso6393": "deu",
        "via": "header",
        "defaulted": false
      },
      "querySize": 20,
      "parser": "libpostal",
      "parsed_text": {
        "street": "wiejska",
        "housenumber": "1",
        "city": "warsaw"
      }

arnesetzer avatar Sep 09 '25 11:09 arnesetzer

Technically the correct order for Polish addresses is indeed street + city, but for a feature like autocomplete, it would be nice if it accepted any order since it’s user input. As for the search - interesting. In your example, it seems a different parser was used and it worked correctly. Is the parser configurable for autocomplete?

stanczykj avatar Sep 09 '25 11:09 stanczykj

According to https://github.com/pelias/documentation/blob/435103d44051755ad56858e5f98fb5c669ac4b13/services.md#libpostal, libpostal is not suitable for autocomplete (e.g. incomplete queries), so the team developed their own parser.

arnesetzer avatar Sep 09 '25 11:09 arnesetzer

Hi, I've transferred this issue over to the pelias/parser repo as it seems to be specific to /v1/autocomplete parsing. Although it's worth noting that libpostal also doesn't do a great job of it on the /v1/search endpoint.

Generally speaking we expect tokens to be specified in decreasing granularity order (with the exception being that housenumber may come before or after the street name, as is common in Europe).
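
To illustrate what "decreasing granularity order" means for the two queries in this issue, here is a rough sketch. It is not taken from pelias/parser: the label names, the `GRANULARITY` ranks, and `follows_expected_order` are all hypothetical and exist only to show the ordering rule.

```python
# Hypothetical ranks, NOT from pelias/parser: a lower number is finer-grained.
# housenumber shares the street's rank so it may sit on either side of it.
GRANULARITY = {
    "housenumber": 1,
    "street": 1,
    "locality": 2,
    "region": 3,
    "country": 4,
}


def follows_expected_order(labels):
    """True if the labels never move from a coarser field to a finer one."""
    ranks = [GRANULARITY[label] for label in labels]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


# "wiejska 1 warsaw": street, housenumber, locality
print(follows_expected_order(["street", "housenumber", "locality"]))  # True
# "warsaw wiejska 1": locality, street, housenumber
print(follows_expected_order(["locality", "street", "housenumber"]))  # False
```

Under this simplified rule the street-first query matches the expected order, while the city-first query does not.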

While it would be ideal to support this, it introduces challenges with other types of queries where the order of tokens expresses preference. I'll leave it open in case someone wants to try and tackle it, although I suspect it will be difficult.

missinglink avatar Sep 09 '25 15:09 missinglink