
Road not detected when street-type terms are abbreviated (Russia)

Open • Aknilam opened this issue 4 years ago • 1 comment

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is

RU


Here's how I'm using libpostal

REST API


Here's what I did - case 1

query=б-р Победы, д. 10


Here's what I got - case 1

[
  {
    "label": "house",
    "value": "б-р"
  },
  {
    "label": "road",
    "value": "победы"
  },
  {
    "label": "house_number",
    "value": "д. 10"
  }
]

Here's what I was expecting - case 1

[
  {
    "label": "road",
    "value": "б-р победы"
  },
  {
    "label": "house_number",
    "value": "д. 10"
  }
]

Here's what I did - case 2

query=б-р Солнечный 2


Here's what I got - case 2

[
  {
    "label": "house",
    "value": "б-р солнечный"
  },
  {
    "label": "house_number",
    "value": "2"
  }
]

Here's what I was expecting - case 2

[
  {
    "label": "road",
    "value": "б-р солнечный"
  },
  {
    "label": "house_number",
    "value": "2"
  }
]

Here's what I did - case 3

query=Савелкинский пр-д, д. 4


Here's what I got - case 3

[
  {
    "label": "house",
    "value": "савелкинский"
  },
  {
    "label": "road",
    "value": "пр-д д."
  },
  {
    "label": "house_number",
    "value": "4"
  }
]

Here's what I was expecting - case 3

[
  {
    "label": "road",
    "value": "савелкинский пр-д"
  },
  {
    "label": "house_number",
    "value": "д. 4"
  }
]
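One way to make the mismatches in these cases explicit is to flatten each label/value list into a dictionary and compare label by label. The following is a minimal Python sketch using the case 3 data from this report; `to_dict` and `diff_parses` are hypothetical helpers, not part of libpostal, and they assume each label appears at most once in a parse.

```python
# Hypothetical helpers (not libpostal API): compare an actual parse with the
# expected one. The list-of-dicts shape mirrors the label/value JSON shown
# in this report. Assumes each label appears at most once per parse.

def to_dict(parse):
    """Flatten [{"label": ..., "value": ...}, ...] into {label: value}."""
    return {item["label"]: item["value"] for item in parse}


def diff_parses(got, expected):
    """Return {label: (got_value, expected_value)} for every mismatching label."""
    g, e = to_dict(got), to_dict(expected)
    return {
        label: (g.get(label), e.get(label))
        for label in sorted(set(g) | set(e))
        if g.get(label) != e.get(label)
    }


# Case 3 data, copied from the report above.
got = [
    {"label": "house", "value": "савелкинский"},
    {"label": "road", "value": "пр-д д."},
    {"label": "house_number", "value": "4"},
]
expected = [
    {"label": "road", "value": "савелкинский пр-д"},
    {"label": "house_number", "value": "д. 4"},
]

for label, (actual, wanted) in diff_parses(got, expected).items():
    print(f"{label}: got {actual!r}, expected {wanted!r}")
```

A `None` on either side means the label is missing from that parse; here every label differs, which shows the abbreviated street type pulling the whole parse out of shape.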

For parsing issues, please answer "yes" or "no" to all that apply.

Yes, but I don't have the city context.

  • Do all the toponyms exist in OSM (city, state, region names, etc.)?

yes

  • If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result?

Yes. Expanding the abbreviations

  • б-р to бульвар (boulevard)
  • пр-д to проезд (passage)

helped and gave the correct results in the expected form (but only with the expanded values).

  • If the address does not contain city, region, etc., does adding those fields to the input improve the result?

no

  • If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse?

no, it doesn't contain such info
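Until the dictionaries cover these forms, the expansion workaround described above can be applied on the client side before the query is sent to the parser. This is a minimal Python sketch, assuming only the two abbreviations from this report need handling; `ABBREVIATIONS` and `expand_street_types` are hypothetical names, not part of libpostal.

```python
import re

# Hypothetical client-side workaround (not libpostal API): expand the
# abbreviated street types from this report before parsing.
ABBREVIATIONS = {
    "б-р": "бульвар",  # bulvar (boulevard)
    "пр-д": "проезд",  # proezd (passage)
}

# Match an abbreviation only as a whole token: it must not be preceded or
# followed by a letter, digit, underscore or hyphen.
_PATTERN = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(a) for a in ABBREVIATIONS)
    + r")(?![\w-])",
    re.IGNORECASE,
)


def expand_street_types(query: str) -> str:
    """Replace abbreviated street types with their full (lowercase) forms."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], query)


print(expand_street_types("б-р Победы, д. 10"))
print(expand_street_types("Савелкинский пр-д, д. 4"))
```

The replacement is always lowercase; since the parser output in this report is lowercased anyway, that should not matter for parsing.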


Here's what I think could be improved

Add б-р as an alias/synonym of бульвар (boulevard), and пр-д as an alias/synonym of проезд (passage).

Aknilam, May 12 '20 11:05

This could probably be solved by changing lines 3-4 of https://github.com/openvenues/libpostal/blob/master/resources/dictionaries/ru/street_types.txt#L3-L4 to:

бульвар|бул|б-р
bulvar|bul|b-r

Same for "пр-д".

xTRiM, Jan 21 '21 15:01
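To illustrate what the suggested dictionary change would do: in these pipe-delimited files, each line lists variants of one term. The Python sketch below assumes the first entry on a line is the canonical form and maps every variant to it. The `бульвар|бул|б-р` and `bulvar|bul|b-r` lines are the ones proposed in the comment; the `проезд|пр-д` line is illustrative only and is not copied from the real street_types.txt. `build_alias_map` is a hypothetical helper, not libpostal's loader.

```python
# Hypothetical sketch (not libpostal's loader): map every variant on a
# pipe-delimited dictionary line to the first entry, assumed canonical.
DICTIONARY_LINES = """\
бульвар|бул|б-р
bulvar|bul|b-r
проезд|пр-д
"""


def build_alias_map(text):
    """Return {variant: canonical} for each non-blank line of text."""
    alias_map = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        forms = line.split("|")
        canonical = forms[0]
        for form in forms:
            alias_map[form] = canonical
    return alias_map


aliases = build_alias_map(DICTIONARY_LINES)
print(aliases["б-р"])   # бульвар
print(aliases["пр-д"])  # проезд
```

With б-р and пр-д present as variants, a lookup on the abbreviated token resolves to the same street type as the full word, which is the behaviour the issue asks for.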