libpostal

Incorrect parsing of USA state abbreviation

Open · missinglink opened this issue 6 years ago · 1 comment

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is

USA


Here's how I'm using libpostal

Pelias


Here's what I did

9901 MEDICAL DRIVE, MD, USA


Here's what I got

[
  {
    "label": "house_number",
    "value": "9901"
  },
  {
    "label": "road",
    "value": "medical drive md"
  },
  {
    "label": "city",
    "value": "usa"
  }
]

Here's what I was expecting

road: medical drive
state: md

For parsing issues, please answer "yes" or "no" to all that apply.

  • Does the input address exist in OpenStreetMap? No
  • Do all the toponyms exist in OSM (city, state, region names, etc.)? Yes: https://www.openstreetmap.org/relation/162112
  • If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result? N/A
  • If the address does not contain city, region, etc., does adding those fields to the input improve the result? Yes: 9901 Medical Center Dr, Montgomery County, MD, USA parses correctly.
  • If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse? N/A

Here's what I think could be improved

In this case, 'MD' should be identified as the state.

There are two strong indicators:

  1. A comma follows the street name and precedes the state abbreviation.
  2. The street name MEDICAL DRIVE already ends in a suffix (DRIVE), and (correct me if I'm wrong) in the USA it's uncommon for any token other than a directional to follow the suffix.
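To show how cheap these two indicators are to check, here is a rough post-processing sketch. It is hypothetical and not part of libpostal or its API: `state_after_street` and the word lists are illustrative assumptions, and the suffix and directional lists are deliberately tiny rather than exhaustive.

```python
from typing import Optional

# Hypothetical post-parse heuristic; NOT a libpostal API.
# Word lists are small and illustrative, not exhaustive.
US_STATE_CODES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY DC".split()
)
STREET_SUFFIXES = {"AVE", "AVENUE", "BLVD", "CT", "DR", "DRIVE", "LN", "PL",
                   "RD", "ROAD", "ST", "STREET", "TER", "TERRACE", "WAY"}
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}


def state_after_street(address: str) -> Optional[str]:
    """Return a state code when a comma-separated segment is a bare US state
    code and the segment just before it ends in a street suffix (optionally
    followed by one directional, e.g. "Main St NW")."""
    segments = [s.strip() for s in address.split(",")]
    for prev, seg in zip(segments, segments[1:]):
        code = seg.upper()
        if code not in US_STATE_CODES:
            continue
        tokens = prev.upper().split()
        # Indicator 2: tolerate a single directional after the suffix.
        if tokens and tokens[-1] in DIRECTIONALS:
            tokens = tokens[:-1]
        if tokens and tokens[-1] in STREET_SUFFIXES:
            # Indicator 1 holds by construction: a comma separates the two.
            return code
    return None


print(state_after_street("9901 MEDICAL DRIVE, MD, USA"))           # MD
print(state_after_street("8000 Grey Friars Ter, PA, 18914, USA"))  # PA
```

A proper fix would presumably belong in the parser's model rather than in a post-processing pass like this; the sketch only illustrates that both signals are present in the raw input.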

missinglink · Sep 11 '19 10:09

Additional test case:

8000 Grey Friars Ter, PA, 18914, USA
→ road: "grey friars ter pa"

8000 Grey Friars Ter, 18914, PA, USA
→ road: "grey friars ter"
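These cases could be turned into a simple regression check that flags any parse whose `road` value ends in a US state code while no separate `state` label was produced. This is a minimal sketch, assuming the parser output has already been converted into a `label -> value` dict. `road_swallowed_state` and `REPORTED` are hypothetical names, and only the labels actually quoted in this issue are included.

```python
from typing import Dict

# Hypothetical regression check on parser output; NOT a libpostal API.
US_STATE_CODES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY DC".split()
)

# Parses reported in this issue (label -> value); only quoted labels are included.
REPORTED = {
    "9901 MEDICAL DRIVE, MD, USA": {
        "house_number": "9901", "road": "medical drive md", "city": "usa"},
    "8000 Grey Friars Ter, PA, 18914, USA": {"road": "grey friars ter pa"},
    "8000 Grey Friars Ter, 18914, PA, USA": {"road": "grey friars ter"},
}


def road_swallowed_state(parse: Dict[str, str]) -> bool:
    """Flag parses where 'road' ends in a US state code and no 'state' label exists."""
    tokens = parse.get("road", "").split()
    return bool(tokens) and "state" not in parse and tokens[-1].upper() in US_STATE_CODES


for address, parse in REPORTED.items():
    print(f"{address!r}: {'FLAG' if road_swallowed_state(parse) else 'ok'}")
```

Run against the reported parses, this flags the first two inputs and passes the third, which matches the observation that moving the postcode between the street and the state avoids the problem.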

missinglink · Jun 16 '20 13:06