libpostal
Incorrect parsing of USA state abbreviation
Hi!
I was checking out libpostal, and saw something that could be improved.
My country is
USA
Here's how I'm using libpostal
Pelias
Here's what I did
9901 MEDICAL DRIVE, MD, USA
Here's what I got
[
{
"label": "house_number",
"value": "9901"
},
{
"label": "road",
"value": "medical drive md"
},
{
"label": "city",
"value": "usa"
}
]
Here's what I was expecting
road: medical drive
state: md
For parsing issues, please answer "yes" or "no" to all that apply.
- Does the input address exist in OpenStreetMap? No
- Do all the toponyms exist in OSM (city, state, region names, etc.)? Yes https://www.openstreetmap.org/relation/162112
- If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result? N/A
- If the address does not contain city, region, etc., does adding those fields to the input improve the result? 9901 Medical Center Dr, Montgomery County, MD, USA works correctly
- If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse? N/A
Here's what I think could be improved
In this case, 'MD' should be identified as the state.
There are two strong indicators:
- There is a comma following the street name and preceding the state name
- The street name MEDICAL DRIVE already contains a suffix, and (correct me if I'm wrong?) it's uncommon to find an additional token following the suffix in the USA other than a directional.
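To make the two indicators concrete, here is a rough sketch of how they could be combined as a post-processing step on the parser output. This is not how libpostal works internally (as far as I understand, the parser is a trained statistical model, not a rule set); the function name `split_trailing_state` and the two small word lists are made up for illustration and would need to be far more complete in practice.

```python
# Illustrative subsets only (assumption): real lists would cover all
# USPS street suffixes and all state/territory abbreviations.
STREET_SUFFIXES = {"drive", "dr", "street", "st", "avenue", "ave",
                   "terrace", "ter", "road", "rd"}
STATE_ABBREVIATIONS = {"md", "pa", "ny", "ca", "tx"}


def split_trailing_state(raw_input, parsed):
    """Move a trailing state abbreviation out of "road" into "state".

    `raw_input` is the original address string; `parsed` is a list of
    {"label": ..., "value": ...} dicts shaped like the output shown above.
    Returns a new list and leaves `parsed` unmodified.
    """
    # Comma-separated segments of the original input, lowercased so they
    # can be compared with the lowercased parser output.
    segments = [s.strip().lower() for s in raw_input.split(",")]

    result = []
    for component in parsed:
        tokens = component["value"].split()
        if (
            component["label"] == "road"
            and len(tokens) >= 3
            and tokens[-1] in STATE_ABBREVIATIONS  # last token looks like a state
            and tokens[-2] in STREET_SUFFIXES      # indicator 2: road already has a suffix
            and tokens[-1] in segments             # indicator 1: it stood alone between commas
        ):
            result.append({"label": "road", "value": " ".join(tokens[:-1])})
            result.append({"label": "state", "value": tokens[-1]})
        else:
            result.append(dict(component))
    return result


parsed = [
    {"label": "house_number", "value": "9901"},
    {"label": "road", "value": "medical drive md"},
    {"label": "city", "value": "usa"},
]
fixed = split_trailing_state("9901 MEDICAL DRIVE, MD, USA", parsed)
```

With the input from this report, `fixed` contains road "medical drive" followed by state "md". The trailing "usa" is still labeled as city; this sketch does not try to address that.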
Additional test case:
8000 Grey Friars Ter, PA, 18914, USA
road: "grey friars ter pa"
8000 Grey Friars Ter, 18914, PA, USA
road: "grey friars ter"