use libpostal parses for venue queries where available
We're currently not using libpostal parses for venues: if we see a venue parse, we fall back to the native parser.
I don't remember the history of this, but it seems wrong to me 🤷‍♂️
I noticed this when looking into some bug reports, one example being "Café Pelias". There are two things currently going wrong with this query:
- libpostal parses it correctly, but the query fails with the message "No query to call ES with. Skipping"
- upon falling back to the native parser, it is parsed incorrectly (I'll open a separate issue for this on that repo)
So regarding the first point, I don't see why we would throw away the venue parse here from libpostal:
"parsed_text": {
"query": "Café Pelias"
}
The query label has actually been mapped from the libpostal house field in controller/libpostal, but this field indicates a venue name.
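To make the mapping concrete, here is a minimal sketch of how a libpostal `house` label could end up as `query` in `parsed_text`. This is not the actual code in controller/libpostal: the `{ label, value }` input shape, the `FIELD_MAP` table (apart from `house` mapping to `query`), and the helper names `toParsedText` and `isVenueOnlyParse` are all assumptions made for illustration.

```javascript
// Hypothetical sketch, NOT the real controller/libpostal implementation.
// Only the house -> query mapping comes from this PR; the other entries
// are assumed for illustration.
const FIELD_MAP = {
  house: 'query',          // libpostal uses "house" for venue names
  house_number: 'number',  // assumed
  road: 'street',          // assumed
  postcode: 'postalcode',  // assumed
  city: 'city'             // assumed
};

// Assumes libpostal output is a list of { label, value } pairs.
function toParsedText(components) {
  const parsed = {};
  for (const { label, value } of components) {
    const field = FIELD_MAP[label];
    if (field) {
      parsed[field] = value;
    }
  }
  return parsed;
}

// A "venue parse" here means the parse contains nothing but a query field.
function isVenueOnlyParse(parsedText) {
  const keys = Object.keys(parsedText);
  return keys.length === 1 && keys[0] === 'query';
}

const parsed = toParsedText([{ label: 'house', value: 'Café Pelias' }]);
console.log(parsed, isVenueOnlyParse(parsed));
```

Under these assumptions, a check like `isVenueOnlyParse` is the point where the parse is either thrown away in favour of the native parser or kept and used for query generation.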
As you can see from the PR edits, we don't currently consider these venue parses for query generation, and I'm not sure why. I believe libpostal is still superior to the native parser when it comes to venue queries, and that it has always been better than addressit was.
Thoughts?
The only reason I can think of for the existing behaviour is that libpostal sometimes erroneously identifies things as house, and we were trying to guard against that?
I rebased this and put it up on dev today. It fixes the "vanity addresses" issue we've been discussing:
cc/ @blackmad
linked https://github.com/pelias/acceptance-tests/pull/533
I ran the full acceptance test suite on this today and there were actually quite a few improvements, but at the same time it highlighted some issues.
diff of changes vs. production: https://www.diffchecker.com/5Faotyih (ignore any errors related to /v1/reverse)
screenshots of some issues inherited from libpostal:
Yeah, I suspect there are two reasons why this was never implemented in the past:
- A lot of our early libpostal work was done with little concern for venues; we were really thinking mostly about addresses.
- There are surely many cases where libpostal doesn't do a great job accurately detecting venues. Either false positives or false negatives would impact results in ways that are difficult to fix.
The first reason is obviously not a good one. I imagine the hard part of actually merging this will be ensuring there aren't too many cases where a query that is very much not a venue query, such as one for an admin area or an address, is made worse.
Right, so the question is "which parser does a better job of venues?" and the answer is "no" 😆