Append unclassified tokens to the street
I created a solver that can fill the blanks (only for StreetPrefixClassification). We have some very long street names, and it is not simple to match all of them safely. I think the best way to handle this is to append unclassified tokens to the street (when the token comes right after the end of the street). Maybe it can also be used for venues.
Paris is always used as a locality, so I removed it from regions. I also added cité to street_types.
One thing I worry about with this is how it will affect Pelias queries generated from autocomplete input....
I'm not sure about rewriting the span body, is this really required?
The combination of these tokens should already be present in the 'phrases' for that section.
It should be possible to find the phrase you are looking for and then classify it directly to avoid editing any of the existing spans.
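To illustrate the suggestion above, here is a minimal sketch of looking up an existing phrase by its offsets instead of rewriting a span body. The `PhraseSpan` shape and the `buildPhrases` and `findPhrase` helpers are hypothetical stand-ins written for this example; they are not the parser's actual API.

```typescript
// Hypothetical, simplified span: character offsets into the input text.
interface PhraseSpan {
  body: string;
  start: number;
  end: number;
}

// Build every contiguous token combination, a rough stand-in for the
// 'phrases' that are assumed to already exist for a section.
function buildPhrases(text: string): PhraseSpan[] {
  const tokens: PhraseSpan[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    tokens.push({ body: m[0], start: m.index, end: m.index + m[0].length });
  }
  const phrases: PhraseSpan[] = [];
  for (let i = 0; i < tokens.length; i++) {
    for (let j = i; j < tokens.length; j++) {
      const start = tokens[i].start;
      const end = tokens[j].end;
      phrases.push({ body: text.slice(start, end), start: start, end: end });
    }
  }
  return phrases;
}

// Return the existing phrase with exactly these offsets, if there is one.
function findPhrase(
  phrases: PhraseSpan[],
  start: number,
  end: number
): PhraseSpan | undefined {
  return phrases.filter((p) => p.start === start && p.end === end)[0];
}

const sectionPhrases = buildPhrases("Rue du 8 Mai Ermont");
// Offsets 0..12 cover "Rue du 8 Mai"; this span can be classified as-is.
const streetPhrase = findPhrase(sectionPhrases, 0, 12);
```

Because the phrase already exists, nothing has to be edited: the solver only needs the start offset of the street and the end offset of the last token it wants to include.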
> One thing I worry about with this is how it will affect Pelias queries generated from autocomplete input....
Hmm, you're totally right, the last token shouldn't be appended.
Rue Saint-Germains Ermon (the real locality is Ermont) should not return Boulevard Saint-Germains Ermon as the street... It's safer when we already have something like Rue du 8 Mai Ermont (Mai isn't in the solution).
> I'm not sure about rewriting the span body, is this really required?
> The combination of these tokens should already be present in the 'phrases' for that section.
> It should be possible to find the phrase you are looking for and then classify it directly to avoid editing any of the existing spans.
I wanted to have your opinion on this PR. There is also something that bothers me in what I did... I will try what you said. :smile:
I've updated this PR.
- I update the solution with an existing span instead of rewriting the span body
- I don't fill the solution with an end-token span (the last token of the input is never appended)
- This works only with street prefix classification
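
The three rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Tok`, `StreetRange`, `extendStreet`, and the `classified` flags are hypothetical names and data made up for the example, not the code in this PR.

```typescript
// Hypothetical token: `classified` is true when some classifier matched it.
interface Tok {
  body: string;
  classified: boolean;
}

// Hypothetical street solution: inclusive token indexes, plus whether the
// street starts with a StreetPrefixClassification token.
interface StreetRange {
  first: number;
  last: number;
  hasStreetPrefix: boolean;
}

function extendStreet(tokens: Tok[], street: StreetRange): StreetRange {
  // Rule 3: only streets found through a street prefix are extended.
  if (!street.hasStreetPrefix) {
    return street;
  }
  let last = street.last;
  // Rule 2: absorb the unclassified tokens that follow the street, but
  // never the final token of the input (it may be a partial locality).
  while (last + 1 < tokens.length - 1 && !tokens[last + 1].classified) {
    last++;
  }
  // Rule 1: the caller would pick the existing span covering first..last.
  return { first: street.first, last: last, hasStreetPrefix: true };
}

// Helper for the examples: split on spaces and flag the classified words.
function toToks(text: string, classifiedWords: string[]): Tok[] {
  return text.split(" ").map((body) => ({
    body: body,
    classified: classifiedWords.indexOf(body) !== -1,
  }));
}

function streetText(tokens: Tok[], street: StreetRange): string {
  return tokens
    .slice(street.first, street.last + 1)
    .map((t) => t.body)
    .join(" ");
}

// "Mai" is not in the solution yet, "Ermont" is the end token.
const maiToks = toToks("Rue du 8 Mai Ermont", ["Rue"]);
const mai = extendStreet(maiToks, { first: 0, last: 2, hasStreetPrefix: true });

// The only remaining token is the end token, so nothing is appended.
const ermonToks = toToks("Rue Saint-Germains Ermon", ["Rue"]);
const ermon = extendStreet(ermonToks, { first: 0, last: 1, hasStreetPrefix: true });
```

With these inputs, `streetText(maiToks, mai)` gives `Rue du 8 Mai`, while `streetText(ermonToks, ermon)` stays `Rue Saint-Germains`, so a partially typed locality in autocomplete input is not swallowed by the street.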