
Append unclassified tokens to the street

Open Joxit opened this issue 6 years ago • 4 comments

I created a solver that can fill the blanks (only for StreetPrefixClassification). We have some very long street names, and it is not simple to match all of them safely. I think the best way to do this is to append unclassified tokens to the street (when the token is at the end of the street). Maybe it can also be used for venues.
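To make the idea concrete, here is a minimal sketch in Python. It is not the parser's actual code: the `Token` type, the `"street"` and `"locality"` labels, the `STREET_PREFIXES` sample set, and `extend_street` are all hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sample of street prefixes (assumption, not the real dictionary).
STREET_PREFIXES = {"rue", "boulevard", "cité"}


@dataclass
class Token:
    text: str
    label: Optional[str] = None  # None means the token is unclassified


def extend_street(tokens: List[Token]) -> List[str]:
    """Return the street words, extended with the unclassified tokens
    that directly follow the end of the street."""
    street_idx = [i for i, t in enumerate(tokens) if t.label == "street"]
    words = [tokens[i].text for i in street_idx]
    # Only streets that start with a known street prefix are extended.
    if not street_idx or tokens[street_idx[0]].text.lower() not in STREET_PREFIXES:
        return words
    for token in tokens[street_idx[-1] + 1:]:
        if token.label is not None:
            break  # the first classified token ends the street
        words.append(token.text)
    return words


tokens = [
    Token("Rue", "street"),
    Token("du", "street"),
    Token("8", "street"),
    Token("Mai"),  # unclassified, directly after the street
    Token("Ermont", "locality"),
]
print(" ".join(extend_street(tokens)))
```

In this sketch the unclassified token `Mai` is appended because it sits between the end of the street and a classified token.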

Paris is always used as a locality, so I removed it from regions. I also added cité to street_types.

Joxit May 25 '19 05:05

One thing I worry about with this is how it will affect Pelias queries generated from autocomplete input....

missinglink Jun 05 '19 16:06

I'm not sure about rewriting the span body, is this really required?

The combination of these tokens should already be present in the 'phrases' for that section.

It should be possible to find the phrase you are looking for and then classify it directly to avoid editing any of the existing spans.
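A rough sketch of that suggestion, in Python for illustration only. The `Span` class, `build_phrases`, and `find_phrase` are hypothetical stand-ins, not the parser's API; the assumption is that the phrases of a section cover every contiguous run of its tokens.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Span:
    body: str
    start: int  # index of the first word covered
    end: int    # index one past the last word covered
    classifications: Set[str] = field(default_factory=set)


def build_phrases(words: List[str]) -> List[Span]:
    """Build one span for every contiguous run of words."""
    return [
        Span(" ".join(words[i:j]), i, j)
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    ]


def find_phrase(phrases: List[Span], start: int, end: int) -> Optional[Span]:
    """Look up the existing phrase that covers exactly words[start:end]."""
    for phrase in phrases:
        if phrase.start == start and phrase.end == end:
            return phrase
    return None


words = ["Rue", "du", "8", "Mai", "Ermont"]
phrases = build_phrases(words)

# Classify the phrase that already exists instead of editing a span body.
street = find_phrase(phrases, 0, 4)
assert street is not None
street.classifications.add("street")
print(street.body)
```

No span body is rewritten here: the longer street is simply a different, already existing phrase that receives the classification.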

missinglink Jun 05 '19 21:06

> One thing I worry about with this is how it will affect Pelias queries generated from autocomplete input....

Hmm, you're totally right, the last token shouldn't be appended. Rue Saint-Germains Ermon (the real locality is Ermont) should not return Boulevard Saint-Germains Ermon as a street... It's safer if we already have something like Rue du 8 Mai Ermont (Mai isn't in the solution).

> I'm not sure about rewriting the span body, is this really required?
>
> The combination of these tokens should already be present in the 'phrases' for that section.
>
> It should be possible to find the phrase you are looking for and then classify it directly to avoid editing any of the existing spans.

I wanted to have your opinion on this PR. There is also something that bothers me in what I did... I will try what you said. :smile:

Joxit Jun 05 '19 21:06

I've updated this PR.

  • I update the solution with an existing span
  • I don't fill the solution with an end-token span
  • This works only with street prefix classification
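The three rules above could look roughly like this. It is only a sketch under stated assumptions, not the code in this PR: `extended_street_range`, the label strings, and the `STREET_PREFIXES` sample set are hypothetical, and the end-token rule is read here as "skip the extension entirely when it would reach the last token of the input".

```python
from typing import List, Optional, Tuple

# Hypothetical sample of street prefixes (assumption, not the real dictionary).
STREET_PREFIXES = {"rue", "boulevard", "cité"}


def extended_street_range(
    words: List[str],
    labels: List[Optional[str]],  # one label per word, None = unclassified
    street: Tuple[int, int],      # (start, end) of the current street, end exclusive
) -> Optional[Tuple[int, int]]:
    """Return the range of the existing span that should replace the
    street in the solution, or None when the street must stay as it is."""
    start, end = street
    # Rule 3: only streets found through a street prefix are extended.
    if words[start].lower() not in STREET_PREFIXES:
        return None
    new_end = end
    while new_end < len(words) and labels[new_end] is None:
        new_end += 1
    # Rule 2: an extension that reaches the end of the input is rejected,
    # since the last token may be a half-typed word from autocomplete.
    if new_end == len(words):
        return None
    if new_end == end:
        return None  # nothing unclassified follows the street
    # Rule 1: the caller looks up the existing span for this range.
    return (start, new_end)


safe = extended_street_range(
    ["Rue", "du", "8", "Mai", "Ermont"],
    ["street", "street", "street", None, "locality"],
    (0, 3),
)
risky = extended_street_range(
    ["Rue", "Saint-Germains", "Ermon"],
    ["street", "street", None],
    (0, 2),
)
print(safe, risky)
```

With these inputs, `Rue du 8 Mai Ermont` extends the street over `Mai`, while `Rue Saint-Germains Ermon` is left untouched because `Ermon` is the final token.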

Joxit Jul 15 '19 09:07