
Change to make compound ampersand words easier to match

Open blackmad opened this issue 4 years ago • 6 comments

I think this is the correct change to make it so "A&P Deli" can be matched by any of these queries: "A&P", "A & P", and "A and P".

I alias "and" to "und" because otherwise it seems like this change wouldn't work for German venues.
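To make the intended behaviour concrete, here is a minimal Python sketch of the normalization idea. It is not the actual schema change (which lives in the index analysis settings); the names `AMPERSAND_ALIASES` and `normalize` are hypothetical, and the choice of "&" as the canonical form is an assumption made only for illustration.

```python
import re
from typing import List

# Hypothetical alias table: every spelling of "and" collapses to "&".
# Choosing "&" as the canonical token is an assumption for this sketch.
AMPERSAND_ALIASES = {"and": "&", "und": "&", "&": "&"}


def normalize(text: str) -> List[str]:
    """Lowercase, split compound ampersand words, and map aliases to '&'."""
    # Pad every ampersand with spaces so "A&P" tokenizes like "A & P".
    padded = re.sub(r"\s*&\s*", " & ", text.lower())
    return [AMPERSAND_ALIASES.get(token, token) for token in padded.split()]


# All three query spellings normalize to the same token sequence.
queries = ["A&P", "A & P", "A and P"]
normalized = [normalize(q) for q in queries]
```

With this kind of normalization, all three queries become the same tokens, which are also a prefix of the tokens produced for "A&P Deli".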

But then again I am still not great at schema changes. Working on building an index with this locally now. Unit tests and manual testing suggest this change works.

blackmad avatar Sep 03 '20 15:09 blackmad

oh right, integration tests.

blackmad avatar Sep 03 '20 15:09 blackmad

Kicked off a build for this, will update with some results.

Personally, my biggest potential concern is that there will be lots of new short token matches, which could impact either query precision or response time. But we'll only know for sure after the build :)

orangejulius avatar Sep 10 '20 15:09 orangejulius

Okay, so I reviewed the test results from running against this build the other day, and as far as I can tell the differences are just noise from comparing two builds run on slightly different days/data.

@blackmad do you have any good examples of pelias compare links that show cases where we aren't doing well with ampersands currently? It would be great to find a test case in open data.

I'm going to do a little examining to see how many records this would affect (I suspect very few), and unless that investigation suggests that there might realistically be a performance impact to adding more smaller tokens to the index, this should be good to go.
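A rough way to size this up, sketched in Python: count how many names contain an ampersand and how many short tokens splitting them would produce. The function name `ampersand_stats` and the two-character cutoff for a "short" token are assumptions for illustration, not anything measured against a real build.

```python
import re
from typing import Iterable, Tuple


def ampersand_stats(names: Iterable[str], max_short_len: int = 2) -> Tuple[int, int]:
    """Return (records containing '&', short tokens those records produce)."""
    affected = [name for name in names if "&" in name]
    short_tokens = sum(
        1
        for name in affected
        # Split on whitespace and ampersands, as the new analysis would.
        for token in re.split(r"[\s&]+", name)
        if 0 < len(token) <= max_short_len
    )
    return len(affected), short_tokens
```

Run over a dump of venue names, the first number estimates how many records the change touches and the second how many new short tokens would enter the index.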

orangejulius avatar Sep 24 '20 14:09 orangejulius

on dev, results in nyc: https://pelias.github.io/compare/#/v1/search?sources=oa%2Cosm&focus.point.lat=40.74&focus.point.lon=-74&text=h%26m

but when I switch it to "h and m" I lose all those results - this change should theoretically fix this? https://pelias.github.io/compare/#/v1/search?sources=oa%2Cosm&focus.point.lat=40.74&focus.point.lon=-74&text=h+and+m

blackmad avatar Sep 24 '20 15:09 blackmad

Okay, I've set up some simple, exploratory acceptance tests in https://github.com/pelias/acceptance-tests/pull/534 for this.

It looks like this PR causes a mix of improvements and regressions (baseline on left, this branch on the right):

[Screenshot: Screenshot_2020-09-24_17-08-41]

Improvements

  • H & M in /v1/search
  • H & M in /v1/autocomplete

Regressions

  • H&M in /v1/autocomplete

orangejulius avatar Sep 24 '20 21:09 orangejulius

follow-up: try adding regex pattern for ampersand at end to fix query autocomplete normalization
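One possible reading of that follow-up, sketched in Python: during autocomplete a partially typed query such as "h&" ends in an ampersand, and a dedicated pattern could handle that case before the rest of the normalization runs. What the pattern should actually do is not decided in this thread; dropping the trailing ampersand, as below, is purely an assumption, and `strip_trailing_ampersand` is a hypothetical name.

```python
import re

# Matches an ampersand at the very end of the input, with optional
# surrounding whitespace. Dropping it is an assumption for this sketch.
TRAILING_AMPERSAND = re.compile(r"\s*&\s*$")


def strip_trailing_ampersand(query: str) -> str:
    """Remove an ampersand that ends a partially typed query."""
    return TRAILING_AMPERSAND.sub("", query)
```

A query with the ampersand in the middle, such as "h&m", passes through unchanged; only the trailing case is affected.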

blackmad avatar Sep 29 '20 14:09 blackmad