
Parameterised fuzzy autocomplete

Open andreibalaban opened this issue 3 years ago • 8 comments

:wave: I did some awesome work for the Pelias project and would love for everyone to have a look at it and provide feedback.

Here's the reason for this change :rocket:

Working with pelias-api, I discovered that a significant source of "no results" for the autocomplete endpoint is typos. A "fuzzy" autocomplete would address this; it is supported by current Elasticsearch queries, and I also saw that this idea was previously explored in this PR (https://github.com/pelias/api/pull/1268), which was my initial source of inspiration. The changes presented here support fuzziness on the current form of the autocomplete query, which includes multi-language support and a slightly different body than at the time of the cited PR. Also, fuzziness and max_expansions are exposed as parameters at request time; when they are not given, the autocomplete queries are the same as before.
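For illustration, a request using the new parameters might look like this (the parameter names fuzziness and max_expansions are the ones added by this change; the text value is just an example with a typo):

```
/v1/autocomplete?text=glasgow%20stret&fuzziness=1&max_expansions=10
```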

Here's what actually got changed :clap:

  • the query parts for complete and incomplete non-admin tokens are changed from a "phrase"-type "multi_match" to "best_fields" with configurable "fuzziness" and "max_expansions" (see the sketch after this list)
  • an extra boost is given to "exact matches" so that the most relevant results are promoted
  • "fuzziness" and "max_expansions" are given at request time; a fixed "prefix_length" of 1 is also used

Here's how others can test the changes :eyes:

  • separate unit tests (test/unit/query/autocomplete_fuzzy.js) and fixtures are written for the fuzzy form of the queries

andreibalaban avatar Feb 10 '21 11:02 andreibalaban

Hi @andreibalaban, This is really awesome, thanks for all the work you put in to get this going! I believe this is not the first nice contribution from the team at Bolt, so it's great to hear you're continuing to use Pelias.

I tested this out real quick and it does work pretty well overall!

However, as you might have discovered, adding fuzziness and max_expansions parameters to all the Elasticsearch queries, and giving them sensible values is the EASY part of fuzzy searching for a geocoder. There's plenty more work to do after that! Here at Geocode Earth we've explored it quite a bit already, and we'd love to do more of that work together with you, but I just wanted to be clear that this isn't one of those PRs that can get a quick test and be merged right away. You probably knew that 😄

I'm not going to go into a ton of detail here, because we honestly don't know all the answers ourselves, but here are some considerations for merging fuzzy matching:

API Parameters

We're generally pretty hesitant to add new parameters to the Pelias API. Once they're in, they can't be taken out, and parameters that can negatively affect performance require special scrutiny. Lots of Pelias users run it with minimal to no intermediate layer between them and end users, so to some degree the Pelias API itself needs to be "user-proof". Some things we might want to do here:

  • Allow configuring max/min and default values for fuzziness in pelias.json. This would give people running Pelias a lot of control, and we imagine a lot of people would want fuzziness enabled by default.
  • Ideally, we wouldn't expose every internal parameter in the API. I see you've already chosen to pick a default value for prefix_length, which is good. We might want to do the same for max_expansions. pelias.json is a good middle ground here too; a hypothetical sketch follows this list.
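For example, the configuration idea might look something like this in pelias.json (a purely hypothetical sketch; none of these keys exist today and the names are made up):

```json
{
  "api": {
    "autocomplete": {
      "fuzziness": { "default": 1, "min": 0, "max": 2 },
      "maxExpansions": 10
    }
  }
}
```

Here the server would clamp any request-supplied fuzziness into the configured range, apply the default when the parameter is omitted, and keep max_expansions entirely internal.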

Extra non-matching results

The cool part of fuzzy matching is that it lets you return the intended result despite typos. The bad part is that it can also return lots of unintended results even when there are no typos. Consider a query for 18 glasgow street, kelburn, wellington. Normally it returns exactly one result, a match for that address:

[screenshot: a single result, the exact address match]

Now it returns a bunch of other addresses on the same street:

[screenshot: several additional addresses on the same street]

This is tricky to even measure: our current acceptance-test framework would still mark this as passing, since the correct result is indeed returned first. However, it's a bit of a regression from a user experience standpoint. At least a few brain cells have to be used to make sure the first result is indeed the right one 🧠.

We've talked about adding a "post-processing" step that somehow looks at all the results and prunes out options that are clearly bad, but that's a difficult problem. Another potential solution is to run an Elasticsearch query without fuzziness first, and then only run another with fuzziness if there were no results returned. That has its own downsides.
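For what it's worth, that second idea is simple to sketch (assuming a v7-style Elasticsearch JS client where the response body is under .body; the index name and query bodies are placeholders), with the obvious downside of a second round trip on misses:

```js
// Run the exact query first; only fall back to the fuzzy variant when
// nothing matches, so queries without typos keep their precise results.
async function searchWithFuzzyFallback(client, exactQuery, fuzzyQuery) {
  const exact = await client.search({ index: 'pelias', body: exactQuery });
  if (exact.body.hits.hits.length > 0) {
    return exact;
  }
  return client.search({ index: 'pelias', body: fuzzyQuery });
}
```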

Complex query clauses

The autocomplete endpoint is pretty complicated under the hood, as you well know by now. I see you had to make a couple of changes to the queries themselves to enable fuzzy matching, and we have to be careful with those. Also, it looks like not every query clause has fuzzy matching enabled, which leads to things like pariz returning Paris just fine while new yorp doesn't find New York. That sort of thing is hard to explain to end users.

This isn't your fault; we really need to do some cleanup of the autocomplete endpoint, hopefully making it simpler to work with and probably improving the results along the way. This is genuinely hard work, and there's probably a happy balance where fuzzy matching works in most cases, but there's still more work to be done to get there.

Performance and response time

Stating another obvious thing: autocomplete has to return results really fast 💨. While adding fuzzy matching capability is definitely worth some response time, we'd want to test this out quite a bit to make sure it's worth it. In our experience, synthetic load testing can help, but the real answer comes from testing live traffic at scale. That's an extra challenge for something like this that changes the responses and parameters, but it's not insurmountable. You probably have some capabilities to do this, as we certainly do.

Next steps

So, I'm really happy to see this and push it forward over the next few months. We've been dragging our feet on it a bit, since it's going to be a challenge, but working with others who are motivated helps there 🧑‍🤝‍🧑. Feel free to follow up here with more thoughts and comments. Also we should probably set up some sort of call to figure out a path forward. [email protected] is always open :)

Thanks again!

orangejulius avatar Feb 10 '21 18:02 orangejulius

Hello @orangejulius,

I'm really glad that our initiative to integrate fuzziness into the current version of autocomplete is well received and that you are open to working with us on this. Thank you also for taking the time to share valuable insight here. As you said, some things are easier to tackle (your suggestions in the API Parameters section, for example), while others are trickier. It is a challenge to do this right, but we are confident that working together is the key, and we are happy to contribute.

I wrote to [email protected] to schedule a call with you.

Cheers!

andreibalaban avatar Feb 11 '21 10:02 andreibalaban

Hello @orangejulius,

It was great meeting you last week and we really appreciated the exchange of ideas. From what I remember, the plan was to enhance this PR with some config-related changes. Can we help you with anything regarding this?

Thank you!

andreibalaban avatar Feb 25 '21 10:02 andreibalaban

@orangejulius any way we can move forward with this? Would be nice to merge the logic upstream.

mihneadb avatar Jul 08 '21 13:07 mihneadb

Hi @andreibalaban and @mihneadb,

Sorry for the delay here! I finally got around to really taking a look at the impact of this PR by setting up a branch with some reasonable default values for the fuzzy search parameters and running it through our acceptance test suite.

While there are a lot of cases where fuzzy autocomplete helps (obviously), I think there will be more tuning work to do before we can merge this into Pelias.

Impact on non-typo queries

The main issue I saw is that there are a decent number of cases where the "correct" result for a query with no typos no longer comes up first. For example, take this query for Paris:

[screenshot: Pakistan ranked above Paris]

Pakistan coming up first is a bit of a dealbreaker here. A rough estimate from all our acceptance tests is that this happens around 3-5% of the time. By no means do we need to wait until there are no regressions before merging fuzzy autocomplete, but from my non-scientific testing we aren't there yet.

Postal codes

Postal codes are particularly impacted. Since most numeric substitutions in any valid postal code will also result in a valid postal code, and all postal codes will generally have the same score from Elasticsearch, the odds of the "right" one coming up first, as it stands now, are low. In this case the expected postal code comes up 10th, so it barely makes it into the results at all.

[screenshot: the expected postal code ranked 10th]

Next steps

I think the path forward here probably entails a bit of work: we might want to add logic to detect cases (like all-numeric inputs for postal codes) where we shouldn't do any fuzzy matching at all; a rough sketch is below.
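Something like the following could cover the detection (a minimal sketch; the function name is made up):

```js
// Disable fuzzy matching for purely numeric input: a one-digit edit to a
// valid postal code is usually another valid postal code, so fuzziness
// just floods the results with equally-scored neighbours.
function effectiveFuzziness(text, requestedFuzziness) {
  if (/^\s*\d+\s*$/.test(text)) {
    return 0;
  }
  return requestedFuzziness;
}
```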

We also might have to add new query clauses that give a slight boost to exact matches, to solve issues like the Paris query above. If you have any other ideas, or techniques that have worked for you, let me know.

This is an important feature so we do plan to keep working on it. It's not the main thing we're working on today, but we are definitely not going to stop working on it just because it's hard :)

orangejulius avatar Jul 08 '21 15:07 orangejulius

Oops, quick follow up here.

I just realized that this PR is supposed to include logic for boosting exact matches. As outlined in my examples above, that's definitely important.

I'm not 100% sure my branch to test fuzzy autocomplete is correctly enabling that new logic though. Let me know if it looks broken to you. It might also be that the new exact match logic isn't quite working right, in which case fixing it could improve result quality quite a bit.

Also, let me know if you need any help running our acceptance tests to do your own testing. Both the regular acceptance tests and some of the test suites from our fuzzy tests, in particular the city search suite, are useful here.

It can be a bit of an art to interpret the acceptance test results though, so let us know if we can lend a hand.

orangejulius avatar Jul 08 '21 17:07 orangejulius

Hello @orangejulius,

Thank you for continuing work on this and sharing insight about the problems that you discovered.

I ran a quick test using your branch (but with our data) and I managed to reproduce the postal code issue, but not the one where Pakistan appears first when looking for Paris. The way we use Pelias somewhat shields us from the fuzzy autocomplete failures that you experienced: we have our own popularity model, we search only within a radius of a focus point, and we only query venues and addresses. Also, we don't build data on top of a full-planet OSM base. Having that popularity model, for example, means we are willing to tolerate an inexact match appearing first if it is a more popular one.

That being said, we are willing to make an effort to get this working for a more "general-purpose" use case, hopefully in a few weeks. Maybe what needs to be done is to invest more in reformulating the "exact-match" clause (which right now, I think, takes only complete tokens into account and could be improved) and to somehow parameterise the balance between fuzzy+popularity-driven matches and exact-match-oriented results (maybe defaulting to exact-match-oriented results to suit general-purpose use). Also, do you have an easy, shareable way for us to test the code over a full-planet build? Maybe a setup that you can share? That would be a real help when we get back to this and would ensure that we are on the same page.

Thank you!

andreibalaban avatar Jul 20 '21 14:07 andreibalaban

Hello @orangejulius,

I reformulated the "exact match boost" clauses in my latest commit. The parts of the query from the "must" clause that were "fuzzified" are now also placed, in their original exact-match form, in the "should" clause, so the "Paris vs Pakistan" and "90210" cases should not happen in this version. Please run a round of checks on the full-planet build and let us know how this new version works; we can also iterate on it. I also ran the acceptance-tests and fuzzy-tests suites, but on our dataset they had a high error rate, since we do not have access to a full-planet build. As I previously mentioned, if there were an easy way for us to access the same data that you use, it would speed things up. Nevertheless, I hope this iteration is an improvement - let us know :D
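For readers following along, the reformulated shape is roughly this (an illustrative sketch, not the literal commit; the field names and query text are examples):

```js
// The fuzzy clause stays in "must" so typos still match; an exact copy
// without fuzziness sits in "should", so documents that match with zero
// edits pick up extra score and rank above fuzzy-only matches.
const query = {
  bool: {
    must: [{
      multi_match: {
        query: 'paris',
        fields: ['name.default'],
        fuzziness: 1,
        prefix_length: 1
      }
    }],
    should: [{
      multi_match: {
        query: 'paris',
        fields: ['name.default'] // no fuzziness: exact matches score higher
      }
    }]
  }
};
```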

Thank you!

andreibalaban avatar Aug 25 '21 11:08 andreibalaban