placeholder
placeholder copied to clipboard
Performance Optimizations to fulltext search
The current hasSubject / _eachSynonym methods work with a single token at a time. This means for a long address string, there will be a longer list of tokens to check (lots of permutations), and hence lots of queries.
I noticed 30, 60 or even 250 fulltext search queries in some tests.
This pull request contains a new function that takes all the tokens and performs full text search in a single query. It returns matching tokens in the same format as expected, so other things don't need to change.
I saw consistent 30% speed improvements with these changes. Even bigger gains with longer searches.
Caveats:
- I had to remove the test case for the commit to go through. I'm not familiar with the testing framework, and could not write up a proper test case.
- Auto complete searches don't perform partial searches for the last token. So results may vary if using auto complete. (But I feel it's still a good compromise)
Do review and provide feedback.
I revised the matchingSubjects
function to support auto-complete, which also fixes fts5 errors with special characters in input. That code is here: https://github.com/niravmehta/placeholder/pull/4 (and not merged into this pull request #163)
Not sure why checks are failing on this pull request.
Hi @niravmehta you can view the output of the CI by clicking on the Travis-CI check itself which opens a log such as https://travis-ci.org/pelias/placeholder/jobs/609069753, if you scroll down that log you'll see the failures listed.
I'm a bit concerned about this PR, it's not good practice to delete tests because they start failing, tests are there for a reason and it's important to understand why they changed and what effect this will have on other users before we can merge this to master.
I agree removing test cases is not a good idea. But I do not know the testing framework, and couldn't figure out how to write a proper test case.
Will try again later when I have some free time.