grav-plugin-tntsearch
Incomplete Results for Strings that Include Numbers
A client site uses part numbers in page titles (e.g., SPK1000) and TNTSearch isn't returning all matches when the first three characters are used.
Test case 1 is a search for "spk", which should return "spk1000" and "spk7457", but only the first appears:
A search for "spk7" returns "spk7457", which should also have appeared in the previous search:
Test case 2 is a search for "739", which should return three results (two instances of "7393 Horn Driver" and one with "739" in the body text), but instead returns only the latter:
A search for "7393" turns up the first two expected above (two instances of "7393 Horn Driver"):
I thought this might be related to the stemming issue described here, but @ViliusS set me straight.
Are you using fuzzy search? Try playing with tntsearch settings a bit to see if one of them makes a difference. That would at least help pinpoint the part of the search algorithm which is failing.
Tried the "739" search documented above, with index rebuild + cache clear between tests, with the following:
- fuzzy enabled and then disabled, same result
- phrases enabled and then disabled, same result
- search URL ("/search") directly and using pop-up/overlay, same result
- search type of "auto" and "auto", same result
- stemmer enabled and disabled, same result
Here's my settings yaml:
enabled: true
search_route: /search
query_route: /s
built_in_css: true
built_in_js: true
built_in_search_page: true
enable_admin_page_events: true
search_type: auto
fuzzy: true
phrases: true
stemmer: default
display_route: true
display_hits: true
display_time: true
live_uri_update: true
limit: '20'
min: '3'
snippet: '300'
index_page_by_default: true
scheduled_index:
  enabled: false
  at: '* * * * *'
  logs: logs/tntsearch-index.out
filter:
  items:
    - [email protected]
powered_by: true
search_object_type: Grav
I can confirm that there is something strange going on with searches which include numbers. Let's say I have these two data points: "7777777" and "777777777777".
If I search for 777, I get only the first result, and it is not highlighted in the context. If I search for 7777, 77777, 777777 or 7777777, I get only the first result, and it is now correctly highlighted. If I search for anything from 77777777 to 777777777777, I correctly get only the second result.
Most probably a bug in TNTSearch library itself. Maybe something with BM25 implementation.
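One hypothesis that would reproduce every observation in this thread is that the query gets resolved to a single indexed term (the shortest one starting with the query) instead of to all terms sharing that prefix. The Python sketch below models only that assumption; `resolve_single_term` and `resolve_all_terms` are hypothetical helpers and are not taken from the TNTSearch source.

```python
def resolve_single_term(indexed_terms, query):
    """Assumed model, NOT TNTSearch code: pick one indexed term that
    starts with the query, preferring the shortest."""
    candidates = [t for t in indexed_terms if t.startswith(query)]
    if not candidates:
        return None
    return min(candidates, key=len)


def resolve_all_terms(indexed_terms, query):
    """What a partial search would need: every indexed term with the prefix."""
    return sorted(t for t in indexed_terms if t.startswith(query))


# The two data points from the comment above.
terms = ["7777777", "777777777777"]

short_hit = resolve_single_term(terms, "777")      # only the 7-digit term
long_hit = resolve_single_term(terms, "77777777")  # only the 12-digit term
all_hits = resolve_all_terms(terms, "777")         # both terms
```

Under the same assumption, a query for "739" against the terms "739" and "7393" would resolve to "739" alone, which matches the second test case in the original report.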
Hmmm. Sounds like I may have to implement simple search until this gets resolved. Model numbers are the bread and butter for this client site.
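For comparison, a plain case-insensitive substring match returns every part number containing the query. This is a minimal Python sketch of that idea, not the code of any Grav plugin; `simple_search` is a hypothetical helper and the titles are the ones quoted in this issue.

```python
def simple_search(titles, query):
    """Return every title containing the query, ignoring case."""
    needle = query.casefold()
    return [t for t in titles if needle in t.casefold()]


titles = ["SPK1000", "SPK7457", "7393 Horn Driver"]

spk_hits = simple_search(titles, "spk")  # both SPK part numbers
num_hits = simple_search(titles, "739")  # the horn driver title
```

The trade-off is that a substring scan has no ranking, stemming, or typo tolerance, and it reads every title on every query.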
I just did a quick debug session and it is definitely a library bug. ~~Most probably a bug in TNTIndexer because the data is already that way in the index. You can try to debug it yourself further https://github.com/trilbymedia/grav-plugin-tntsearch/blob/develop/vendor/teamtnt/tntsearch/src/Indexer/TNTIndexer.php or~~ open an issue with them https://github.com/teamtnt/tntsearch
Great, thanks for all your help on this. I'll open an issue with https://github.com/teamtnt/tntsearch.
I have fixed some highlighter issues found during my tests https://github.com/teamtnt/tntsearch/pull/256, but search with numbers is still broken.
I will try to look at the library code in a couple of days if I can find some free time, and will reply on the tntsearch repo.
This issue can be closed here.
@thekenshow I've spent some time on this and uncovered another layer of bugs. Try these two patches: https://github.com/trilbymedia/grav-plugin-tntsearch/pull/123 https://github.com/trilbymedia/grav-plugin-tntsearch/pull/124
To activate fuzzy search in the library itself for your case, you will also need to: a) apply https://github.com/teamtnt/tntsearch/pull/233 with the default fuzzy_no_limit option set to true, and b) increase the Levenshtein distance in the new configuration option to at least 4 or 5.
Other than that, there is nothing we can do at the moment. Proper partial search needs to be implemented by someone in the library.
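The "4 or 5" figure for the Levenshtein distance is presumably tied to the part numbers in this issue: a three-character query such as "spk" is four insertions away from "spk1000". The sketch below is a standard edit-distance implementation in Python, included only to check those numbers; it is not the library's own code.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance
    (insertions, deletions, substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


d_spk = levenshtein("spk", "spk1000")
d_739 = levenshtein("739", "7393")
d_777 = levenshtein("777", "777777777777")
```

Because the distance grows with every extra character, a fixed limit only covers terms a few characters longer than the query, which is consistent with the remark that proper partial search still has to be implemented in the library.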
@ViliusS Thanks for diving into this, I'm back to it today and will let you know what happens.