grav-plugin-tntsearch

Incomplete Results for Strings that Include Numbers

Open thekenshow opened this issue 3 years ago • 9 comments

A client site uses part numbers in page titles (e.g., SPK1000) and TNTSearch isn't returning all matches when the first three characters are used.

Test case 1 is a search for "spk", which should return "spk1000" and "spk7457", but only the first appears:

Screen Shot 2021-10-25 at 4 31 03 PM

A search for "spk7" returns "spk7457", which should also have appeared in the previous search:

Screen Shot 2021-10-25 at 4 31 13 PM

Test case 2 is a search for "739", which should return three results (two instances of "7393 Horn Driver" and one page with "739" in the body text), but it only returns the latter:

Screen Shot 2021-10-26 at 1 16 04 PM

A search for "7393" turns up the first two expected above (two instances of "7393 Horn Driver"):

Screen Shot 2021-10-26 at 1 15 56 PM

Thought this might be related to the stemming issue described here, but @ViliusS set me straight.

thekenshow, Oct 26 '21 17:10

Are you using fuzzy search? Try playing with tntsearch settings a bit to see if one of them makes a difference. That would at least help pinpoint the part of the search algorithm which is failing.

ViliusS, Oct 26 '21 18:10

Tried the "739" search documented above, with index rebuild + cache clear between tests, with the following:

  • fuzzy enabled and then disabled, same result
  • phrases enabled and then disabled, same result
  • search URL ("/search") directly and using pop-up/overlay, same result
  • search type of "auto" and "auto", same result
  • stemmer enabled and disabled, same result

Here's my settings yaml:

enabled: true
search_route: /search
query_route: /s
built_in_css: true
built_in_js: true
built_in_search_page: true
enable_admin_page_events: true
search_type: auto
fuzzy: true
phrases: true
stemmer: default
display_route: true
display_hits: true
display_time: true
live_uri_update: true
limit: '20'
min: '3'
snippet: '300'
index_page_by_default: true
scheduled_index:
  enabled: false
  at: '* * * * *'
  logs: logs/tntsearch-index.out
filter:
  items:
    - [email protected]
powered_by: true
search_object_type: Grav

thekenshow, Oct 28 '21 12:10

I can confirm that there is something strange going on with searches which include numbers. Let's say I have these 2 data points: "7777777" and "777777777777".

If I search for 777, I get only the first result, and it is not highlighted in the context. If I search for 7777, 77777, 777777 or 7777777, I get only the first result, and it is now correctly highlighted. If I search for 77777777 through 777777777777, I correctly get only the second result.

Most probably a bug in the TNTSearch library itself, maybe something in the BM25 implementation.
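
For comparison, here is a minimal sketch of what plain partial (prefix) matching would return for those two data points. It is illustrative only: `TERMS` and `prefix_matches` are made-up names, and this is not how TNTSearch looks up terms internally.

```python
# Illustrative only: a plain prefix match over indexed terms.
# TERMS and prefix_matches are hypothetical names, not TNTSearch code.
TERMS = ["7777777", "777777777777"]


def prefix_matches(query, terms=TERMS):
    """Return every indexed term that starts with the query string."""
    return [term for term in terms if term.startswith(query)]


for query in ("777", "7777777", "77777777"):
    print(query, "->", prefix_matches(query))
```

With this kind of matching, queries of 777 through 7777777 would return both data points, and only the longer queries would narrow down to the second one. That is the result set a user searching by partial part number would likely expect.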

ViliusS, Oct 28 '21 15:10

Hmmm. Sounds like I may have to implement simple search until this gets resolved. Model numbers are the bread and butter for this client site.

thekenshow, Oct 28 '21 15:10

I just did a quick debug session and it is definitely a library bug. ~~Most probably a bug in TNTIndexer because the data is already that way in the index. You can try to debug it further yourself at https://github.com/trilbymedia/grav-plugin-tntsearch/blob/develop/vendor/teamtnt/tntsearch/src/Indexer/TNTIndexer.php or~~ open an issue with them at https://github.com/teamtnt/tntsearch

ViliusS, Oct 28 '21 15:10

Great, thanks for all your help on this. I'll open an issue with https://github.com/teamtnt/tntsearch.

thekenshow, Oct 28 '21 15:10

I have fixed some highlighter issues found during my tests https://github.com/teamtnt/tntsearch/pull/256, but search with numbers is still broken.

I will try to look at the library code in a couple of days if I find some free time, and will reply on the tntsearch repo.

This issue can be closed here.

ViliusS, Oct 28 '21 23:10

@thekenshow I've spent some time on this and uncovered another layer of bugs. Try these two patches: https://github.com/trilbymedia/grav-plugin-tntsearch/pull/123 https://github.com/trilbymedia/grav-plugin-tntsearch/pull/124

In order to activate fuzzy search in the library itself for your case, you will also need to: a) apply https://github.com/teamtnt/tntsearch/pull/233 with the default fuzzy_no_limit option set to true, and b) increase the Levenshtein distance in the new configuration option to at least 4 or 5.
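
To see why a distance of at least 4 matters for the part numbers in this issue, here is a small, self-contained sketch of the standard Levenshtein edit distance. The `levenshtein` function is written for illustration and is not taken from the TNTSearch source; how the library applies the distance during fuzzy lookup is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# Query vs. indexed term pairs taken from the examples in this thread.
for query, term in [("spk", "spk1000"), ("spk", "spk7457"), ("739", "7393")]:
    print(query, term, levenshtein(query, term))
```

A three-character query such as "spk" is four insertions away from "spk1000" or "spk7457", so a smaller maximum distance cannot reach those terms, while "739" is only one edit away from "7393".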

Other than that, there is nothing we can do at the moment. Proper partial search needs to be implemented by someone in the library.

ViliusS, Oct 30 '21 08:10

@ViliusS Thanks for diving into this, I'm back to it today and will let you know what happens.

thekenshow, Nov 01 '21 15:11