MHTextSearch
Problem with white spaces
I noticed that if I have "Game of Thrones" as part of the tagged strings in an index, and I search for "Game " (whitespace after "Game"), the results don't show the original "Game of Thrones" index. If I search for "Game" or "Thrones" it works perfectly fine. Is this expected behavior?
I'm seeing the same behaviour in my app - did you find an answer to this @computerion?
I worked around this problem by replacing whitespace with underscores.
Thanks for the quick reply! Did you replace the whitespace with underscores in your search term, the indexed strings or both?
I've implemented @computerion's workaround by replacing the whitespace with underscores in keywords and indexed strings and it works well as long as the search is identical to the underlying indexed string:
e.g.:
String: "Game of Thrones" is indexed as "Game_of_thrones"
Search: "Game of" actually searches "Game_of" and so a match is returned
This is a huge improvement on the default behaviour. However, searching for "thrones game" returns no results, when most users would expect it to work. This approach is also likely to be more resource hungry when building and storing the index, since every indexed string is effectively a single unique token, and the stop-word filtering is lost. I'll therefore leave this issue open for now.
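To make the trade-off concrete, here is a minimal sketch of the underscore workaround in plain Python. It does not use MHTextSearch itself: `normalise`, `keys`, and `search` are hypothetical names, and both the case-insensitive comparison and the substring matching are assumptions about how the index compares a keyword with an indexed token.

```python
def normalise(text):
    """Lowercase and replace each space with an underscore (assumed behaviour)."""
    return text.lower().replace(" ", "_")


# Each indexed string becomes one long, effectively unique token.
indexed = ["Game of Thrones"]
keys = {normalise(s): s for s in indexed}


def search(query):
    """Return the original strings whose normalised form contains the query."""
    needle = normalise(query)
    return [original for key, original in keys.items() if needle in key]


print(search("Game of"))       # same word order as the indexed string
print(search("thrones game"))  # reversed word order
```

Because the whole phrase is collapsed into one token, only queries that follow the word order of the indexed string can match, which is the limitation described above.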
Thanks a lot for your suggestion, @computerion. At least I have something reasonably functional now!
Yeah it's kind of ugly but I'm glad it helped!
Hello, and sorry it took so long to reply. Have you tried trimming the search string before performing the search?
The expected capability is that searching for "thrones game" would work just as well as "game thrones"
Hi @matehat, what do you mean by trimming the string before performing the search? If you mean removing any additional whitespace at the ends of the string, there isn't any in the search term. It's just two words separated by a space.
I'm talking about NSString#stringByTrimmingCharactersInSet:
You mentioned that "Game " with a trailing space didn't work, so I asked whether trimming it (removing the space) would work.
Oh sorry, that was a different commenter. Your suggestion doesn't make a difference in my case.
Indexed strings: "hello world", "hello dolly"
Search "hello": results "hello world", "hello dolly"
Search "hello dolly": no results
Search "hello world": no results
Search "world": results "hello world"
Search "world hello": no results
Per your suggestion:
Search "hello " (trailing space): no results
Search "hello " trimmed first, so effectively "hello": results "hello world", "hello dolly"
The issue is having the search query properly tokenised and having those tokens be individually taken into account during the search. At the moment it appears to behave as though only one token can be searched at a time. The workaround that @computerion suggested effectively makes every reasonably sized indexed string unique and turns multiple search keywords into one token that can match one of those unique strings if the exact phrase appears within it.
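The behaviour being asked for can be sketched as follows, in plain Python rather than against the MHTextSearch API. `TinyIndex` and `tokenise` are hypothetical names. Splitting on whitespace, matching each query token as a case-insensitive prefix, and requiring every token to match (AND semantics) are all assumptions about what users would expect, not a description of how the library works.

```python
from collections import defaultdict


def tokenise(text):
    """Lowercase and split on any run of whitespace; stray spaces vanish."""
    return text.lower().split()


class TinyIndex:
    """Toy inverted index: token -> set of indexed strings containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, text):
        for token in tokenise(text):
            self._postings[token].add(text)

    def search(self, query):
        tokens = tokenise(query)
        if not tokens:
            return set()
        result = None
        for token in tokens:
            # Collect every string that has a word starting with this token.
            matches = set()
            for indexed_token, strings in self._postings.items():
                if indexed_token.startswith(token):
                    matches |= strings
            # Keep only the strings matched by all tokens seen so far.
            result = matches if result is None else result & matches
        return result


index = TinyIndex()
index.add("hello world")
index.add("hello dolly")
print(index.search("world hello"))  # word order no longer matters
print(index.search("hello "))       # trailing space is ignored
```

With this approach each word is indexed separately, so stop-word filtering could still be applied per token and the indexed strings do not each become one unique key.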