MHTextSearch

Problem with white spaces

Open hudaniel opened this issue 9 years ago • 9 comments

I noticed that if I have "Game of Thrones" as part of the tagged strings in an index, and I search for "Game " (whitespace after "Game"), the results don't show the original "Game of Thrones" index. If I search for "Game" or "Thrones" it works perfectly fine. Is this expected behavior?

hudaniel, Jun 01 '16 20:06

I'm seeing the same behaviour in my app - did you find an answer to this @computerion?

mikecsh, Oct 09 '16 07:10

I worked around this problem by replacing whitespace with underscores.

hudaniel, Oct 09 '16 07:10

Thanks for the quick reply! Did you replace the whitespace with underscores in your search term, the indexed strings or both?

mikecsh, Oct 09 '16 08:10

I've implemented @computerion's workaround by replacing the whitespace with underscores in keywords and indexed strings and it works well as long as the search is identical to the underlying indexed string:

e.g.:

String: "Game of Thrones" is indexed as "Game_of_thrones"

Search: "Game of" actually searches "Game_of" and so a match is returned

This is a huge improvement on the default behaviour. However, searching for "thrones game" returns no results, when most users would expect it to work. Additionally, this approach is likely to be more resource-hungry when building and storing the index, because every indexed string is effectively unique, and the stop-word-ignoring functionality is lost. Therefore, I'll leave this issue open for now.

Thanks a lot for your suggestion, @computerion. At least I have something reasonably functional now!

mikecsh, Oct 09 '16 08:10
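To make the trade-off of the underscore workaround concrete, here is a minimal Python sketch. It is not MHTextSearch code: the `normalise` and `search` helpers are hypothetical, and both the lower-casing and the substring matching are assumptions made only for illustration.

```python
def normalise(text):
    # Hypothetical helper: lower-case the text and join its words with
    # underscores, so that a whole phrase becomes one single token.
    return "_".join(text.lower().split())


# Toy "index": the single-token form mapped back to the original string.
indexed = {normalise(s): s for s in ["Game of Thrones"]}


def search(query):
    # Assumed matching rule: the normalised query must appear as a
    # substring of the normalised indexed string.
    needle = normalise(query)
    return [original for key, original in indexed.items() if needle in key]


print(search("Game of"))       # ['Game of Thrones']
print(search("thrones game"))  # []
```

Because the phrase is stored as one token, only queries that follow the original word order can match, which is the limitation described above.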

Yeah, it's kind of ugly, but I'm glad it helped!

hudaniel, Oct 09 '16 08:10

Hello, and sorry it took so long to reply. Have you tried trimming the search string before performing the search?

The expected behaviour is that searching for "thrones game" works just as well as "game thrones".

matehat, Oct 09 '16 14:10

Hi @matehat, what do you mean by trimming the string before performing the search? If you mean removing any additional whitespace at the ends of the string, there isn't any in the search term; it's just two words separated by a space.

mikecsh, Oct 09 '16 15:10

I'm talking about NSString#stringByTrimmingCharactersInSet:

You mentioned that "Game " with a trailing space didn't work. So I asked whether trimming it, i.e. removing the space, would work.

matehat, Oct 09 '16 21:10
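The trimming suggestion can be sketched in Python rather than Objective-C. `str.strip()` stands in here for `stringByTrimmingCharactersInSet:` called with a whitespace character set; the `trim_query` helper name is hypothetical.

```python
def trim_query(query):
    # str.strip() with no arguments removes leading and trailing
    # whitespace (including newlines); inner spaces are left alone.
    return query.strip()


print(repr(trim_query("Game ")))         # 'Game'
print(repr(trim_query("thrones game")))  # 'thrones game'
```

Trimming only touches the ends of the string, so it can help with a trailing space but leaves a multi-word query unchanged.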

Oh sorry, that was a different commenter. Your suggestion doesn't make a difference in my case.

Indexed strings: "hello world", "hello dolly"

Search "hello": results "hello world", "hello dolly"
Search "hello dolly": no results
Search "hello world": no results
Search "world": results "hello world"
Search "world hello": no results

Per your suggestion:
Search "hello ": no results
Search "hello " trimmed first (so effectively "hello"): results "hello world", "hello dolly"

The issue is having the search query properly tokenised and having those tokens be individually taken into account during the search. At the moment it appears to behave as though only one token can be searched at a time. The workaround that @computerion suggested effectively makes every reasonably sized indexed string unique and turns multiple search keywords into one token that can match one of those unique strings if the exact phrase appears within it.

mikecsh, Oct 09 '16 21:10
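The difference between single-token and per-token searching can be illustrated with a toy inverted index in Python. This is only a sketch under stated assumptions: it does not use or describe MHTextSearch internals, all function names are hypothetical, and matching is on exact lower-cased words. `search_single_token` models the behaviour reported in this thread, and `search_all_tokens` models the behaviour being asked for.

```python
from collections import defaultdict


def build_index(strings):
    # Map each lower-cased word to the set of strings that contain it.
    index = defaultdict(set)
    for s in strings:
        for word in s.lower().split():
            index[word].add(s)
    return index


def search_single_token(index, query):
    # Toy model of the reported behaviour: the whole query, spaces and
    # all, is looked up as if it were one token.
    return sorted(index.get(query.lower(), set()))


def search_all_tokens(index, query):
    # Split the query on whitespace and keep only the strings that
    # contain every token, regardless of word order.
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return sorted(hits)


index = build_index(["hello world", "hello dolly"])

print(search_single_token(index, "hello dolly"))  # []
print(search_all_tokens(index, "hello dolly"))    # ['hello dolly']
print(search_all_tokens(index, "world hello"))    # ['hello world']
```

Splitting on whitespace also discards leading and trailing spaces, so in this sketch a query such as "hello " behaves the same as "hello" without a separate trimming step.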