lunr.js

Searching for things like "--hard" or "--help" breaks the search/returns no results

Open MikeArsenault opened this issue 5 years ago • 5 comments

There seems to be a problem with escaping multiple characters: the search does not seem to understand back-to-back escaped characters. For example, we know there are 6 results for --help in the handbook.

As expected, searching --help leads you to the infinite load issue, and the following console error (search):

https://d.pr/i/vDlyYe

  • Searching --help returns no results.
  • Searching --help returns instances of -help such as slack channels with -help in the name.
  • Searching ---help returns no results.
  • Searching using html entities returns that you have not searched for anything. These include −, − and −.
  • Searching with URL encoding dashes.

We are wondering if this is by design and we just haven't determined the right escape format? Our version of lunr is 2.3.7.

MikeArsenault avatar Oct 15 '20 13:10 MikeArsenault

According to the docs on searching, a leading + or - determines the presence or absence of a term.

So if you call idx.search('+') or idx.search('++any_word'), it throws an error such as "expecting term or field, found nothing". Each + or - must be followed by a term.
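
The rule described above can be sketched as a pre-check on user input before it reaches idx.search. This is only an illustration: findDanglingPresence is a hypothetical helper, not part of lunr, and it approximates the rule by inspecting whitespace-separated clauses rather than reproducing lunr's query parser.

```javascript
// Hypothetical helper (not part of lunr): returns the clauses whose leading
// + or - is not immediately followed by a term, e.g. "+", "--", "++any_word".
function findDanglingPresence(query) {
  return query
    .split(/\s+/)
    .filter((clause) => clause.length > 0)
    // A leading + or - followed by another + or -, or by nothing at all.
    .filter((clause) => /^[+-]([+-]|$)/.test(clause));
}

console.log(findDanglingPresence("+"));             // [ '+' ]
console.log(findDanglingPresence("++any_word"));    // [ '++any_word' ]
console.log(findDanglingPresence("--help"));        // [ '--help' ]
console.log(findDanglingPresence("+foo -bar baz")); // []
```

An application could use a non-empty result to show a hint to the user instead of letting a QueryParseError surface.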

osama-rizk avatar Oct 15 '20 15:10 osama-rizk

Do you have an example of the search string you are using? You mention that back to back escapes do not work, can you provide an example of how you are escaping back to back characters?

A backslash is used to escape characters that would otherwise have meaning in a query, so, for example, I would expect \-\-help to work.

If you can set up a minimal reproduction demonstrating the issue in something like jsfiddle (or similar), that would be a great help.

olivernn avatar Nov 02 '20 18:11 olivernn

I'm experiencing issues with escaping as well. I have an example from the demo. Search for flight\-\-a and it won't find anything, although the string flight--a exists in article number 2.

gilisho avatar Feb 02 '21 10:02 gilisho

Hello, I'm looking into the same issue while trying to escape a +. Escaping with \, as mentioned in the docs, does not seem to work. I think @gilisho's example demonstrates the issue well.

Instead of using Index.search I'm now trying to use Index.query. Using the index from @gilisho's example site directly, I am trying the following:

idx.search("flight")
# [Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, …] (12)

idx.search("flight--a")
# [Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, …] (12)

idx.search("flight\-\-a")
# [Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, …] (12)

I think that's because the - and \ are removed by the tokenizer:

lunr.tokenizer("flight--a")
# Array (2) = $7
# 0 {str: "flight", metadata: {position: [0, 6], index: 0}, toString: function, update: function, clone: function}
# 1 {str: "a", metadata: {position: [8, 1], index: 1}, toString: function, update: function, clone: function}

lunr.tokenizer("flight\-\-a")
# Array (2) = $7
# 0 {str: "flight", metadata: {position: [0, 6], index: 0}, toString: function, update: function, clone: function}
# 1 {str: "a", metadata: {position: [8, 1], index: 1}, toString: function, update: function, clone: function}
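
One thing worth checking here, independent of lunr: in a JavaScript string literal, \- is not a recognised escape sequence, so the backslash is dropped before the string reaches any library. That would be consistent with both tokenizer calls above reporting the a token at position 8. A minimal sketch in plain JavaScript (the variable names are illustrative only):

```javascript
// In an ordinary string literal, "\-" collapses to "-": the backslash is dropped.
const unescaped = "flight--a";
const singleBackslash = "flight\-\-a";
console.log(singleBackslash === unescaped); // true
console.log(singleBackslash.length);        // 9

// To keep a literal backslash in the string, double it or use String.raw.
const doubled = "flight\\-\\-a";
const raw = String.raw`flight\-\-a`;
console.log(doubled === raw);        // true
console.log(doubled.length);         // 11
console.log(doubled.indexOf("a"));   // 10
```

So a query typed into a search box as flight\-\-a and a query written in source code as "flight\-\-a" are different strings; the latter needs "flight\\-\\-a" to carry the backslashes through.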

Using the Index.query API:

idx.query(q => q.term(lunr.tokenizer("flight--a")))
# [Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, …] (12)

idx.query(q => q.term(lunr.tokenizer("flight\-\-a")))
# [Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, …] (12)

That was expected because the tokenizer removed the part we were interested in.

But, with the snippet below, I expected I would get back some results:

idx.query(q => q.term("flight--a"))
# []
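
A possible explanation for the empty result: if the index was built with the default tokenizer, the document text flight--a would have been split the same way as in the tokenizer output above, so no term flight--a was ever stored. The sketch below is not lunr's code; splitTerms is a hypothetical helper that mimics splitting on a separator pattern. The first pattern mirrors what I understand to be the default value of lunr.tokenizer.separator (whitespace and hyphens), and the second is a whitespace-only alternative.

```javascript
// Hypothetical stand-in for a separator-based tokenizer; NOT lunr's implementation.
function splitTerms(text, separator) {
  return text
    .toLowerCase()
    .split(separator)
    .filter((term) => term.length > 0);
}

// Assumed default: whitespace and hyphens both separate terms.
const hyphenSeparator = /[\s\-]+/;
// Alternative: only whitespace separates terms, so hyphens survive.
const whitespaceSeparator = /\s+/;

console.log(splitTerms("cheap flight--a deal", hyphenSeparator));
// [ 'cheap', 'flight', 'a', 'deal' ]
console.log(splitTerms("cheap flight--a deal", whitespaceSeparator));
// [ 'cheap', 'flight--a', 'deal' ]
console.log(splitTerms("--help", hyphenSeparator));
// [ 'help' ]
```

If this is what is happening, assigning a whitespace-only pattern to lunr.tokenizer.separator before building the index, and keeping it in place when searching, may preserve flight--a as a single term. That is something to try, not a confirmed fix.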

To verify that the special meaning of - is not applied with the Index.query API, I ran:

idx.search("-")
# QueryParseError: expecting term or field, found nothing

idx.search("--")
# QueryParseError: expecting term or field, found 'PRESENCE'

idx.query(q => q.term(lunr.tokenizer("-")))
# [Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, …] (12)

idx.query(q => q.term(lunr.tokenizer("--")))
# [Object, Object, Object, Object, Object, Object, Object, Object, Object, Object, …] (12)

Any hints on this @olivernn ?

c00kiemon5ter avatar Nov 25 '21 18:11 c00kiemon5ter