Add `alwaysStopwords` option to `edismax` so its "all stopwords" behaviour can be controlled

Open andywebb1975 opened this issue 1 year ago • 0 comments

work in progress! If anyone sees this and has thoughts on it, please comment below :-)

Description

We were surprised by edismax's behaviour for pure-stopword queries: we had expected these to return zero results. Its behaviour ('If a query consists of all stopwords, such as "to be or not to be", then all words are required.') is the opposite of what we want, because we're using query-time stopwords to prevent particular query terms from matching. There's currently no way to disable it other than using dismax instead, which may have other impacts.
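To make the difference concrete, here is a toy Python model of the rule quoted above. It is a sketch only, not Solr's code: the function name `effective_terms`, the flag `keep_if_all_stopwords` and the stopword set are all made up for illustration, and real analysis chains do far more than split on whitespace.

```python
def effective_terms(query, stopwords, keep_if_all_stopwords=True):
    """Toy model of query-time stopword handling (not Solr's implementation)."""
    terms = query.lower().split()
    kept = [t for t in terms if t not in stopwords]
    if kept:
        # At least one non-stopword survives: stopwords are simply dropped.
        return kept
    # Every term was a stopword. The quoted behaviour keeps all of them;
    # the behaviour we want drops them, so the query matches nothing.
    return terms if keep_if_all_stopwords else []


# Hypothetical stopword list, chosen only for this example.
STOPWORDS = {"to", "be", "or", "not"}

print(effective_terms("to be or not to be", STOPWORDS))
print(effective_terms("to be or not to be", STOPWORDS, keep_if_all_stopwords=False))
print(effective_terms("not to be missed", STOPWORDS))
```

With `keep_if_all_stopwords=False` the pure-stopword query ends up with no terms at all, which is the "zero results" outcome we expected.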

(Using index-time stopwords instead breaks with mm=100%: users can't include the stop words in their queries.)

Solution

This PR adds an `alwaysStopwords` option that disables the default behaviour. Its name is TBD.
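As a rough sketch of how a request might look with the new option, the snippet below builds a query string with Python's standard library. The field name `title` and the value `true` are assumptions for illustration; the parameter name itself may still change.

```python
from urllib.parse import urlencode

# Request parameters for an edismax query. Only defType, q, qf and mm are
# existing Solr parameters; alwaysStopwords is the option proposed here.
params = {
    "defType": "edismax",
    "q": "to be or not to be",
    "qf": "title",               # hypothetical field name
    "mm": "100%",
    "alwaysStopwords": "true",   # proposed option; name TBD, value assumed
}

# urlencode percent-encodes values and encodes spaces as '+'.
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to a collection's `/select` URL.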

I've noticed that the query plan becomes `+()` rather than `MatchNoDocsQuery("")` for pure-stopword queries, and also when the query contains only tokenising characters (e.g. punctuation). Does this make any difference?

Tests

TO DO! - I'll also make documentation updates for edismax and the stop word filter

Checklist

Please review the following and check all that apply:

  • [ ] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • [ ] I have created a Jira issue and added the issue ID to my pull request title.
  • [ ] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • [ ] I have developed this patch against the main branch.
  • [ ] I have run ./gradlew check.
  • [ ] I have added tests for my changes.
  • [ ] I have added documentation for the Reference Guide

andywebb1975, May 31 '24 07:05