lucene-addons Memory leak with wildcard inside double quotes

Running the query {!span}("*") causes a memory leak and a stop-the-world GC that the node can't recover from. The same query works fine with Solr's standard query parser.

I'll work on it today and follow up.

Mar 16 '18 14:03 sjwoodard

Exciting! Ugh. We should probably create a MatchAllDocsQuery for that, like we do for *:*, if we aren't already.

Mar 16 '18 14:03 tballison

It happens with an asterisk anywhere in quotes, so "foo *" would crash it too. I know it's not really a good query, but it could still bring down a whole cluster. Do you think it should throw an exception?

update: "foo*" is OK, but a wildcard out by itself is not.
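A minimal sketch of the distinction in the update above, assuming the phrase can simply be split on whitespace (the real span parser tokenizes differently): a phrase is flagged when one of its tokens consists only of *, so "foo *" is caught and "foo*" is not. BareWildcardCheck and hasBareWildcard are hypothetical names, not part of lucene-addons.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of lucene-addons): flags a quoted phrase that
 * contains a token made up only of '*', e.g. "foo *", while allowing "foo*".
 * Assumes whitespace tokenization, which is a simplification.
 */
public final class BareWildcardCheck {

    private BareWildcardCheck() {}

    /** True if any whitespace-separated token consists solely of '*' characters. */
    public static boolean hasBareWildcard(String phrase) {
        return Arrays.stream(phrase.trim().split("\\s+"))
                .anyMatch(BareWildcardCheck::isOnlyStars);
    }

    private static boolean isOnlyStars(String token) {
        if (token.isEmpty()) {
            return false; // empty input produces one empty token; not a wildcard
        }
        for (int i = 0; i < token.length(); i++) {
            if (token.charAt(i) != '*') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"*", "foo *", "foo*", "foo bar"}) {
            System.out.println("\"" + p + "\" -> " + hasBareWildcard(p));
        }
    }
}
```

Where such a check would hook into the span parser is left open here; it only captures which inputs the thread reports as dangerous.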

Mar 16 '18 16:03 sjwoodard

Y, this is Solr's behavior:

  // called from parser
  protected Query getWildcardQuery(String field, String termStr) throws SyntaxError {
    checkNullField(field);
    // *:* -> MatchAllDocsQuery
    if ("*".equals(termStr)) {
      if ("*".equals(field) || getExplicitField() == null) {
        return newMatchAllDocsQuery();
      }
    }
    // ... rest of method omitted
  }

Mar 16 '18 17:03 tballison

Can you do me a favor and see if the ComplexPhraseQueryParser dies on "foo *"?

I'm happy enough converting * to a MatchAllDocsQuery when it is outside of a SpanQuery, but what should we do when it's inside a span? If it's in a SpanNear, should we just ignore it ("find foo within 2 words of anything" is the same thing as "find foo")? If it's in a SpanOr, should we convert the whole SpanOr to a MatchAllDocsQuery?
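One possible answer to these questions, sketched on a toy query tree rather than Lucene's real SpanQuery classes (Node, Term, Near, Or and MatchAll are hypothetical stand-ins): inside a near clause a bare * is dropped, and inside an or clause it collapses the whole clause to match-all. A real implementation could only emit a MatchAllDocsQuery at the top level, since span queries can't wrap one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of one policy for a bare "*" inside span clauses. */
public final class SpanStarPolicy {

    private SpanStarPolicy() {}

    // Toy query tree; NOT Lucene's SpanQuery classes.
    public interface Node {}
    public record Term(String text) implements Node {}
    public record MatchAll() implements Node {}
    public record Near(List<Node> clauses, int slop) implements Node {}
    public record Or(List<Node> clauses) implements Node {}

    private static boolean isStar(Node n) {
        return n instanceof Term t && t.text().equals("*");
    }

    public static Node rewrite(Node n) {
        if (isStar(n)) {
            return new MatchAll();
        }
        if (n instanceof Near near) {
            // "foo within N words of anything" is treated as just "foo":
            // drop every clause that rewrites to match-all.
            List<Node> kept = new ArrayList<>();
            for (Node c : near.clauses()) {
                Node r = rewrite(c);
                if (!(r instanceof MatchAll)) {
                    kept.add(r);
                }
            }
            if (kept.isEmpty()) {
                return new MatchAll();
            }
            return kept.size() == 1 ? kept.get(0) : new Near(kept, near.slop());
        }
        if (n instanceof Or or) {
            // "foo OR anything" matches everything, so collapse the whole Or.
            List<Node> out = new ArrayList<>();
            for (Node c : or.clauses()) {
                Node r = rewrite(c);
                if (r instanceof MatchAll) {
                    return r;
                }
                out.add(r);
            }
            return new Or(out);
        }
        return n;
    }

    public static void main(String[] args) {
        Node near = new Near(List.of(new Term("foo"), new Term("*")), 2);
        Node or = new Or(List.of(new Term("foo"), new Term("*")));
        System.out.println(rewrite(near));
        System.out.println(rewrite(or));
    }
}
```

Dropping the clause is a little looser than the original query: a document containing only the word foo would match after the rewrite, even though there is no other word within 2 positions of it.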

Mar 16 '18 17:03 tballison

The other question is whether we want to do this at the Lucene level or at the Solr level. My preference would be the Lucene level, but that goes against the decision that was made in the actual Lucene/Solr project.

Mar 16 '18 17:03 tballison

@sjwoodard, I may have some time to work on this soon. Let me know if you still care.

Jan 24 '19 21:01 tballison

I think it's a good idea to fix it because Solr can't recover from it. I still guard against it, but I didn't know how to fix it in the code.

Jan 25 '19 15:01 sjwoodard

If we fix it in Solr, how do these tests look?

  public void testMatchAllDocs() throws Exception {
    assertJQ(req("defType", "span", "q", "*"), "/response/numFound==4");
    assertJQ(req("defType", "span", "q", "*:*"), "/response/numFound==4");
    assertJQ(req("df", "text0", "defType", "span", "q", "*:*"), "/response/numFound==4");
    assertJQ(req("df", "text0", "defType", "span", "q", "*"), "/response/numFound==3");
    assertJQ(req("df", "text0", "defType", "span", "q", "NOT *"), "/response/numFound==1");
    assertJQ(req("defType", "span", "q", "NOT *"), "/response/numFound==0");
    assertQEx("need to have a field specified in schema",
            req("defType", "span", "q", "nofield:*"),
            SolrException.ErrorCode.BAD_REQUEST);
    assertQEx("need to have a field specified in schema",
            req("df", "nofield", "defType", "span", "q", "*"),
            SolrException.ErrorCode.BAD_REQUEST);
  }

The documents in the test index are specified here: https://github.com/tballison/lucene-addons/blob/master/solr-5410/src/test/java/org/tallison/solr/search/TestSpanQParserPlugin.java#L47
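Read as a decision table, the proposed tests boil down to a few rules for a bare *. Below is a hypothetical sketch of that table (StarResolution and resolve are illustrative names). It returns a description of the query instead of building real Lucene/Solr queries, assumes a null df means no default field is configured, and leaves NOT * out.

```java
import java.util.Set;

/**
 * Hypothetical sketch of the behaviour the proposed tests imply for a bare "*".
 * Returns a description of the query that would be built; builds no real queries.
 * Assumption: df == null means no default field is configured.
 */
public final class StarResolution {

    private StarResolution() {}

    public static String resolve(String q, String df, Set<String> schemaFields) {
        if (q.equals("*:*")) {
            return "MatchAllDocsQuery"; // *:* matches everything, with or without df
        }
        String field;
        String term;
        int colon = q.indexOf(':');
        if (colon >= 0) {
            field = q.substring(0, colon); // explicit field, e.g. nofield:*
            term = q.substring(colon + 1);
        } else {
            field = df;                    // fall back to df (may be null)
            term = q;
        }
        if (!term.equals("*")) {
            throw new IllegalArgumentException("sketch only handles '*' terms");
        }
        if (field == null) {
            return "MatchAllDocsQuery";    // bare * with no field: match all docs
        }
        if (!schemaFields.contains(field)) {
            // the tests expect BAD_REQUEST for fields missing from the schema
            throw new IllegalArgumentException("need to have a field specified in schema");
        }
        return field + ":*";               // only docs with a term in this field
    }
}
```

Under these assumptions, resolve("*", "text0", Set.of("text0")) yields text0:*, which is the query the next comment is worried about.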

Jan 25 '19 16:01 tballison

My one concern is:

  assertJQ(req("df", "text0", "defType", "span", "q", "*"), "/response/numFound==3");

This does return the correct documents, but the parsed query is the wildcard query text0:*, which could still blow out your index... unless you turn off allowLeadingWildcards.
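For reference, Lucene's classic QueryParser applies this kind of guard when setAllowLeadingWildcard(false) is set: a term starting with * or ? is rejected rather than expanded, because without a fixed prefix the wildcard has to walk every term in the field. A minimal standalone sketch of the same check (LeadingWildcardGuard is a hypothetical name; the span parser's own handling of allowLeadingWildcards may look different):

```java
/**
 * Hypothetical standalone guard mirroring the classic QueryParser's
 * leading-wildcard check. With the guard on, text0:* is rejected
 * instead of being expanded over every term in the field.
 */
public final class LeadingWildcardGuard {

    private LeadingWildcardGuard() {}

    public static void check(String termStr, boolean allowLeadingWildcard) {
        if (!allowLeadingWildcard && (termStr.startsWith("*") || termStr.startsWith("?"))) {
            throw new IllegalArgumentException(
                    "'*' or '?' not allowed as first character in WildcardQuery");
        }
    }
}
```

With the guard on, a bare * fails fast with a parse error instead of expanding; the catch is that legitimate leading-wildcard searches such as *leak are rejected too.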

Jan 25 '19 16:01 tballison

Hi Tim,

Does this issue also apply to wildcard queries like the following? I am using lucene-5205 on Solr 6.5.1. For example:

  fl:"mem* leak"
  fl:"[mem* leak] prob*"~3

Our Solr instance shows a constant rise in memory usage, GC reclaims very little, and it ends up bringing down the shards.

Best, Modassar

Oct 15 '20 06:10 modassar81
