blacklight icon indicating copy to clipboard operation
blacklight copied to clipboard

Suggester breaks on large corpus for Solr 7.4

Open cdmo opened this issue 6 years ago • 13 comments

We recently began experimenting with BL7, Solr 7.4, Rails 5.2 and a 7 million item catalog. In the early testing so far we've noticed that the suggester feature errors out. The Solr web UI log screen will look like this:

ERROR true
x:blacklight-core
SuggestComponent
Exception in building suggester index for: mySuggester

Looking into the logs, the full trace looks like this:

2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SpellCheckComponent Index is not optimized therefore skipping building spell check index for: default
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SpellCheckComponent Index is not optimized therefore skipping building spell check index for: author
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SpellCheckComponent Index is not optimized therefore skipping building spell check index for: subject
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SpellCheckComponent Index is not optimized therefore skipping building spell check index for: title
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SuggestComponent buildOnCommit: mySuggester
2018-09-13 19:40:12.318 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.s.s.SolrSuggester SolrSuggester.build(mySuggester)
2018-09-13 19:40:12.818 ERROR (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.h.c.SuggestComponent Exception in building suggester index for: mySuggester
java.lang.IllegalArgumentException: input automaton is too large: 1001
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1298) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]

( this goes one for 1000 lines)

        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1306) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.util.automaton.Operations.topoSortStates(Operations.java:1275) ~[lucene-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:51:45]
        at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.replaceSep(AnalyzingSuggester.java:292) ~[lucene-suggest-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:52:17]
        at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.toAutomaton(AnalyzingSuggester.java:854) ~[lucene-suggest-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:52:17]
        at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.build(AnalyzingSuggester.java:430) ~[lucene-suggest-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:52:17]
        at org.apache.lucene.search.suggest.Lookup.build(Lookup.java:190) ~[lucene-suggest-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:52:17]
        at org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:181) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
        at org.apache.solr.handler.component.SuggestComponent$SuggesterListener.buildSuggesterIndex(SuggestComponent.java:534) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
        at org.apache.solr.handler.component.SuggestComponent$SuggesterListener.newSearcher(SuggestComponent.java:521) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
        at org.apache.solr.core.SolrCore.lambda$getSearcher$18(SolrCore.java:2322) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_181]
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2018-09-13 19:40:12.824 INFO  (searcherExecutor-10-thread-1-processing-x:blacklight-core) [   x:blacklight-core] o.a.s.c.SolrCore [blacklight-core] Registered new searcher Searcher@b5ea2d0[blacklight-core] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_gy(7.4.0):C2138244/151:delGen=3) Uninverting(_k0(7.4.0):C2025787/58:delGen=2) Uninverting(_if(7.4.0):c125682/2:delGen=1) Uninverting(_mk(7.4.0):C2086012/4:delGen=2) Uninverting(_k8(7.4.0):C10433/38:delGen=2) Uninverting(_l1(7.4.0):C8823/2:delGen=2) Uninverting(_lj(7.4.0):C16528/1:delGen=1) Uninverting(_lk(7.4.0):C17041/1:delGen=1) Uninverting(_lx(7.4.0):C16192) Uninverting(_m0(7.4.0):C17514) Uninverting(_mw(7.4.0):c248023) Uninverting(_mc(7.4.0):C18352/1:delGen=1) Uninverting(_n7(7.4.0):c255323) Uninverting(_mx(7.4.0):C16296/3:delGen=1) Uninverting(_mu(7.4.0):C17043/2:delGen=1) Uninverting(_n0(7.4.0):C20258/2:delGen=1) Uninverting(_n5(7.4.0):C16103/1:delGen=1) Uninverting(_n6(7.4.0):C13497) Uninverting(_n3(7.4.0):C13603) Uninverting(_n8(7.4.0):C4865) Uninverting(_o8(7.4.0):c575/50:delGen=14) Uninverting(_nv(7.4.0):C9761/51:delGen=21) Uninverting(_nw(7.4.0):C5308/51:delGen=18) Uninverting(_p2(7.4.0):c1175/2:delGen=2) Uninverting(_p0(7.4.0):C1364/4:delGen=1) Uninverting(_p3(7.4.0):C97/2:delGen=1) Uninverting(_p4(7.4.0):C51/1:delGen=1) Uninverting(_p5(7.4.0):C98)))}
2018-09-13 19:40:12.825 INFO  (qtp817348612-15) [   x:blacklight-core] o.a.s.u.p.LogUpdateProcessorFactory [blacklight-core]  webapp=/solr path=/update/json params={commit=true}{commit=} 0 572

So, it appears that there is a recursive function that is only permitted to run 1,000 times before it quits. This is, apparently, a change that occurred in Lucene 7.0. Here's an email thread that discusses it:

http://lucene.472066.n3.nabble.com/solr-5-2-gt-7-2-suggester-failure-td4383551.html

Here's the relevant commit (as pointed out in the email thread):

https://github.com/apache/lucene-solr/commit/7dde798473d1a8640edafb41f28ad25d17f25a2d

I'll mention too that the suggester worked fine for us with a small corpus. It appears that once the corpus becomes too large, the suggester struggles. Also, all of our settings in schema.xml and solrconfig.xml are default.

cdmo avatar Sep 14 '18 11:09 cdmo

Is there a threshold number of documents that you are seeing this happen at?

mejackreed avatar Sep 14 '18 20:09 mejackreed

Not really. I've only tried a very small number (112) where it works fine, and then 7 million where it stops and reports an error. I could try a smaller sample for troubleshooting.

cdmo avatar Sep 17 '18 11:09 cdmo

We have experienced the same issue. And I can't recreate it locally even with the same number of documents (locally) as in our production index.

mejackreed avatar Sep 17 '18 15:09 mejackreed

🤔 I could also try a large number locally and see if the suggester works. Are you using traject in local and prod?

cdmo avatar Sep 17 '18 17:09 cdmo

This is for our application EarthWorks (GeoBlacklight). Locally we are using GeoCombine for indexing.

mejackreed avatar Sep 18 '18 08:09 mejackreed

Not a lot of progress here. But, I'll report back, locally with 2.2M records I'm seeing the same error. Same with 98K and 2K and even just 900.

Here's schema.xml fwiw

   <!-- for suggestions -->
   <copyField source="*_t" dest="suggest"/>
   <copyField source="*_facet" dest="suggest"/>

   <dynamicField name="*suggest" type="textSuggest" indexed="true" stored="false" multiValued="true" />

    <fieldType class="solr.TextField" name="textSuggest" positionIncrementGap="100">
       <analyzer>
          <tokenizer class="solr.KeywordTokenizerFactory"/>
          <filter class="solr.StandardFilterFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
    </fieldType>

I wish I could discover where the looping error was occurring, like what part of the process. I did attempt to make the suggest field jus the title_display, and that also errored out.

cdmo avatar Sep 21 '18 16:09 cdmo

Can you get the suggestions to work in Solr via, say, the admin panel? Trying to sort out whether this is a BL issue or a Solr issue.

barmintor avatar Oct 24 '18 19:10 barmintor

This is a Solr issue AFAICT

mejackreed avatar Oct 24 '18 19:10 mejackreed

Can you get the suggestions to work in Solr via, say, the admin panel? Trying to sort out whether this is a BL issue or a Solr issue.

Currently we disabled suggestions but it was not working via the admin panel as well so yes more of a Solr issue.

banukutlu avatar Oct 24 '18 20:10 banukutlu

Yep I think how it relates here to Blacklight is this:

Blacklight should ship a default solr config that works at scale with suggester things

mejackreed avatar Oct 24 '18 20:10 mejackreed

Yep, I mentioned a link to the change in Lucene 7.0 in the original description. So, definitely an upstream problem... upstream change to be fixed downstream at some point I suppose? edit: Also, I have not tested with solr 7.5, just 7.4

cdmo avatar Oct 25 '18 12:10 cdmo

This looks to me like a large field value problem, not a large index problem. If you're doing "suggest" over fields with potentially large values (like title "tokenized" as a single string -- note that KeywordTokenizer is specified in the analyzer config above), there would likely be a correlation between index size and maximum indexed field value (particularly for a field like title, with a long tail of a handful of potentially very long values) ... "if you cast a wider net, you're more likely to catch an unusual fish". Have you tested this issue with a small index that's likely to generate large *suggest field values? If it wouldn't adversely affect the usefulness of the suggestions, you could stick a LengthFilter on the end of the analysis chain and see if that helps.

magibney avatar Nov 02 '18 16:11 magibney

I can confirm @magibney answer. If the field is analyzed with string and it contains more than 200 words the above exception is thrown. If a "short" field is used with a proper analyzer, the suggester works perfectly. Refer to https://lucidworks.com/2015/03/04/solr-suggester/ for a better explanation and examples.

The confusion, on my side, was related to the examples on the Solr documentation, where the following points are not clear:

  • the "whole" field is returned, not just the matching term
  • the cat field (used in the techproducts example) is very very short field (~20chars), and NOT a free text field

thePanz avatar Feb 27 '19 17:02 thePanz