SolrTextTagger

Distributed Requests

Open · MartinLoeper opened this issue 9 years ago · 1 comment
Hi,

I have a question concerning SolrCloud. Is the TaggerRequestHandler capable of performing distributed requests over multiple shards? I know that the standard Solr select handler does this, and that it can be adjusted using the shards query parameter.

Thanks, Martin

MartinLoeper, Sep 26 '16 10:09

The TaggerRequestHandler does not support sharded/distributed requests. I was about to write that it'll never happen, but I suppose I can fathom how that might work in a reasonable manner. Nonetheless, I have no plans to work on that.

Despite the single-shard limitation right now, the current Tagger design inherits the flexibility/configurability of Lucene/Solr, so you can put a crazy amount of documents into the one shard (over a billion) and this should work. Once you get over tens of millions, the recommendations in the instructions here will need to be modified. For example, doing optimize=true is no longer sensible, though you might want to merge to a small segment count. You might also want to tweak Solr's configuration of Lucene segment merging to produce more segments. And unless you have gobs of memory at crazy high doc counts, remove postingsFormat="memory". These changes will reduce tagger speed, but there would surely be overhead in a distributed search too, which this doesn't support.

Another possibility, a "poor man's sharding", would be to divide the tag document set into shards (perhaps by some sensible grouping if you have a taxonomy/categories), issue requests to them in parallel, and then it's up to you to combine the results and deconflict the overlaps.
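As a rough illustration of that combine-and-deconflict step, here is a minimal Python sketch. It is not part of the tagger: `Tag`, `tag_all_shards`, `deconflict`, and the `fetch_tags` callable are hypothetical names, and the rule used (merge the IDs of identical spans, then drop any tag fully contained in a longer one) is only one assumed policy. How each shard is actually queried is left to the caller-supplied `fetch_tags`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Tag(NamedTuple):
    """One tagged span of the input text (hypothetical structure)."""
    start: int              # start character offset
    end: int                # end character offset (exclusive)
    ids: Tuple[str, ...]    # IDs of the matching tag documents


def tag_all_shards(
    text: str,
    shards: List[str],
    fetch_tags: Callable[[str, str], Iterable[Tag]],
) -> List[Tag]:
    """Send the same text to every shard in parallel and collect all tags.

    fetch_tags(shard, text) is supplied by the caller and is assumed to
    query one independent tagger index and return its tags.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = pool.map(lambda shard: fetch_tags(shard, text), shards)
        return [tag for tags in per_shard for tag in tags]


def deconflict(tags: Iterable[Tag]) -> List[Tag]:
    """Merge tags with identical spans, then drop tags nested in longer ones."""
    # Step 1: tags covering exactly the same span are merged into one tag
    # whose IDs are the sorted union of the IDs from all shards.
    by_span = {}
    for tag in tags:
        by_span.setdefault((tag.start, tag.end), set()).update(tag.ids)
    merged = [Tag(s, e, tuple(sorted(ids))) for (s, e), ids in by_span.items()]

    # Step 2: visit tags by start offset, longest first, so that any tag
    # which could contain the current one has already been considered.
    merged.sort(key=lambda t: (t.start, -(t.end - t.start)))
    kept: List[Tag] = []
    for tag in merged:
        nested = any(k.start <= tag.start and tag.end <= k.end for k in kept)
        if not nested:
            kept.append(tag)
    return kept
```

Partially overlapping tags (neither contains the other) are both kept in this sketch; whether to prefer one of them depends on your data, so that choice is left open.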

dsmiley, Sep 26 '16 12:09