SolrTextTagger

Small slowdown in tagging performance after moving to the Solr 7.4 built-in tagger handler

Open simonatdrg opened this issue 7 years ago • 1 comment

I've moved our tagging server from a Solr 6.5.1 instance running the SolrTextTagger code on GitHub to the built-in tagger handler in Solr 7.4.0. The metrics we collect for bulk tagging indicate a small slowdown as a result, on the order of 0.0005 seconds per HTTP call to the tagger from our Python tagging application. While this isn't exactly earth-shaking, it does add around 45 minutes to an 11-hour index generation job (which runs overnight, admittedly).
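For scale, the two figures above imply a call volume that can be estimated with some quick arithmetic. This is only a rough sketch: it assumes the 0.0005 s per call and the 45 minutes are both exact, that the whole increase comes from tagger calls, and that the 11 hours is the total job duration. The variable names are illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Assumption: the per-call slowdown and total increase are exact values.
extra_seconds_per_call = 0.0005   # reported slowdown per HTTP call
extra_minutes_total = 45          # reported increase in job duration
job_hours = 11                    # reported job duration (assumed total)

# Number of tagger calls needed for the per-call cost to add up to 45 minutes.
implied_calls = round(extra_minutes_total * 60 / extra_seconds_per_call)

# Average call rate over the job if those calls are spread evenly.
calls_per_second = implied_calls / (job_hours * 3600)

print(f"implied tagger calls: {implied_calls:,}")
print(f"average rate: {calls_per_second:.1f} calls/s")
```

Under those assumptions the job makes several million tagger calls, which is why even a sub-millisecond difference per call becomes visible in the total run time.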

Nothing else has changed in our framework (same hardware, Java version, Python version, tagging dictionary), and it has already been optimized like crazy to minimize the number of tagger calls required, so I'm curious what you think the cause might be: the changes needed to port the Tagger to Solr 7.4 (you mentioned the move to FST50 postings)? Possible changes to the Jetty version? Or something else?

simonatdrg avatar Oct 04 '18 14:10 simonatdrg

This is very likely the change in postings format from “Memory” to “FST50”. Memory still exists. I have ideas on how to resurrect a memory codec equivalent but no time for that.

dsmiley avatar Oct 05 '18 17:10 dsmiley