Xponents

Test Latest TextTagger in other languages/scripts

Open mubaldino opened this issue 6 years ago • 0 comments

Describe the bug

TextTagger usage with languages other than English, such as Arabic, raises an exception.

To Reproduce

  • Java or Python version: Any Java (openjdk 8 and 12)
  • Usage: Arabic text produces a "zero-length token" exception from TextTagger process()
  • Data input:
  • Did you enable logging (level = DEBUG)?
  • Other notes:
15:59:47.288 [main] ERROR org.apache.solr.handler.RequestHandlerBase - java.lang.IllegalArgumentException: term:  analyzed to a zero-length token
	at org.apache.solr.handler.tagger.Tagger.process(Tagger.java:142)
	at org.apache.solr.handler.tagger.TaggerRequestHandler.handleRequestBody(TaggerRequestHandler.java:231)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2551)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:191)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
	at org.opensextant.extraction.SolrMatcherSupport.tagTextCallSolrTagger(SolrMatcherSupport.java:181)
	at org.opensextant.extractors.geo.GazetteerMatcher.tagText(GazetteerMatcher.java:444)
	at org.opensextant.extractors.geo.GazetteerMatcher.tagText(GazetteerMatcher.java:404)
	at org.opensextant.extractors.geo.PlaceGeocoder.extract(PlaceGeocoder.java:475)
	at org.opensextant.extractors.test.TestPlaceGeocoder.tagFile(TestPlaceGeocoder.java:57)
	at org.opensextant.extractors.test.TestPlaceGeocoder.main(TestPlaceGeocoder.java:164)

Expected behavior

More reasonable behavior is expected from TextTagger. It's possible the whole Solr 7.x assembly needs to be replaced with a clean setup and the data fully reindexed.
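
Before replacing the whole assembly, it may help to check whether the input contains tokens that could plausibly collapse to nothing during analysis. The sketch below is only an illustration under an unconfirmed assumption: that the field's analysis chain strips tatweel (U+0640) and Arabic diacritics (U+064B to U+0652), so a token made up only of those characters ends up with zero length. The class `ZeroLengthTokenCheck` and its methods are hypothetical helpers, not part of Xponents or Solr, and whitespace splitting is a stand-in for the real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Xponents or Solr.
class ZeroLengthTokenCheck {

    // Assumed normalization: drop tatweel (U+0640) and Arabic diacritics (U+064B..U+0652).
    static String stripMarks(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            boolean tatweel = (c == 0x0640);
            boolean diacritic = (c >= 0x064B && c <= 0x0652);
            if (!tatweel && !diacritic) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns whitespace-delimited tokens that become empty after stripMarks().
    static List<String> zeroLengthTokens(String text) {
        List<String> suspects = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (!tok.isEmpty() && stripMarks(tok).isEmpty()) {
                suspects.add(tok);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // "Baghdad" in Arabic followed by a run of three tatweel characters.
        String sample = "\u0628\u063A\u062F\u0627\u062F \u0640\u0640\u0640";
        System.out.println("Suspect tokens: " + zeroLengthTokens(sample).size());
    }
}
```

If a check like this flags tokens in the failing Arabic file, the analyzer configuration is a more likely culprit than the index itself. If it flags nothing, the assumption above does not explain the exception.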

mubaldino, Jul 25 '19 20:07