
Error when very long annotations are indexed

Open reckart opened this issue 2 years ago • 0 comments

There are a bunch of info logs about the project being exported, followed by this error:

Exception in thread "inception-worker-218" java.lang.IllegalArgumentException: Document contains at least one immense term in field="content" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[...]...', original message: bytes can be at most 32766 in length; got 33089

I've also seen the same error in the pod when importing the project, but the project was imported regardless of the error:

Exception in thread "inception-worker-867" java.lang.IllegalArgumentException: Document contains at least one immense term in field="content" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[...]...', original message: bytes can be at most 32766 in length; got 36224
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)

Originally posted by @shgulomov in https://github.com/inception-project/inception/issues/4194#issuecomment-1725936130
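
For illustration, the limit quoted in the error messages above (32766 bytes of UTF-8 per term, which as far as I know corresponds to Lucene's `IndexWriter.MAX_TERM_LENGTH`) can be checked before a term reaches the index. The following is only a minimal sketch, not the fix used in INCEpTION: the class `ImmenseTermCheck` and its methods are hypothetical names, and truncating is just one possible way of handling an over-long annotation value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of INCEpTION or Lucene.
class ImmenseTermCheck {
    // Limit taken from the error message: a term may be at most 32766 UTF-8 bytes.
    public static final int MAX_TERM_BYTES = 32766;

    // Number of bytes the term occupies once encoded as UTF-8.
    public static int utf8Length(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if indexing this term as a single token would exceed the limit.
    public static boolean isImmense(String term) {
        return utf8Length(term) > MAX_TERM_BYTES;
    }

    // Cuts the term so that its UTF-8 encoding fits into the limit,
    // without splitting a code point (assumption: truncation is acceptable).
    public static String truncateToLimit(String term) {
        int bytes = 0;
        int i = 0;
        while (i < term.length()) {
            int cp = term.codePointAt(i);
            // UTF-8 width of this code point
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > MAX_TERM_BYTES) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return term.substring(0, i);
    }

    public static void main(String[] args) {
        // 20000 two-byte characters, i.e. 40000 bytes in UTF-8
        String longTerm = "\u00e4".repeat(20000);
        System.out.println(isImmense(longTerm));
        System.out.println(utf8Length(truncateToLimit(longTerm)));
    }
}
```

Note that the limit is in bytes, not characters, so text with many non-ASCII characters hits it with far fewer than 32766 characters.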

reckart, Sep 20 '23 04:09