datalinguist
Pipeline with KBP annotator not working
Attempting to create a pipeline with the kbp annotator currently results in the following error:
Execution error (VerifyError) at edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer/toProtoBuilder (ProtobufAnnotationSerializer.java:673).
Bad type on operand stack
Exception Details:
Location:
com/google/protobuf/GeneratedMessageV3$ExtendableMessage.getExtension(Lcom/google/protobuf/GeneratedMessage$GeneratedExtension;I)Ljava/lang/Object; @3: invokevirtual
Reason:
Type 'com/google/protobuf/GeneratedMessage$GeneratedExtension' (current frame, stack[1]) is not assignable to 'com/google/protobuf/ExtensionLite'
Current Frame:
bci: @3
flags: { }
locals: { 'com/google/protobuf/GeneratedMessageV3$ExtendableMessage', 'com/google/protobuf/GeneratedMessage$GeneratedExtension', integer }
stack: { 'com/google/protobuf/GeneratedMessageV3$ExtendableMessage', 'com/google/protobuf/GeneratedMessage$GeneratedExtension', integer }
Bytecode:
0000000: 2a2b 1cb6 0024 b0
I initially thought the error was caused by a mismatch between the protobuf version used to compile the bundled protobuf classes and the version that CoreNLP officially depends on (3.9.2). However, if I drop both the CoreNLP jar and the models jar into a directory along with an example.txt file containing some text, annotation works fine from the command line:
# long example
java -cp "*" -Xmx16g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,coref,kbp -coref.md.type RULE -file example.txt
# shorter example (no coref)
java -cp "*" -Xmx16g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,kbp -file example.txt
Adding the suggested configuration doesn't work either:
:kbp {:semgrex "edu/stanford/nlp/models/kbp/english/semgrex"
:tokensregex "edu/stanford/nlp/models/kbp/english/tokensregex"
:model "edu/stanford/nlp/models/kbp/english/tac-re-lr.ser.gz"}
Neither does using :model "edu/stanford/nlp/models/kbp/english/tac-re-kbp2015.ser.gz" instead, or setting the params to "none".
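One way to test the version-mismatch hypothesis would be to check which jar each protobuf class named in the trace is actually loaded from inside the failing process. Below is a minimal sketch in Java. The `WhichJar` class and its `locate` method are hypothetical names made up for illustration and are not part of CoreNLP or datalinguist; only the three protobuf class names are taken from the error above.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of CoreNLP or datalinguist).
final class WhichJar {

    // Returns "<class name> -> <location>" describing where the class was loaded from.
    static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            String where = (src == null || src.getLocation() == null)
                    ? "(bootstrap/JDK)"
                    : src.getLocation().toString();
            return className + " -> " + where;
        } catch (ClassNotFoundException e) {
            return className + " -> not found on classpath";
        } catch (LinkageError e) {
            return className + " -> failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the VerifyError above.
        String[] names = {
            "com.google.protobuf.GeneratedMessageV3",
            "com.google.protobuf.GeneratedMessage",
            "com.google.protobuf.ExtensionLite"
        };
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```

If these classes resolve to different jars, or to a protobuf jar other than the one CoreNLP expects, that would suggest a second protobuf version is being pulled in by another dependency. This is only a diagnostic aid under that assumption, not a confirmed cause.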