datalinguist

Pipeline with KBP annotator not working

Open · simongray opened this issue 4 years ago · 1 comment

Attempting to create a pipeline with the kbp annotator currently results in the following error:

Execution error (VerifyError) at edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer/toProtoBuilder (ProtobufAnnotationSerializer.java:673).
Bad type on operand stack
Exception Details:
  Location:
    com/google/protobuf/GeneratedMessageV3$ExtendableMessage.getExtension(Lcom/google/protobuf/GeneratedMessage$GeneratedExtension;I)Ljava/lang/Object; @3: invokevirtual
  Reason:
    Type 'com/google/protobuf/GeneratedMessage$GeneratedExtension' (current frame, stack[1]) is not assignable to 'com/google/protobuf/ExtensionLite'
  Current Frame:
    bci: @3
    flags: { }
    locals: { 'com/google/protobuf/GeneratedMessageV3$ExtendableMessage', 'com/google/protobuf/GeneratedMessage$GeneratedExtension', integer }
    stack: { 'com/google/protobuf/GeneratedMessageV3$ExtendableMessage', 'com/google/protobuf/GeneratedMessage$GeneratedExtension', integer }
  Bytecode:
    0000000: 2a2b 1cb6 0024 b0  

I initially thought the error was caused by a mismatch between the protobuf version used to compile the bundled protobuf classes and the version that CoreNLP officially depends on (3.9.2). However, if I put the CoreNLP jar and the models jar in a directory along with an example.txt file containing some text, annotation works fine from the command line:

# long example
java -cp "*" -Xmx16g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,coref,kbp -coref.md.type RULE -file example.txt

# shorter example (no coref)
java -cp "*" -Xmx16g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,kbp -file example.txt
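
The VerifyError above names both GeneratedMessageV3 and GeneratedMessage, so one possible way to test the version-mismatch hypothesis is to print which jar each of those classes is loaded from on the failing classpath. The following is only a sketch: the class name ProtobufOrigin and the helper locate are made up for illustration, and it reports locations without fixing anything.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of CoreNLP or datalinguist.
final class ProtobufOrigin {

    // Reports where a class is loaded from, without initializing it.
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(
                    className, false, ProtobufOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself have no code source.
            return src == null
                    ? "(bootstrap or unknown source)"
                    : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace above.
        String[] names = {
            "com.google.protobuf.GeneratedMessageV3",
            "com.google.protobuf.GeneratedMessage",
            "com.google.protobuf.ExtensionLite",
            "edu.stanford.nlp.pipeline.ProtobufAnnotationSerializer"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

This needs to run with the same classpath as the failing pipeline. If the protobuf classes resolve to different jars, that would suggest a dependency conflict on the classpath rather than a problem in the kbp annotator itself.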

simongray · Dec 30 '20 13:12

Adding the suggested configuration doesn't work either:

:kbp        {:semgrex     "edu/stanford/nlp/models/kbp/english/semgrex"
             :tokensregex "edu/stanford/nlp/models/kbp/english/tokensregex"
             :model       "edu/stanford/nlp/models/kbp/english/tac-re-lr.ser.gz"}

Neither does :model "edu/stanford/nlp/models/kbp/english/tac-re-kbp2015.ser.gz" or setting the params to "none".

simongray · Dec 29 '21 08:12