
Coreference in CoNLL output

Open • andreasvc opened this issue 6 years ago • 1 comment

I'm trying to run the dcoref system on a plain text file and want to get the output in CoNLL 2012 format.

I've tried several things:

$ ./corenlp.sh -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref \
    -file /tmp/example.txt \
    -coref.conllOutputPath /tmp/example.conll

However, this option is ignored, and I get XML output.

$ ./corenlp.sh -annotators tokenize,ssplit,pos,lemma,ner,parse,dcoref \
    -file /tmp/example.txt -outputFormat conll \
    -output.columns doctitle,section,idx,word,lemma,pos,ner,headidx,deprel,link

This option is honored, but the "link" column does not contain coreference information, and I don't see which other column I should use instead.

There are instructions on running the system on CoNLL 2011 data and evaluating on it, but for this use case, I don't have annotated data.

andreasvc, Apr 24 '19 19:04

I wrote a conversion script from XML to CoNLL 2012: https://gist.github.com/andreasvc/6bf9e10b2e6956ce32fb777e7efe99cb
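
For readers who only need the idea, here is a minimal Python sketch of such a conversion. It is not the code from the gist. It assumes the XML output has the usual layout (`document/sentences/sentence/tokens/token` plus a `document/coreference/coreference/mention` section with 1-based `sentence` and `start` values and an exclusive `end`), and it produces only the word and the coreference column rather than a full CoNLL 2012 file. `SAMPLE_XML` and `coref_columns` are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample in the assumed CoreNLP XML layout (not real tool output).
SAMPLE_XML = """<root><document>
<sentences>
<sentence id="1"><tokens>
<token id="1"><word>John</word></token>
<token id="2"><word>Smith</word></token>
<token id="3"><word>sleeps</word></token>
<token id="4"><word>.</word></token>
</tokens></sentence>
<sentence id="2"><tokens>
<token id="1"><word>He</word></token>
<token id="2"><word>snores</word></token>
<token id="3"><word>.</word></token>
</tokens></sentence>
</sentences>
<coreference>
<coreference>
<mention representative="true"><sentence>1</sentence><start>1</start><end>3</end><head>2</head></mention>
<mention><sentence>2</sentence><start>1</start><end>2</end><head>1</head></mention>
</coreference>
</coreference>
</document></root>"""


def coref_columns(xml_text):
    """Return (sentence, token index, word, coref label) for every token."""
    doc = ET.fromstring(xml_text).find("document")

    # Map (sentence, token index) to the bracket fragments that belong there.
    labels = defaultdict(list)
    for chain_id, chain in enumerate(doc.findall("coreference/coreference")):
        for mention in chain.findall("mention"):
            sent = int(mention.findtext("sentence"))
            start = int(mention.findtext("start"))
            last = int(mention.findtext("end")) - 1  # assumed exclusive end
            if start == last:
                labels[sent, start].append(f"({chain_id})")
            else:
                labels[sent, start].append(f"({chain_id}")
                labels[sent, last].append(f"{chain_id})")

    rows = []
    for sentence in doc.findall("sentences/sentence"):
        sent = int(sentence.get("id"))
        for token in sentence.findall("tokens/token"):
            idx = int(token.get("id"))
            # CoNLL-style coref column: "-" when no mention starts or ends here.
            coref = "|".join(labels[sent, idx]) or "-"
            rows.append((sent, idx, token.findtext("word"), coref))
    return rows


if __name__ == "__main__":
    for sent, idx, word, coref in coref_columns(SAMPLE_XML):
        print(sent, idx, word, coref, sep="\t")
```

The sketch numbers chains in document order and does not attempt to order the brackets of nested mentions, so treat it as a starting point rather than a drop-in converter.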

andreasvc, Apr 24 '19 20:04