Coreference resolution plugin

Open · binarymax opened this issue 5 years ago · 1 comment

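Coreference resolution should be added as a pre-tokenization step. This should improve recall in knowledge graph extraction and also improve BM25 accuracy, since pronouns and other referring expressions get replaced with the entities they refer to before the text is indexed.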

Coreference resolution poses several challenges. The two most important are that its accuracy can be low and that it adds a performance cost to the pipeline.

Candidates for the step include the neuralcoref library (https://github.com/huggingface/neuralcoref) and the BERT-based coref library (https://github.com/mandarjoshi90/coref). The former integrates easily with spaCy but is less accurate than the latter. The latter is more accurate but probably needs a GPU for reasonable performance, and it is finicky to get working (the example Colab notebook doesn't work out of the box).
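
To make the proposed pre-tokenization step concrete, here is a minimal sketch in plain Python. It is not the API of either library above: the function names `resolve_coreferences` and `tokenize` are hypothetical, and the clusters are hand-written character spans standing in for whatever a real coreference model would return. The sketch assumes mentions do not overlap and simply replaces each later mention in a cluster with the cluster's first mention.

```python
import re
from typing import List, Tuple

# (start, end) character offsets into the text, end exclusive.
Span = Tuple[int, int]


def resolve_coreferences(text: str, clusters: List[List[Span]]) -> str:
    """Replace every later mention in a cluster with the cluster's first mention.

    Hypothetical helper: assumes the spans come from some coreference model
    and that no two mentions overlap.
    """
    replacements = []
    for cluster in clusters:
        if len(cluster) < 2:
            continue  # a single mention has nothing to resolve
        main_start, main_end = cluster[0]
        main_text = text[main_start:main_end]
        for start, end in cluster[1:]:
            replacements.append((start, end, main_text))

    # Apply from right to left so earlier offsets remain valid.
    for start, end, main_text in sorted(replacements, reverse=True):
        text = text[:start] + main_text + text[end:]
    return text


def tokenize(text: str) -> List[str]:
    """Toy tokenizer standing in for the real tokenization step."""
    return re.findall(r"\w+", text.lower())


text = "Marie Curie won the Nobel Prize. She shared it with Pierre."
she = text.index("She")
# One hand-written cluster: "Marie Curie" and "She" refer to the same entity.
clusters = [[(0, len("Marie Curie")), (she, she + len("She"))]]

resolved = resolve_coreferences(text, clusters)
print(resolved)
print(tokenize(resolved))
```

After resolution the second sentence mentions the entity by name rather than by pronoun, so a BM25 query for that name can match both sentences, and a relation extractor sees an explicit subject. A wrong cluster would inject the wrong entity into the text in the same way, which is why the accuracy concern above matters.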

binarymax · Oct 01 '20 11:10

This is on hold until the upgrade to spaCy 3.0.

binarymax · Dec 17 '20 15:12