In-Built Coreferencing?

Open meksor opened this issue 7 years ago • 2 comments

For coreferencing, an additional piece of software is needed right now. It would be good to advertise this more: I've been using spaCy for a year and only just found out about it, and I was already thinking of using CoreNLP for this feature. Are there plans to include this in spaCy's pipeline?

Thanks for preserving spaces, by the way. SPACE IS A WORD TOO. This makes detokenization easy! I had to implement lots of questionable logic and make a lot of assumptions for detokenization after switching to Google Cloud NLP for coreferencing.
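To illustrate why keeping whitespace makes detokenization trivial, here is a minimal pure-Python sketch. It is not spaCy's tokenizer: the regex and the helper names `tokenize_keep_ws` and `detokenize` are made up for this example. The idea is only that each token remembers its trailing space, roughly what spaCy exposes as `Token.whitespace_` and `Token.text_with_ws`.

```python
import re

# Hypothetical pattern for illustration only; spaCy's real tokenizer is far richer.
# A piece is a whitespace run, a word, or one punctuation mark,
# optionally followed by a single trailing space.
_PIECE = re.compile(r"(\s+|\w+|[^\w\s])( ?)")

def tokenize_keep_ws(text):
    """Return (token, trailing_space) pairs that cover every character of text."""
    return [(m.group(1), m.group(2)) for m in _PIECE.finditer(text)]

def detokenize(pairs):
    """Rebuild the original string by joining each token with its trailing space."""
    return "".join(tok + ws for tok, ws in pairs)

text = "Thanks for preserving spaces,  SPACE IS A WORD TOO."
pairs = tokenize_keep_ws(text)
# Nothing was thrown away, so the round trip is exact (including the double space).
assert detokenize(pairs) == text
```

Because no character is discarded, rebuilding the text is a plain join with no spacing heuristics. A tokenizer that drops whitespace would force you to guess where spaces belong.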

PS: This (a fake news generator) is what I plan on using spaCy for again, if anyone has any nice ideas.

meksor • Jun 18 '18 12:06

We might later add built-in coreferencing, but we're definitely encouraging an ecosystem of extension packages to be developed around spaCy. This allows other developers to keep credit for their work, and lets new developments avoid worrying so much about backwards compatibility and maintainability.

The core library is still focussed on stability and performance at the moment. We're especially working on our infrastructure and automation setup, to make sure we're testing everything reliably, and that we're able to regenerate the whole model automatically, including long-running batch jobs that require processing lots of raw text.

honnibal • Jun 19 '18 08:06

Looks like @honnibal forgot to actually link the coref library – here it is: https://github.com/huggingface/neuralcoref ✨

It integrates seamlessly with spaCy and comes with custom models, a pipeline component with extension attributes and training code. (It's also a great example of a spaCy plugin and how we imagine integrations of other libraries to work in the future. The team at @huggingface really did an amazing job here.)

ines • Jun 19 '18 08:06

This should be covered by the recently released experimental coref component. You can read about the architecture in detail in the blog post and check out the sample project for details on training it or using the sample pipeline.

polm • Dec 06 '22 10:12

This issue has been automatically closed because it was answered and there was no follow-up discussion.

github-actions[bot] • Dec 14 '22 00:12

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] • Jan 14 '23 00:01