neuralcoref

cross sentence coreference resolution

Open Fan-Luo opened this issue 3 years ago • 1 comments

Hi,

Thank you for developing this extension. Does the current implementation have an option for cross-sentence coreference resolution? My workaround is to add , </sent> , as a sentence separator. It seems to work well in many cases, but sometimes the separator gets swallowed during resolution, so I cannot map the resolved text back to the original sentences.

For example:

sents: Neil Mallon Pierce Bush (born January 22, 1955) is an American businessman and investor, , </sent> , He is the fourth of six children of former President George H, W, Bush and Barbara Bush (née Pierce), , </sent> , His five siblings are George W, Bush, the 43rd President of the United States; Jeb Bush, a former governor of Florida; Robin Bush, died of leukemia at the age of three; Marvin; and Dorothy, , </sent> , Neil Bush is currently a businessman based in Texas,

number of sents before coreference resolution: 4

resolved_sents: Neil Mallon Pierce Bush (born January 22, 1955) is an American businessman and investor, , </sent> , Neil Mallon Pierce Bush (born January 22, 1955) is the fourth of six children of former President George H, W, Bush and Barbara Bush (née Pierce), , </sent> , Neil Mallon Pierce Bush (born January 22, 1955) five siblings are George W, Bush, the 43rd President of the United States; Jeb Bush, a former governor of Florida; Barbara Bush, died of leukemia at the age of three; Marvin; and Dorothy, , Neil Mallon Pierce Bush (born January 22, 1955) Bush is currently a businessman based in Texas,

number of sents after coreference resolution: 3

I hope there is an existing solution I did not notice. If not, may I ask for your suggestion to fix my workaround?
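One possible direction, sketched under assumptions rather than taken from neuralcoref itself: run resolution once over the whole text without any separator, then rebuild each sentence from character offsets so the sentence count cannot change. The helper `resolve_per_sentence` and its tuple formats are hypothetical. With neuralcoref, the inputs could presumably be built from `doc.sents` (`start_char`, `end_char`) and from `doc._.coref_clusters` (each mention's `start_char`, `end_char`, and `cluster.main.text`), but that wiring is not shown or tested here.

```python
from typing import List, Tuple

# Hypothetical input formats (character offsets into the original text):
#   sentence:    (start_char, end_char)
#   replacement: (start_char, end_char, text_to_substitute)
SentenceSpan = Tuple[int, int]
Replacement = Tuple[int, int, str]


def resolve_per_sentence(
    text: str,
    sentences: List[SentenceSpan],
    replacements: List[Replacement],
) -> List[str]:
    """Apply mention replacements sentence by sentence.

    One output string is produced per input sentence, so the mapping
    back to the original sentences is preserved by construction.
    """
    resolved = []
    for sent_start, sent_end in sentences:
        # Keep only the mentions that lie fully inside this sentence.
        inside = sorted(
            r for r in replacements if sent_start <= r[0] and r[1] <= sent_end
        )
        pieces = []
        cursor = sent_start
        for start, end, new_text in inside:
            if start < cursor:
                continue  # skip mentions overlapping one already replaced
            pieces.append(text[cursor:start])
            pieces.append(new_text)
            cursor = end
        pieces.append(text[cursor:sent_end])
        resolved.append("".join(pieces))
    return resolved


# Toy example with hand-written offsets instead of real model output.
text = "Neil Bush is a businessman. He lives in Texas."
he = text.index("He")
sentences = [(0, he - 1), (he, len(text))]
replacements = [(he, he + 2, "Neil Bush")]
print(resolve_per_sentence(text, sentences, replacements))
```

Because sentence boundaries come from the original text rather than from markers embedded in it, nothing can be swallowed during resolution. Possessive mentions such as "His" would still need separate handling.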

Thank you

Fan-Luo, May 06 '21 09:05

Thanks, I seem to be having the same problem when linking entities and mentions across different sentences.

import spacy
import neuralcoref
from spacy import displacy

nlp = spacy.load("en_core_web_lg")
neuralcoref.add_to_pipe(nlp, greedyness=0.5, max_dist=200)

text = '''Every Tuesday and Friday, Recode’s Kara Swisher and NYU Professor Scott Galloway offer sharp, unfiltered insights into the biggest stories in tech, business, and politics. They make bold predictions, pick winners and losers, and bicker and banter like no one else. Kara is out welcoming the newest member of the Pivot family! Scott is joined by co-host Stephanie Ruhle to talk about The Great Resignation, inflation, J&J’s split, and Steve Bannon’s indictment. Also, Elon is still bullying senators on Twitter, and Beto is officially running for Governor of Texas. Plus, Scott chats with Friend of Pivot, Founder and CEO of Boom Supersonic, Blake Scholl about supersonic air travel.'''

doc = nlp(text)


sentence_spans = list(doc.sents)
displacy.render(sentence_spans, style="ent")

import tabulate

rows = []
for ent in doc.ents:
    if ent.label_ != 'PERSON':
        continue
    row = [ent.text, ent.label_]
    cluster = ent._.coref_cluster
    if cluster is not None:
        row.extend([cluster.main.text, cluster.mentions])
    else:
        row.extend([None, None])
    rows.append(row)

table = tabulate.tabulate(rows, headers=["Entity", "Type", "Cluster id", "Cluster mentions"])
print(table)

I get the following output:

Entity           Type    Cluster id    Cluster mentions
---------------  ------  ------------  ------------------
Kara Swisher     PERSON
Scott Galloway   PERSON
Kara             PERSON
Pivot            PERSON
Scott            PERSON  Scott         [Scott, Scott]
Stephanie Ruhle  PERSON
Steve Bannon’s   PERSON
Elon             PERSON
Beto             PERSON
Scott            PERSON  Scott         [Scott, Scott]
Friend of Pivot  PERSON
Blake Scholl     PERSON

I was expecting the entity Kara Swisher to be linked to the mention Kara, and Scott Galloway to Scott, since they are not that far apart.
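As a stopgap while the coreference model misses these links, a plain name-matching pass over the PERSON entities could recover them. The sketch below is not part of neuralcoref or spaCy. `normalize` and `link_short_names` are hypothetical helpers, and the rule (a short name links to the first earlier mention whose tokens strictly contain it) is an assumption that will mislink people who share a first name.

```python
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lowercase a name and drop a trailing possessive marker."""
    name = name.lower().strip()
    for suffix in ("'s", "’s"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def link_short_names(person_mentions: List[str]) -> Dict[int, Optional[int]]:
    """Map each mention index to the index of an earlier, longer mention.

    A mention is linked when its tokens are a proper subset of the tokens
    of an earlier mention; otherwise it maps to None.
    """
    links: Dict[int, Optional[int]] = {}
    for i, mention in enumerate(person_mentions):
        tokens = set(normalize(mention).split())
        links[i] = None
        for j in range(i):
            earlier = set(normalize(person_mentions[j]).split())
            if tokens and tokens < earlier:  # proper subset only
                links[i] = j
                break
    return links


# Entity texts in document order, as they appear in the table above.
mentions = ["Kara Swisher", "Scott Galloway", "Kara", "Pivot", "Scott"]
for i, j in link_short_names(mentions).items():
    if j is not None:
        print(f"{mentions[i]} -> {mentions[j]}")
```

In practice the list would come from `[ent.text for ent in doc.ents if ent.label_ == "PERSON"]`, and the heuristic would only be consulted for entities whose `ent._.coref_cluster` is None.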

abhinavkulkarni, Nov 24 '21 07:11