pytextrank

custom Keyword inclusion

Open Vignesh9395 opened this issue 4 years ago • 3 comments

Problem description

I need the generated summary to include specific keywords from the input text.

Steps/code/corpus to reproduce

I need the pipeline component to accept keywords as an input parameter:

nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True, custom_keywords=keywords)

For example,

import spacy
import pytextrank

# example text
text = "Apple is red. Grape is black. Banana is yellow."

# keywords
keywords = ['apple', 'red', 'yellow']

# load a spaCy model, depending on language, scale, etc.
nlp = spacy.load("en_core_web_sm")

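# desired call shape: custom_keywords is the requested parameter and does not exist yet
# (summarize is not defined in this snippet)
output = summarize(text, word_count=9, custom_keywords=keywords)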

# add PyTextRank to the spaCy pipeline
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True, custom_keywords=keywords)

doc = nlp(text)

# examine the top-ranked sentences in the document
for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=2):
    print(sent)

Desired output

Apple is red. Banana is yellow

As in the example above, I need a parameter for custom keywords, and those keywords must appear in the summarized text. That is, the sentences containing the keywords should be the top-ranked sentences.

Is there a way to do this, or is there an existing function in the library that does it?

Vignesh9395, Jan 12 '21 11:01

Thank you @Vignesh9395, that capability is going in with the upcoming kglab integration.

ceteri, Feb 15 '21 18:02

Thank you @ceteri, looking forward to it!

Vignesh9395, Mar 18 '21 06:03

Hi @Vignesh9395, can you try your use case with biased TextRank? I think that with an appropriate choice of focus and bias you should be able to bring such sentences to the top of the summary. Please refer to sample.py for the usage.

@ceteri
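To illustrate the idea behind biasing a ranking toward focus terms, here is a minimal, dependency-free sketch applied to the example text from this issue. It is not pytextrank's implementation and does not use its API: the function names (`split_sentences`, `tokens`, `biased_sentence_rank`) and the word-overlap graph are assumptions made for illustration only. It runs a personalized PageRank over sentences, where the restart probability is concentrated on sentences that contain the keywords.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    # Lowercased set of alphanumeric tokens in the sentence.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def biased_sentence_rank(text, keywords, limit_sentences=2, damping=0.85, iterations=50):
    """Hypothetical helper: rank sentences with a keyword-biased PageRank."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    bags = [tokens(s) for s in sentences]
    focus = {k.lower() for k in keywords}
    n = len(sentences)

    # Restart (bias) distribution: proportional to keyword hits per sentence,
    # falling back to uniform when no sentence contains a keyword.
    hits = [len(bag & focus) for bag in bags]
    total = sum(hits)
    bias = [h / total for h in hits] if total else [1.0 / n] * n

    # Edge weight between two different sentences = number of shared tokens.
    weights = [[len(bags[i] & bags[j]) if i != j else 0 for j in range(n)]
               for i in range(n)]
    out_sum = [sum(row) for row in weights]

    # Power iteration for personalized PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for j in range(n):
            incoming = sum(scores[i] * weights[i][j] / out_sum[i]
                           for i in range(n) if out_sum[i])
            updated.append((1 - damping) * bias[j] + damping * incoming)
        scores = updated

    # Keep the top-scoring sentences, returned in document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:limit_sentences]
    return [sentences[i] for i in sorted(top)]


text = "Apple is red. Grape is black. Banana is yellow."
keywords = ["apple", "red", "yellow"]
summary = biased_sentence_rank(text, keywords, limit_sentences=2)
print(" ".join(summary))
```

With the keywords from the issue, the two sentences that mention them outrank "Grape is black." because the bias term keeps feeding score back to them on every iteration. A real pipeline would rank phrases on a lemma graph rather than sentences on raw word overlap, so treat this only as a picture of how focus and bias shift the ranking.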

Ankush-Chander avatar Mar 23 '21 04:03 Ankush-Chander