pytextrank
Custom keyword inclusion
Problem description
I need the generated summary to include specific keywords from the input text.
Steps/code/corpus to reproduce
I need the pipeline component to accept keywords as an input parameter, something like:
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True, custom_keywords=keywords)
For example,
import spacy
import pytextrank
# example text
text = "Apple is red. Grape is black. Banana is yellow."
# keywords
keywords = ['apple', 'red', 'yellow']
# load a spaCy model, depending on language, scale, etc.
nlp = spacy.load("en_core_web_sm")
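# desired interface (hypothetical: `summarize` is not defined in this snippet):
# output = summarize(text, word_count=9, custom_keywords=keywords)
# add PyTextRank to the spaCy pipeline (`custom_keywords` is the requested parameter)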
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True, custom_keywords=keywords)
doc = nlp(text)
# examine the top-ranked sentences in the document
for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=2):
    print(sent)
Expected output
Apple is red. Banana is yellow.
As in the example above, I need a parameter for custom keywords, and those keywords must be present in the summarized text; i.e., the sentences containing the keywords should be the top-ranked sentences.
Is there a way to do this, or does the library already provide a function for it?
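To make the requested behaviour concrete, here is a minimal standalone sketch that does not use pytextrank at all. It splits the text naively on sentence punctuation, counts how many of the custom keywords each sentence mentions, and returns the top sentences in document order. The function name `keyword_first_summary` and its parameters are hypothetical, chosen only for illustration.

```python
import re


def keyword_first_summary(text, keywords, limit_sentences=2):
    """Return the sentences that mention the most keywords, in document order.

    Hypothetical helper for illustration; not part of pytextrank.
    """
    # naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    wanted = {k.lower() for k in keywords}

    def hits(sentence):
        # number of distinct keywords that occur as whole words
        tokens = set(re.findall(r"\w+", sentence.lower()))
        return len(tokens & wanted)

    # rank by keyword hits (descending); ties keep their original position
    ranked = sorted(range(len(sentences)), key=lambda i: (-hits(sentences[i]), i))
    chosen = sorted(ranked[:limit_sentences])
    return [sentences[i] for i in chosen]


text = "Apple is red. Grape is black. Banana is yellow."
keywords = ["apple", "red", "yellow"]
summary = keyword_first_summary(text, keywords)
print(" ".join(summary))
```

This ignores graph-based ranking entirely, so it only shows the selection rule being asked for, not how a TextRank component would implement it.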
Thank you @Vignesh9395, that capability is going in with the upcoming kglab integration.
Thank you @ceteri, looking forward to it!
Hi @Vignesh9395,
Can you try your use case with biased TextRank? I think that with an appropriate choice of focus and bias, you should be able to bring such sentences to the top of the summary.
Please refer to sample.py for usage.
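For intuition about why a focus and bias can pull keyword sentences upward, the idea can be sketched as a personalized PageRank over sentences. This is a simplified illustration in plain Python, not pytextrank's implementation, which may differ in detail. The function `biased_sentence_rank`, the shared-word edge weights, and the damping value are all assumptions made for the sketch.

```python
import re


def biased_sentence_rank(sentences, focus_terms, damping=0.85, iterations=50):
    """Score sentences with a PageRank whose random jumps favour focus terms.

    Hypothetical helper for illustration; not part of pytextrank.
    """
    n = len(sentences)
    if n == 0:
        return []
    tokens = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    focus = {t.lower() for t in focus_terms}

    # edge weight: number of words two different sentences share
    weight = [
        [len(tokens[i] & tokens[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]
    out = [sum(row) for row in weight]

    # jump distribution: proportional to focus-term hits, uniform if none match
    bias = [len(tokens[i] & focus) for i in range(n)]
    total = sum(bias)
    teleport = [b / total for b in bias] if total else [1.0 / n] * n

    rank = teleport[:]
    for _ in range(iterations):
        # mass held by sentences with no outgoing edges is redistributed by bias
        dangling = sum(rank[j] for j in range(n) if out[j] == 0)
        rank = [
            (1 - damping) * teleport[i]
            + damping * (
                dangling * teleport[i]
                + sum(rank[j] * weight[j][i] / out[j] for j in range(n) if out[j])
            )
            for i in range(n)
        ]
    return rank


sentences = ["Apple is red.", "Grape is black.", "Banana is yellow."]
rank = biased_sentence_rank(sentences, ["apple", "red", "yellow"])
for score, sentence in sorted(zip(rank, sentences), reverse=True):
    print(round(score, 3), sentence)
```

With the focus terms from the question, the two sentences that mention them score above the one that does not, which is the effect the bias is meant to produce.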
@ceteri