pytextrank
ZeroDivisionError: division by zero in _calc_discounted_normalised_rank
Hi,
I use this library together with spaCy to extract the most important words. However, when using the Catalan spaCy model, the algorithm raises the following error:
```
  File "/code/app.py", line 20, in getNlpEntities
    entities = runTextRankEntities(hl, contents['contents'], algorithm, num)
  File "/code/nlp/textRankEntities.py", line 51, in runTextRankEntities
    doc = nlp(joined_content)
  File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 1022, in __call__
    error_handler(name, proc, [doc], e)
  File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 1617, in raise_error
    raise e
  File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 1017, in __call__
    doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 253, in __call__
    doc._.phrases = doc._.textrank.calc_textrank()
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 363, in calc_textrank
    nc_phrases = self._collect_phrases(self.doc.noun_chunks, self.ranks)
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 548, in _collect_phrases
    return {
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 549, in <dictcomp>
    span: self._calc_discounted_normalised_rank(span, sum_rank)
  File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 592, in _calc_discounted_normalised_rank
    phrase_rank = math.sqrt(sum_rank / (len(span) + non_lemma))
ZeroDivisionError: division by zero
```
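The failing line computes `math.sqrt(sum_rank / (len(span) + non_lemma))`. Assuming `non_lemma` is a non-negative count of tokens inside the span (its definition isn't shown in the traceback), the denominator can only be zero when the span itself has no tokens, which would point to an empty span coming from `self.doc.noun_chunks` with the Catalan model. The sketch below isolates just that expression. `phrase_rank_sketch` is an illustrative name, not pytextrank's API, and the guard that scores an empty span as `0.0` is one possible choice, not the library's actual fix.

```python
import math


def phrase_rank_sketch(span_len: int, non_lemma: int, sum_rank: float) -> float:
    """Mirror of the expression from the traceback, plus a hypothetical guard.

    span_len  -- number of tokens in the noun-chunk span, i.e. len(span)
    non_lemma -- assumed non-negative count of tokens in the span
    sum_rank  -- summed rank of the span's tokens
    """
    denominator = span_len + non_lemma
    if denominator == 0:
        # Hypothetical guard: an empty span contributes no rank
        # instead of raising ZeroDivisionError.
        return 0.0
    return math.sqrt(sum_rank / denominator)


print(phrase_rank_sketch(2, 2, 4.0))  # 1.0
print(phrase_rank_sketch(0, 0, 0.0))  # 0.0, where the unguarded expression raises
```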
Hi @sumitkumarjethani, thank you for this report. Let's get it fixed!
Could you please provide:
- the code for `app.py`, or at least the body of the `runTextRankEntities()` function
- example data in which the exception occurs
- how spaCy and the Catalan model were installed
- the versions used for spaCy and the Catalan language model
- your operating system and version
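The Python, OS and package details requested above can be gathered with a short standard-library script. This is only a sketch: `environment_report` is a hypothetical helper that reads installed-package versions through `importlib.metadata` (Python 3.8+), so a model loaded from a local path rather than installed with pip won't show up there; for a loaded pipeline, its version is available as `nlp.meta["version"]`.

```python
import platform
from importlib import metadata


def environment_report(packages=("spacy", "pytextrank")):
    """Collect Python, OS and package versions (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed, or installed under a different distribution name
            versions[name] = None
    return {
        "python": platform.python_version(),
        "os": f"{platform.system()} {platform.release()}",
        "packages": versions,
    }


print(environment_report())
```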
Many thanks! Paco
Yeah sure!
- Code used for execution: the original code is quite modular, so here is a close approximation that can be run locally (don't panic if it doesn't work out of the box, as I wrote it directly on GitHub).

```python
import logging

import spacy
import pytextrank  # noqa: F401  (importing registers the "textrank" pipeline component)

logger = logging.getLogger(__name__)


def getTextRankEntities(doc):
    """Returns TextRank entities."""
    entities = []
    for phrase in doc._.phrases:
        phrase_dict = {}
        phrase_dict['entitie'] = phrase.text
        phrase_dict['score'] = phrase.rank
        phrase_dict['n_gram'] = len(phrase.text.split())
        phrase_dict['count'] = phrase.count
        entities.append(phrase_dict)
    return entities


def runTextRankEntities(content):
    """Main function to run TextRank entity extraction."""
    # Here you have to put the Catalan pipeline name (or the path to the unpacked model)
    nlp = spacy.load("models/ca_core_news_lg-3.2.0/ca_core_news_lg/ca_core_news_lg-3.2.0")
    nlp.add_pipe("textrank")
    logger.info("Extracting entities with textrank algorithm")
    doc = nlp(content)
    entities = getTextRankEntities(doc)
    logger.info("Entities extracted")
    return entities
```
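To check the shape of the dictionaries that `getTextRankEntities()` produces without loading the Catalan model, the same conversion can be written more compactly and run on stand-in phrase objects. This sketch assumes pytextrank phrases expose `.text`, `.rank` and `.count`, as the code above relies on; `phrases_to_dicts` is a hypothetical name, and the `types.SimpleNamespace` objects and their values are made-up stand-ins, not real spaCy or pytextrank output.

```python
from types import SimpleNamespace


def phrases_to_dicts(phrases):
    """Compact equivalent of the loop in getTextRankEntities()."""
    return [
        {
            "entitie": p.text,  # key name kept as in the original code
            "score": p.rank,
            "n_gram": len(p.text.split()),
            "count": p.count,
        }
        for p in phrases
    ]


# Stand-ins for a Doc whose doc._.phrases holds pytextrank phrases
fake_doc = SimpleNamespace(_=SimpleNamespace(phrases=[
    SimpleNamespace(text="intel·ligència artificial", rank=0.12, count=2),
    SimpleNamespace(text="Barcelona", rank=0.08, count=1),
]))

print(phrases_to_dicts(fake_doc._.phrases))
```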
- I'm afraid I can't share the example data in which the exception occurs. However, you can create a string of Catalan text and pass it to `runTextRankEntities(content)`.
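Since the original data can't be shared, one way to narrow the failure down is to pass candidate Catalan texts one at a time and keep those that raise `ZeroDivisionError`. In this sketch, `find_failing_inputs` is a hypothetical helper and `extract` would be `runTextRankEntities` in practice; `toy_extract` is only a stand-in so the snippet runs without spaCy.

```python
from typing import Callable, Iterable, List


def find_failing_inputs(extract: Callable[[str], object], texts: Iterable[str]) -> List[str]:
    """Return the texts for which extract() raises ZeroDivisionError."""
    failing = []
    for text in texts:
        try:
            extract(text)
        except ZeroDivisionError:
            failing.append(text)
    return failing


# Toy stand-in for runTextRankEntities(): divides by the word count,
# so it fails on empty or whitespace-only input.
def toy_extract(text: str) -> float:
    return 1 / len(text.split())


print(find_failing_inputs(toy_extract, ["Bon dia a tothom.", "", "   "]))  # ['', '   ']
```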
- spaCy was installed with: `pip install spacy`
- For the Catalan spaCy model, I downloaded the package with `wget` from the spacy-models releases: https://github.com/explosion/spacy-models/releases/download/ca_core_news_lg-3.2.0/ca_core_news_lg-3.2.0.tar.gz
- spaCy version: 3.2.3 | spaCy Catalan language model version: 3.2.0
- OS: Windows 10 Home
If you need anything else, please let me know and I will try to respond as soon as possible.
Thank you very much