pytextrank ZeroDivisionError: division by zero in _calc_discounted_normalised

ZeroDivisionError: division by zero in _calc_discounted_normalised_rank

Open sumitkumarjethani opened this issue 2 years ago • 2 comments

Hi,

I use this library together with spaCy to extract the most important words from a text. However, when I use spaCy's Catalan model, the algorithm raises the following error:

`File "/code/app.py", line 20, in getNlpEntities
  entities = runTextRankEntities(hl, contents['contents'], algorithm, num)
File "/code/nlp/textRankEntities.py", line 51, in runTextRankEntities
  doc = nlp(joined_content)
File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 1022, in __call__
  error_handler(name, proc, [doc], e)
File "/usr/local/lib/python3.9/site-packages/spacy/util.py", line 1617, in raise_error
  raise e
File "/usr/local/lib/python3.9/site-packages/spacy/language.py", line 1017, in __call__
  doc = proc(doc, **component_cfg.get(name, {}))  # type: ignore[call-arg]
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 253, in __call__
  doc._.phrases = doc._.textrank.calc_textrank()
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 363, in calc_textrank
  nc_phrases = self._collect_phrases(self.doc.noun_chunks, self.ranks)
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 548, in _collect_phrases
  return {
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 549, in <dictcomp>
  span: self._calc_discounted_normalised_rank(span, sum_rank)
File "/usr/local/lib/python3.9/site-packages/pytextrank/base.py", line 592, in _calc_discounted_normalised_rank
  phrase_rank = math.sqrt(sum_rank / (len(span) + non_lemma))
ZeroDivisionError: division by zero`
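
For readers following the traceback: the last frame divides by `len(span) + non_lemma`, so the error can only occur when that sum is zero, for instance when a span contains no tokens at all. The sketch below reproduces just that arithmetic in plain Python. The function names `discounted_rank` and `guarded_rank` are hypothetical and are not part of pytextrank; the guard is one possible workaround, not the library's fix.

```python
import math


def discounted_rank(sum_rank, span_len, non_lemma):
    """Hypothetical stand-in for the expression on the last traceback line."""
    return math.sqrt(sum_rank / (span_len + non_lemma))


def guarded_rank(sum_rank, span_len, non_lemma):
    """Same expression, but returns 0.0 instead of dividing by zero."""
    denominator = span_len + non_lemma
    if denominator == 0:
        return 0.0
    return math.sqrt(sum_rank / denominator)


# A span with two tokens is ranked without problems.
normal = discounted_rank(0.5, 2, 0)

# A span with no tokens (and no non-lemma tokens) makes the denominator zero.
try:
    discounted_rank(0, 0, 0)
    raised = False
except ZeroDivisionError:
    raised = True
```

Whether an empty span is what actually happens with the Catalan model is an assumption here; the thread does not confirm it.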

Apr 11 '22 11:04 sumitkumarjethani

Hi @sumitkumarjethani, thank you for this report. Let's get it fixed!

Could you please provide:

- the code for app.py, or at least the body of the runTextRankEntities() function
- example data for which the exception occurs
- how spaCy and the Catalan model were installed
- the versions of spaCy and the Catalan language model
- your operating system and version

Many thanks! Paco

Apr 11 '22 16:04 ceteri

Yeah sure!

Code used for execution: the original code is quite modular, so here is a simplified version that should be runnable locally (don't panic if it doesn't work as is; I wrote it directly on GitHub).

import logging

import pytextrank  # importing pytextrank registers the "textrank" factory with spaCy
import spacy

logger = logging.getLogger(__name__)


def getTextRankEntities(doc):
    """Return the TextRank phrases of a doc as a list of dicts."""
    entities = []

    for phrase in doc._.phrases:
        phrase_dict = {}

        phrase_dict['entitie'] = phrase.text
        phrase_dict['score'] = phrase.rank
        phrase_dict['n_gram'] = len(phrase.text.split())
        phrase_dict['count'] = phrase.count

        entities.append(phrase_dict)
    return entities


def runTextRankEntities(content):
    """Main function to run TextRank entity extraction."""
    # Put the name or path of your Catalan pipeline here.
    nlp = spacy.load("models/ca_core_news_lg-3.2.0/ca_core_news_lg/ca_core_news_lg-3.2.0")
    nlp.add_pipe("textrank")

    logger.info("Extracting entities with textrank algorithm")
    doc = nlp(content)
    entities = getTextRankEntities(doc)
    logger.info("Entities extracted")
    return entities

Example data: I am afraid I cannot share the data for which the exception occurs. However, you can create a string with Catalan text and pass it to runTextRankEntities(content).
spaCy installation: pip install spacy
Catalan model installation: downloaded with wget from the spacy-models releases: https://github.com/explosion/spacy-models/releases/download/ca_core_news_lg-3.2.0/ca_core_news_lg-3.2.0.tar.gz
spaCy version: 3.2.3 | spaCy Catalan language model version: 3.2.0
OS: Windows 10 Home
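
Since the failing text cannot be shared, one way to narrow the problem down privately is to look for noun chunks that contain no tokens, because a zero-length span would make the denominator in the traceback zero. The helper below is a minimal sketch under that assumption; `find_empty_chunks` is a hypothetical name, and plain lists stand in for spaCy spans so that it runs without a model.

```python
def find_empty_chunks(chunks):
    """Return (index, chunk) pairs for chunks that contain no tokens."""
    return [(i, chunk) for i, chunk in enumerate(chunks) if len(chunk) == 0]


# Plain lists stand in for spaCy spans so the sketch runs without a model.
sample_chunks = [["el", "gat"], [], ["la", "casa"]]
empty = find_empty_chunks(sample_chunks)
```

With a real pipeline, the same check could be applied to `list(doc.noun_chunks)` on a doc produced by the Catalan model before the "textrank" component is added. A non-empty result would point at the noun chunker rather than at the ranking code.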

If you need anything else, please let me know and I will respond as soon as possible.

Thank you very much

Apr 12 '22 12:04 sumitkumarjethani
