
StopIteration error when using word_segmentation

Open avacaondata opened this issue 4 years ago • 8 comments

Hi, I'm trying to use symspellpy to correct some Spanish texts. I've loaded a dictionary of Spanish words and their absolute frequencies, and it seems to load correctly. However, when I call word_segmentation, the following error appears, no matter what text I pass in:


StopIteration Traceback (most recent call last) in
----> 1 result = symspell.word_segmentation('holaadiós')

~/miniconda/envs/bertology/lib/python3.7/site-packages/symspellpy/symspellpy.py in word_segmentation(self, phrase, max_edit_distance, max_segmentation_word_length, ignore_token)
   1001 compositions[idx].distance_sum + separator_len + top_ed,
   1002 compositions[idx].log_prob_sum + top_log_prob)
-> 1003 idx = next(circular_index)
   1004 return compositions[idx]
   1005

StopIteration:

avacaondata avatar Apr 28 '20 15:04 avacaondata

To make it easier, here is the full code:

symspell.load_dictionary('CREA_total.TXT', term_index=0, count_index=1, separator='\t', encoding='latin-1')
result = symspell.word_segmentation('holaadiós')

avacaondata avatar Apr 28 '20 15:04 avacaondata

Hi, I have the exact same issue with another dictionary. Have you found any fix? Thanks

rebouvet avatar Jul 15 '20 08:07 rebouvet

@rebouvet Hi, can you upload a sample of the dictionary which causes the error so I can try and debug?

mammothb avatar Jul 26 '20 22:07 mammothb

@mammothb Same problem here using this dictionary: https://raw.githubusercontent.com/hermitdave/FrequencyWords/master/content/2018/pt_br/pt_br_full.txt

lucaslrolim avatar Sep 15 '20 21:09 lucaslrolim

Has anyone managed to solve this? I also get a StopIteration error after loading a French dictionary and calling word_segmentation. I used this one (link).

sym_spell = SymSpell(max_dictionary_edit_distance=2, count_threshold=10, prefix_length=7)
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "fr-100k.txt")
sym_spell.load_dictionary(dictionary_path)
sym_spell.word_segmentation('mama mia')

Error:

/symspellpy.py in word_segmentation(self, phrase, max_edit_distance, max_segmentation_word_length, ignore_token)
   1091 top_ed),
   1092 compositions[idx].log_prob_sum + top_log_prob)
-> 1093 idx = next(circular_index)
   1094 return compositions[idx]
   1095

StopIteration:

vection avatar Nov 10 '20 08:11 vection

@lucaslrolim I was able to run word_segmentation without a StopIteration error using the following code:

import os.path

from symspellpy.symspellpy import SymSpell

# Set max_dictionary_edit_distance to 0 to avoid spelling correction
sym_spell = SymSpell(max_dictionary_edit_distance=0, prefix_length=7)
dictionary_path = os.path.join(
    os.path.dirname(os.path.realpath(__file__)), "symspellpy", "pt_br_full.txt"
)

# term_index is the column of the term and count_index is the
# column of the term frequency
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1, encoding="utf8")

# a sentence without any spaces
input_term = "thequickbrownfoxjumpsoverthelazydog"
result = sym_spell.word_segmentation(input_term)
print("{}, {}, {}".format(result.corrected_string, result.distance_sum,
                          result.log_prob_sum))

and the output is

the quick brown fox jumps overthe lazy dog, 7, -73.85138966727551

Initially, I ran into the StopIteration error when I used the wrong path for the dictionary. Perhaps you'd like to check if you're using the correct path for the dictionary file. load_dictionary will return False if the dictionary file could not be found.
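
Both tracebacks in this thread fail at next(circular_index). One plausible reading of why a wrong path ends in StopIteration is sketched below. It assumes that circular_index is an itertools.cycle over a range whose size depends on the longest word in the loaded dictionary; that is an assumption about the library internals, not something confirmed in this thread. The function name first_index is hypothetical and only illustrates the mechanism.

```python
from itertools import cycle


def first_index(array_size):
    """Return the first value of a circular index over range(array_size).

    Hypothetical helper: it mimics the assumed pattern
    `circular_index = cycle(range(array_size))` followed by
    `next(circular_index)`.
    """
    circular_index = cycle(range(array_size))
    # cycle() over an empty range has nothing to repeat, so next()
    # raises StopIteration instead of returning an index.
    return next(circular_index)


if __name__ == "__main__":
    print(first_index(3))  # a non-empty range yields its first index
    try:
        first_index(0)  # models an empty dictionary (assumed size 0)
    except StopIteration:
        print("StopIteration raised for an empty range")
```

If that assumption holds, an empty dictionary (for example, after load_dictionary returned False) would leave the range empty, which would match the error reported here.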

mammothb avatar Nov 21 '20 02:11 mammothb

@vection I see you're swapping out the "frequency_dictionary_en_82_765.txt" from the sample code with your own dictionary. However, pkg_resources only finds the dictionaries that are shipped with the symspellpy package. As "fr-100k.txt" is not included in the symspellpy package, it will return an invalid path. You can construct your own path to your dictionary and pass that to load_dictionary.

For example,

dictionary_path = "/full/path/to/fr-100k.txt"

sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)

should work.
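
Since load_dictionary reports a missing file only through its return value, a small guard can make the failure visible before word_segmentation is called. The following is a minimal sketch: load_dictionary_or_fail is a hypothetical helper name, not part of symspellpy, and sym_spell is assumed to be a SymSpell instance created elsewhere.

```python
from pathlib import Path


def load_dictionary_or_fail(sym_spell, dictionary_path, **load_kwargs):
    """Load a frequency dictionary and raise instead of failing silently.

    Hypothetical helper, not part of symspellpy. `sym_spell` is assumed
    to expose load_dictionary(path, ...) returning a bool, as described
    in this thread.
    """
    path = Path(dictionary_path)
    # Fail early with a clear message if the path does not point to a file.
    if not path.is_file():
        raise FileNotFoundError(f"Dictionary file not found: {path}")
    # load_dictionary returns False when the dictionary could not be loaded.
    if not sym_spell.load_dictionary(str(path), **load_kwargs):
        raise RuntimeError(f"load_dictionary returned False for {path}")
    return sym_spell
```

With the example above, this would be called as load_dictionary_or_fail(sym_spell, "/full/path/to/fr-100k.txt", term_index=0, count_index=1) before calling word_segmentation.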

mammothb avatar Nov 21 '20 02:11 mammothb

@alexvaca0 @rebouvet May I know if you have a similar problem with the dictionary path not pointing to the right location? Similar to what I have described in https://github.com/mammothb/symspellpy/issues/79#issuecomment-731492752

mammothb avatar Nov 21 '20 02:11 mammothb