symspellpy
StopIteration error when using word_segmentation
Hi, I'm trying to use symspellpy to correct some Spanish texts. I've loaded a dictionary of Spanish words and their absolute frequencies, and it seems to be loaded correctly. However, when I try to use `word_segmentation`, the following error appears, no matter what text I pass to it:
```
StopIteration                             Traceback (most recent call last)
~/miniconda/envs/bertology/lib/python3.7/site-packages/symspellpy/symspellpy.py in word_segmentation(self, phrase, max_edit_distance, max_segmentation_word_length, ignore_token)
   1001                     compositions[idx].distance_sum + separator_len + top_ed,
   1002                     compositions[idx].log_prob_sum + top_log_prob)
-> 1003         idx = next(circular_index)
   1004     return compositions[idx]
   1005

StopIteration:
```
To make it easier, here is the full code:
```python
symspell.load_dictionary('CREA_total.TXT', term_index=0, count_index=1,
                         separator='\t', encoding='latin-1')
result = symspell.word_segmentation('holaadiós')
```
Hi, I have the exact same issue with another dictionary. Have you found any fix? Thanks
@rebouvet Hi, can you upload a sample of the dictionary which causes the error so I can try and debug?
@mammothb Same problem here using this dictionary: https://raw.githubusercontent.com/hermitdave/FrequencyWords/master/content/2018/pt_br/pt_br_full.txt
Has anyone managed to solve it? I also get a `StopIteration` error when loading a French dictionary and using `word_segmentation`. I used this one: link
```python
sym_spell = SymSpell(max_dictionary_edit_distance=2, count_threshold=10,
                     prefix_length=7)
dictionary_path = pkg_resources.resource_filename("symspellpy", "fr-100k.txt")
sym_spell.load_dictionary(dictionary_path)
sym_spell.word_segmentation('mama mia')
```
Error:
```
/symspellpy.py in word_segmentation(self, phrase, max_edit_distance, max_segmentation_word_length, ignore_token)
   1091                         top_ed),
   1092                     compositions[idx].log_prob_sum + top_log_prob)
-> 1093         idx = next(circular_index)
   1094     return compositions[idx]
   1095

StopIteration:
```
@lucaslrolim I was able to run `word_segmentation` without a `StopIteration` error with the following code:
```python
import os.path

from symspellpy.symspellpy import SymSpell

# Set max_dictionary_edit_distance to 0 to avoid spelling correction
sym_spell = SymSpell(max_dictionary_edit_distance=0, prefix_length=7)
dictionary_path = os.path.join(
    os.path.dirname(os.path.realpath(__file__)), "symspellpy", "pt_br_full.txt"
)
# term_index is the column of the term and count_index is the
# column of the term frequency
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1,
                          encoding="utf8")
# a sentence without any spaces
input_term = "thequickbrownfoxjumpsoverthelazydog"
result = sym_spell.word_segmentation(input_term)
print("{}, {}, {}".format(result.corrected_string, result.distance_sum,
                          result.log_prob_sum))
```
and the output is

```
the quick brown fox jumps overthe lazy dog, 7, -73.85138966727551
```
Initially, I ran into the `StopIteration` error when I used the wrong path for the dictionary. Perhaps you'd like to check if you're using the correct path for the dictionary file. `load_dictionary` will return `False` if the dictionary file could not be found.
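Since `load_dictionary` reports a missing file through its return value rather than an exception, one way to surface the problem early is to check that value yourself. A minimal sketch; `load_or_fail` is a hypothetical helper, not part of symspellpy:

```python
def load_or_fail(sym_spell, path, **kwargs):
    """Raise immediately instead of hitting StopIteration later.

    symspellpy's load_dictionary returns False (rather than raising)
    when the dictionary file cannot be found.
    """
    if not sym_spell.load_dictionary(path, **kwargs):
        raise FileNotFoundError(f"Dictionary file not found: {path}")

# Intended usage (assuming symspellpy is installed):
#   sym_spell = SymSpell(max_dictionary_edit_distance=0, prefix_length=7)
#   load_or_fail(sym_spell, "pt_br_full.txt", term_index=0, count_index=1,
#                encoding="utf8")
```

With this guard in place, a wrong path fails with a clear `FileNotFoundError` at load time instead of a confusing `StopIteration` inside `word_segmentation`.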
@vection I see you're swapping out the `"frequency_dictionary_en_82_765.txt"` from the sample code with your own dictionary. However, `pkg_resources` only finds the dictionaries that are shipped with the symspellpy package. As `"fr-100k.txt"` is not included in the symspellpy package, it will return an invalid path. You can construct your own path to your dictionary and pass that to `load_dictionary`.
For example,
```python
dictionary_path = "/full/path/to/fr-100k.txt"
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)
```
should work.
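If the dictionary lives next to your script rather than at a fixed absolute path, you can also build the path with the standard library. A sketch; `"fr-100k.txt"` here is the user's own file, not one shipped with symspellpy:

```python
from pathlib import Path

# Absolute path to a dictionary stored in the same directory as this script.
dictionary_path = Path(__file__).resolve().parent / "fr-100k.txt"

# Pass dictionary_path (or str(dictionary_path)) to load_dictionary, and
# check the boolean it returns before calling word_segmentation.
print(dictionary_path)
```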
@alexvaca0 @rebouvet May I know if you have a similar problem with the dictionary path not pointing to the right location? Similar to what I have described in https://github.com/mammothb/symspellpy/issues/79#issuecomment-731492752