ValueError: math domain error
I changed pad_symbol to left_pad_symbol and right_pad_symbol and added start_pad_symbol in KneserNeyLM, but there is still another error. It looks like the log function is being called with a negative value, but why would it be negative?
Code:
from nltk.corpus import gutenberg
from nltk.util import ngrams
from kneser_ney import KneserNeyLM

gut_ngrams = (
    ngram
    for sent in gutenberg.sents()
    for ngram in ngrams(sent, 3, pad_left=True, pad_right=True,
                        right_pad_symbol='<s>', left_pad_symbol='<s>'))
lm = KneserNeyLM(3, gut_ngrams, start_pad_symbol='<s>', end_pad_symbol='<s>')
lm.score_sent(('This', 'is', 'a', 'sample', 'sentence', '.'))
lm.generate_sentence()
ValueError Traceback (most recent call last)
ValueError: math domain error
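The error message alone does not prove that a negative value reached the log: Python's math.log raises the same ValueError for an argument of exactly 0. The small check below (log_raises_domain_error is a hypothetical helper written for this thread) illustrates that a zero probability is enough to trigger it.

```python
import math

def log_raises_domain_error(x):
    """Return True if math.log(x) raises ValueError.

    CPython reports this as a domain error; the exact message text may
    differ between Python versions, so only the exception type is checked.
    """
    try:
        math.log(x)
    except ValueError:
        return True
    return False

print(log_raises_domain_error(-0.5))  # True: negative argument
print(log_raises_domain_error(0.0))   # True: a zero probability fails the same way
print(log_raises_domain_error(0.25))  # False: any positive probability is fine
```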
I have the same problem. Has this problem been solved?
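This error can be caused by an input in which no single n-gram appears exactly 3 times. When that happens, discount[2] = 2, so n-grams seen exactly twice are discounted all the way to zero. That gives a zero probability, and the log taken later fails with the same math domain error as a negative value would.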
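To see why the discount lands on exactly 2, here is a minimal sketch assuming the library follows the modified Kneser-Ney discount estimates of Chen and Goodman: with n_c the number of n-grams seen exactly c times, Y = n_1 / (n_1 + 2 n_2) and D_2 = 2 - 3 Y n_3 / n_2. The function modified_kn_discounts and the toy counts are hypothetical, not the library's _calc_discounts. When n_3 = 0 the second term vanishes, so D_2 = 2.

```python
from collections import Counter

def modified_kn_discounts(counts):
    """Sketch of Chen & Goodman's modified Kneser-Ney discounts for one order.

    counts maps each n-gram to its frequency; assumes n_1 > 0 and n_2 > 0.
    Returns {1: D_1, 2: D_2, 3: D_3+}; D_3+ is None when n_3 == 0 because
    its formula would divide by zero.
    """
    n = Counter(counts.values())  # n[c] = number of n-grams seen exactly c times
    y = n[1] / (n[1] + 2 * n[2])
    d1 = 1 - 2 * y * n[2] / n[1]
    d2 = 2 - 3 * y * n[3] / n[2]
    d3_plus = 3 - 4 * y * n[4] / n[3] if n[3] else None
    return {1: d1, 2: d2, 3: d3_plus}

# Toy trigram counts in which no trigram occurs exactly 3 times (n_3 == 0).
toy_counts = {
    ('a', 'b', 'c'): 1,
    ('b', 'c', 'd'): 1,
    ('c', 'd', 'e'): 2,
    ('d', 'e', 'f'): 4,
}
discounts = modified_kn_discounts(toy_counts)
print(discounts[2])                                # 2.0
print(toy_counts[('c', 'd', 'e')] - discounts[2])  # 0.0: no probability mass left
```

The trigram seen twice keeps a discounted count of 0, which is the zero probability that later reaches the log.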
There is already a check in the _calc_discounts function for discounts that go negative; it probably needs another check for discounts that are too high.
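The missing upper-bound check could look something like the hypothetical clamp_discounts below. It is a sketch, not a patch to the library: each discount D_c is clamped to the range [0, c - epsilon], so an n-gram in count class c always keeps part of its count. Here class 3 stands for "3 or more", and epsilon = 0.1 is an arbitrary illustrative value.

```python
import math

def clamp_discounts(discounts, epsilon=0.1):
    """Clamp each discount D_c so that 0 <= D_c <= c - epsilon.

    discounts maps a count class c (1, 2, or 3 meaning "3 or more") to D_c.
    The lower bound mirrors the existing negative-discount check; the upper
    bound is the extra "too high" check, so no n-gram is discounted to zero.
    """
    return {c: min(max(d, 0.0), c - epsilon) for c, d in discounts.items()}

clamped = clamp_discounts({1: 0.5, 2: 2.0, 3: -0.2})
print(clamped)
print(math.isclose(clamped[2], 1.9))  # True: D_2 capped just below 2
```

Clamping only hides the symptom: when some n_c is zero, the discount estimates are still unreliable for that input.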
Ultimately, it's an issue of the input dataset being too small for this method.
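Since the root cause is a missing count class in a small input, one way to check a dataset before training is to count how many n-grams occur exactly 1 to 4 times. missing_count_classes is a hypothetical helper; it inspects raw counts only, and the library may compute its lower-order discounts from other statistics, so treat it as a rough check.

```python
from collections import Counter

def missing_count_classes(ngrams, max_class=4):
    """Return the counts c in 1..max_class for which no n-gram occurs exactly c times.

    An empty result means n_1 .. n_max_class are all nonzero for these n-grams.
    """
    freq = Counter(ngrams)                    # n-gram -> frequency
    count_of_counts = Counter(freq.values())  # frequency -> number of n-grams
    return [c for c in range(1, max_class + 1) if count_of_counts[c] == 0]

tokens = "the cat sat on the mat".split()
trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
print(missing_count_classes(trigrams))  # [2, 3, 4]: every trigram is unique
```

Note that gut_ngrams in the question is a generator expression, so it is exhausted after one pass; materialize it with list(...) first if you want to both check it and pass it to KneserNeyLM.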