kneser-ney
Bug in score_sent
You pad the incoming sequence (https://github.com/smilli/kneser-ney/blob/master/kneser_ney.py#L147), but then use the original, unpadded tuple for scoring, so the padding has no effect on the result.
Good point. I think the function should look like this:
def score_sent(self, sent):
    """
    Return log prob of the sentence.
    Params:
        sent [tuple->string] The words in the unpadded sentence.
    """
    padded = (
        (self.start_pad_symbol,) * (self.highest_order - 1) + sent +
        (self.end_pad_symbol,))
    sent_logprob = 0
    for i in range(len(padded) - self.highest_order + 1):
        ngram = padded[i:i + self.highest_order]
        sent_logprob += self.logprob(ngram)
    return sent_logprob
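
To see why this matters, here is a small standalone sketch that only counts the n-grams each version would score. It does not use the library: the helper names `ngrams` and `pad`, the pad symbols "<s>" and "</s>", and the trigram order are assumptions made for illustration.

```python
def ngrams(tokens, order):
    """Return every contiguous n-gram of length `order` in `tokens`."""
    return [tokens[i:i + order] for i in range(len(tokens) - order + 1)]


def pad(sent, order, start="<s>", end="</s>"):
    """Pad with (order - 1) start symbols and one end symbol.

    The symbol values are placeholders, not necessarily the library's.
    """
    return (start,) * (order - 1) + sent + (end,)


sent = ("the", "cat", "sat")
order = 3  # assumed highest order, for illustration only

# Sliding the window over the unpadded tuple (the reported bug).
unpadded = ngrams(sent, order)

# Sliding the window over the padded tuple (the proposed fix).
padded = ngrams(pad(sent, order), order)

print(len(unpadded), unpadded)
print(len(padded), padded)
```

With the unpadded tuple only one trigram is scored, and a sentence shorter than the model order would contribute no n-grams at all. With padding there is one n-gram per word plus one for the end symbol, so the start and end of the sentence are included in the score.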