kneser-ney

Bug in score_sent

Open • dirkweissenborn opened this issue 7 years ago • 1 comment

You pad the incoming sequence (https://github.com/smilli/kneser-ney/blob/master/kneser_ney.py#L147), but then use the original, unpadded tuple for scoring.

dirkweissenborn, Jun 19 '18 20:06

Good point. I think the function should look like this:

def score_sent(self, sent):
    """
    Return log prob of the sentence.
    Params:
        sent [tuple->string] The words in the unpadded sentence.
    """
    padded = (
        (self.start_pad_symbol,) * (self.highest_order - 1) + sent +
        (self.end_pad_symbol,))
    sent_logprob = 0
    for i in range(len(padded) - self.highest_order + 1):
        ngram = padded[i:i + self.highest_order]
        sent_logprob += self.logprob(ngram)
    return sent_logprob

maxxbw54, Jul 19 '18 15:07
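
To see why iterating over the unpadded tuple matters, here is a minimal standalone sketch, not the library's code. The pad symbols `<s>` and `</s>`, the order of 3, and the helper names `pad`, `ngrams`, and the free function `score_sent` are all assumptions made for illustration; `logprob` is a stand-in callable, not the model's real method.

```python
import math

START, END = "<s>", "</s>"  # hypothetical pad symbols, not taken from the repo
ORDER = 3                   # hypothetical highest n-gram order


def pad(sent, order=ORDER):
    """Prepend order - 1 start symbols and append one end symbol."""
    return (START,) * (order - 1) + tuple(sent) + (END,)


def ngrams(seq, order=ORDER):
    """Return every contiguous n-gram of length `order` in `seq`."""
    return [seq[i:i + order] for i in range(len(seq) - order + 1)]


def score_sent(sent, logprob, order=ORDER):
    """Sum log probabilities over the n-grams of the padded sentence."""
    return sum(logprob(ng) for ng in ngrams(pad(sent, order), order))


sent = ("the", "cat")

# Slicing the unpadded tuple: a 2-word sentence has no trigrams at all,
# so nothing would be scored.
unpadded_ngrams = ngrams(sent)

# Slicing the padded tuple: the sentence start and end are both covered.
padded_ngrams = ngrams(pad(sent))

if __name__ == "__main__":
    print(unpadded_ngrams)
    print(padded_ngrams)
    # Stand-in model that assigns every n-gram probability 0.5.
    print(score_sent(sent, lambda ng: math.log(0.5)))
```

With these assumed settings, the unpadded sentence yields no trigrams, while the padded one yields three, including the ones that condition on the start symbols and the one that predicts the end symbol.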