berkeleylm

Calculating log probability over a larger document

Open GoogleCodeExporter opened this issue 9 years ago • 0 comments

Hello,

I would like to use this LM for classification and therefore I need to 
calculate the log probability of an entire document.

One of the getLogProb() methods states:
"Calculate language model score of an n-gram. Warning: if you
pass in an n-gram of length greater than getLmOrder(),
this call will silently ignore the extra words of context. In other
words, if you pass in a 5-gram (endPos-startPos == 5) to
a 3-gram model, it will only score the words from startPos + 2
to endPos."

Is it correct to assume that the only way to get the log probability score for 
an entire document (or any sentence that contains more than LMOrder words) is 
to split the document into separate n-grams, query the log probability score 
for each of these separately, and sum the results?
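
To make the question concrete, here is a minimal sketch of the sliding-window
approach I have in mind. It assumes the usual chain-rule approximation, where
the document score is the sum over all words of the log probability of that
word given at most (order - 1) preceding words. The NgramScorer interface, the
scoreDocument method and the toy scorer are hypothetical stand-ins written for
this example. They are not the berkeleylm API, and sentence start/end symbols
are not handled.

```java
import java.util.Arrays;
import java.util.List;

public class DocumentScorer {

    /** Hypothetical stand-in for a language model; NOT the berkeleylm API. */
    interface NgramScorer {
        /** Log probability of the last word of ngram given the words before it. */
        double logProb(List<String> ngram);
    }

    /**
     * Sums one n-gram score per word. The window ending at each word is
     * capped at `order` words, so no query is longer than the model order.
     */
    static double scoreDocument(List<String> words, int order, NgramScorer lm) {
        double total = 0.0;
        for (int end = 1; end <= words.size(); end++) {
            // Early words have fewer than (order - 1) words of context.
            int start = Math.max(0, end - order);
            total += lm.logProb(words.subList(start, end));
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy scorer (assumption for the demo): every word costs -1.0.
        NgramScorer toy = ngram -> -1.0;
        List<String> doc = Arrays.asList("the", "quick", "brown", "fox", "jumps");
        System.out.println(scoreDocument(doc, 3, toy));
    }
}
```

With a real model, the lambda would be replaced by a call into the LM, and
each window would be at most LMOrder words long, so no context would be
silently dropped.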

Original issue reported on code.google.com by [email protected] on 26 May 2015 at 7:49

GoogleCodeExporter, Jul 16 '15 16:07