berkeleylm
Can I feed this library raw counts instead of text files and have it compute the Kneser-Ney probabilities for me?
I have a very large corpus whose n-gram counts I would like to compute in some
distributed way. Is there a way to give those raw counts to this code and have
it build the model for me?
Original issue reported on code.google.com by [email protected]
on 17 Jul 2013 at 7:27
The answer is "sort of". There is code in place to estimate Kneser-Ney
probabilities from a corpus in Google n-gram format (see
https://groups.google.com/forum/#!topic/berkeleylm-discuss/G6Ta2YTsAA0).
However, that code path may have some bugs. Please try running it and see what
happens. If it crashes, I'll have extra incentive to fix it.
Original comment by [email protected]
on 17 Jul 2013 at 8:14
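For anyone preparing counts in a distributed job, here is a minimal Python sketch of the merge step: it sums per-shard n-gram counts and renders them as sorted "n-gram, tab, count" lines. The line layout is my assumption about what a Google-n-gram-style file looks like, and the function names are made up for illustration; none of this is BerkeleyLM API. Check the thread linked above for the exact directory and file names the reader expects.

```python
from collections import Counter
from typing import Iterable, List


def merge_shard_counts(shards: Iterable[Counter]) -> Counter:
    """Sum n-gram counts produced independently by each worker."""
    total = Counter()
    for shard in shards:
        total.update(shard)  # Counter.update adds counts, it does not overwrite
    return total


def format_ngram_lines(counts: Counter, order: int) -> List[str]:
    """Render n-grams of one order as 'w1 w2 ...<TAB>count', sorted by n-gram.

    The tab-separated, sorted layout is an assumption, not a verified spec.
    """
    rows = [(ngram, c) for ngram, c in counts.items() if len(ngram) == order]
    rows.sort()
    return [" ".join(ngram) + "\t" + str(c) for ngram, c in rows]


# Two hypothetical shards, keyed by tuples of tokens.
shard_a = Counter({("the", "cat"): 3, ("the",): 5})
shard_b = Counter({("the", "cat"): 2, ("cat", "sat"): 1})

merged = merge_shard_counts([shard_a, shard_b])
bigram_lines = format_ngram_lines(merged, 2)
```

Each order would then go to its own file before pointing the library at the output directory.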