kenlm
ngrams have different scoring levels
Hi there, just curious; this isn't really a bug...
How come, when comparing 2-grams with 3-grams, their scores are not normalized?
The 2-grams typically (the majority of the time) have higher scores than the 3-grams.
This becomes problematic when trying to compare scores between 2-gram and 3-gram outputs.
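To make the comparison concrete, here is a minimal sketch of what I think I am seeing. It assumes the reported score is a sum of per-word log10 probabilities, so every extra word adds another negative term. The numbers and the helper names `total_score` and `per_word_score` are made up for illustration; they are not real KenLM output or part of its API.

```python
# Hypothetical per-word log10 probabilities; NOT real KenLM output.
bigram_logprobs = [-1.2, -0.9]
trigram_logprobs = [-1.2, -0.9, -1.1]


def total_score(logprobs):
    """Raw score: the sum of per-word log10 probabilities."""
    return sum(logprobs)


def per_word_score(logprobs):
    """Length-normalized score: the total divided by the number of words."""
    return total_score(logprobs) / len(logprobs)


# The raw totals favor the shorter sequence simply because it has fewer terms.
print(total_score(bigram_logprobs), total_score(trigram_logprobs))

# Dividing by length puts both sequences on a per-word scale.
print(per_word_score(bigram_logprobs), per_word_score(trigram_logprobs))
```

With these made-up numbers the per-word values end up much closer together than the totals, which is the kind of normalization I was expecting.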
Any insight would be great; perhaps with a detailed explanation I can fix the issue and submit a pull request.