kenlm
lmplz --intermediate + --prune, or --renumber + --prune fails
Hi!
I wanted to prepare several models to test interpolating them with different weights, and found that with a non-zero --prune option the command
lmplz -o 4 --intermediate inter --prune 1 < text
fails with an exception:
kenlm/lm/common/joint_order.hh:61 in void lm::JointOrder(const util::stream::ChainPositions &, Callback &) [Callback = lm::builder::(anonymous namespace)::Callback<lm::builder::(anonymous namespace)::OutputProbBackoff>, Compare = lm::SuffixOrder] threw FormatLoadException because `order != current + 1'.
Detected n-gram without matching suffix
Abort trap: 6
This happens at the end of the === 4/4 Calculating and writing order-interpolated probabilities === stage.
A short debugging session showed that this happens because the --renumber option is automatically enabled by the --intermediate option, and that is the main cause of the failure. For example,
lmplz -o 4 --prune 1 --renumber < text > text.arpa
will fail with the same exception.
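To compare the invocations above side by side, a small wrapper along these lines may help. It is only a sketch: the function name try_lmplz and the LMPLZ variable are made up for illustration, and only the lmplz flags themselves come from this report.

```shell
#!/bin/sh
# Hypothetical helper, not part of kenlm: run lmplz with extra flags and
# report whether it exited cleanly. LMPLZ may point at a specific build.
LMPLZ="${LMPLZ:-lmplz}"

try_lmplz() {
    text="$1"; shift
    # Discard the ARPA output and the log; only the exit status matters here.
    # A crash such as "Abort trap: 6" yields a non-zero status.
    if "$LMPLZ" -o 4 "$@" < "$text" > /dev/null 2>&1; then
        echo "ok:   $*"
    else
        echo "FAIL: $*"
    fi
}
```

Calling try_lmplz text --prune 1, then try_lmplz text --prune 1 --renumber, then try_lmplz text --prune 1 --intermediate inter should, if the description above holds, report ok for the first and FAIL for the other two.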
Without the --renumber option, the ARPA model is created without any problem for any --prune value. It looks like --renumber currently only works with the default --prune 0. The --intermediate inter --prune 1 version also finishes fine if you just comment out that line.
Unfortunately, I haven't been able to figure out the root cause of the problem yet, so I don't have a PR with a fix.
That is, if this behaviour is not intended, of course :)
All of the above applies to the current master branch.
This issue is still present with the latest build on the master branch.
Just to add to @nosyrev's description, it happens when creating intermediate LMs with unigram pruning only, i.e.,
--intermediate test.intermediate --prune 0
and
--intermediate test.intermediate --prune 0 2
work fine, but
--intermediate test.intermediate --prune 2
fails.
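One way to read these three cases: lmplz's help text says --prune takes one threshold per order and that the last value applies to any remaining orders. The sketch below just expands the thresholds that way; expand_prune is a hypothetical helper written for this comment, not something kenlm provides.

```shell
#!/bin/sh
# Hypothetical helper: print the effective pruning threshold for each order
# 1..ORDER, repeating the last given value for any remaining orders
# (assumed from the lmplz --prune help text).
expand_prune() {
    order="$1"; shift
    last=0          # no thresholds given is equivalent to --prune 0
    out=""
    i=1
    while [ "$i" -le "$order" ]; do
        if [ "$#" -gt 0 ]; then
            last="$1"; shift
        fi
        out="${out:+$out }$last"
        i=$((i + 1))
    done
    echo "$out"
}
```

For a 4-gram model, expand_prune 4 0 gives 0 0 0 0, expand_prune 4 0 2 gives 0 2 2 2, and expand_prune 4 2 gives 2 2 2 2. Under that reading, the failing case is the only one with a non-zero unigram threshold, which seems to match the observation above.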