
lmplz --intermediate + --prune, or --renumber + --prune fails

Open · nosyrev opened this issue 4 years ago · 1 comment

Hi! I wanted to prepare several models so I could test interpolating them with different weights. I found that with a non-zero --prune option,

    lmplz -o 4 --intermediate inter --prune 1 < text

fails with this exception:

kenlm/lm/common/joint_order.hh:61 in void lm::JointOrder(const util::stream::ChainPositions &, Callback &) [Callback = lm::builder::(anonymous namespace)::Callback<lm::builder::(anonymous namespace)::OutputProbBackoff>, Compare = lm::SuffixOrder] threw FormatLoadException because `order != current + 1'.
Detected n-gram without matching suffix
Abort trap: 6

It fails at the end of the "=== 4/4 Calculating and writing order-interpolated probabilities ===" stage.
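To make the error message concrete, here is a minimal sketch of the invariant it appears to describe: every n-gram of order 2 or higher should have its suffix present one order below. This assumes the suffix of w1 w2 ... wn is w2 ... wn, the n-gram with its first (oldest context) word dropped. It is an illustration only, not kenlm's actual JointOrder logic, and the function name `missing_suffixes` is hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: an n-gram is a tuple of integer word IDs.
NGram = Tuple[int, ...]


def missing_suffixes(ngrams: Iterable[NGram]) -> List[NGram]:
    """Return every n-gram of order >= 2 whose suffix is absent.

    Assumption: the suffix of (w1, w2, ..., wn) is (w2, ..., wn).
    Results are sorted by order, then by word IDs.
    """
    present = set(ngrams)
    return sorted(
        (g for g in present if len(g) > 1 and g[1:] not in present),
        key=lambda g: (len(g), g),
    )
```

For example, if unigram (3,) were pruned while the bigram (2, 3) survived, `missing_suffixes([(1,), (2,), (1, 2), (2, 3), (1, 2, 3)])` would report `[(2, 3)]`. That is the kind of inconsistency "Detected n-gram without matching suffix" suggests.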

A short debugging session showed that --intermediate automatically enables the --renumber option, and that --renumber is the actual cause of the failure. For example,

    lmplz -o 4 --prune 1 --renumber < text > text.arpa

fails with the same exception.

Without --renumber, the ARPA model is built without any problem for any --prune value, so --renumber currently seems to work only with the default --prune 0. The --intermediate inter --prune 1 run also finishes fine if you simply comment out the line that enables --renumber.

Unfortunately, I haven't been able to find the root cause yet, so I don't have a PR with a fix.

That is, assuming this behaviour isn't intended, of course :)

All of this applies to the current master branch.

nosyrev · Oct 12 '20

This issue is still present in the latest build from the master branch.
To add to @nosyrev's description: it only happens when unigrams are pruned while creating intermediate LMs, i.e.,
--intermediate test.intermediate --prune 0 and --intermediate test.intermediate --prune 0 2 work fine,
but --intermediate test.intermediate --prune 2 fails.

locmene · Jan 02 '22
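The two reports agree once --prune values are expanded to one threshold per order. This minimal sketch assumes the rule given in lmplz's --help text: values are listed per order starting from unigrams, must be non-decreasing, and the last value repeats for any remaining orders. The helper names are hypothetical. `reported_to_fail` only restates the observations in this thread, that failures occur with renumbering on (explicitly, or implied by --intermediate) and a non-zero unigram threshold. It is not derived from kenlm's code.

```python
from typing import List, Sequence


def expand_prune(values: Sequence[int], order: int) -> List[int]:
    """Expand --prune values into one threshold per order (1..order).

    Assumed rule: non-decreasing values, last value repeated for
    the remaining orders, no values meaning no pruning.
    """
    if not values:
        return [0] * order  # no --prune given: nothing is pruned
    if len(values) > order:
        raise ValueError("more thresholds than model orders")
    if any(a > b for a, b in zip(values, values[1:])):
        raise ValueError("thresholds must be non-decreasing")
    return list(values) + [values[-1]] * (order - len(values))


def reported_to_fail(values: Sequence[int], order: int, renumber: bool) -> bool:
    """Restate this thread's observations (hypothetical helper):
    failure seen only when renumbering is on and unigrams are pruned."""
    return renumber and expand_prune(values, order)[0] > 0
```

With -o 4, `--prune 1` expands to `[1, 1, 1, 1]` and `--prune 2` to `[2, 2, 2, 2]`, so unigrams are pruned in both cases. `--prune 0 2` expands to `[0, 2, 2, 2]`, which leaves unigrams untouched. That matches both the original failing command and the working `--prune 0 2` case.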