
Building diffs for patching lm binaries?

Open · alexcannan opened this issue 5 years ago · 1 comment

I'm curious whether it would be possible to build a storage-efficient lm.diff file that patches an older lm.binary file into a newer one. I've experimented with some existing binary diff tools and have found the lm.diff file to be roughly the size of the new lm.binary after compression. Could a smarter tool be built specifically for kenlm models?

alexcannan · Apr 16 '20 18:04

In theory this is possible, but you'd be digging into the smoothing algorithms, because the discount parameters affect probabilities globally: a small change in the training counts can shift every stored value. The quantizer is also free to move its centers between builds. Possible but annoying.

kpu · Apr 22 '20 20:04
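
To make the point about global discounts concrete, here is a toy sketch in Python. It is not kenlm code: kenlm estimates modified Kneser-Ney models with several discounts per order, while this sketch assumes a single absolute discount D = n1 / (n1 + 2 * n2) on bigrams and leaves out the backoff mass. The function name `train_bigram` and the two tiny corpora are made up for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Toy absolute-discounting bigram model (illustration only, not kenlm)."""
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))

    # Count-of-counts: number of bigram types seen exactly once and twice.
    n1 = sum(1 for c in bigrams.values() if c == 1)
    n2 = sum(1 for c in bigrams.values() if c == 2)
    # One discount estimated from the whole corpus (assumed formula, see lead-in).
    discount = n1 / (n1 + 2 * n2)

    # Total count of each context word.
    context_totals = Counter()
    for (w1, _), c in bigrams.items():
        context_totals[w1] += c

    # Discounted probability of each seen bigram; backoff mass is omitted.
    probs = {
        bg: (c - discount) / context_totals[bg[0]]
        for bg, c in bigrams.items()
    }
    return discount, probs


old_corpus = ["the cat sat", "the cat ran", "the dog sat", "a dog ran"]
new_corpus = old_corpus + ["a bird sang"]  # one extra training sentence

old_discount, old_probs = train_bigram(old_corpus)
new_discount, new_probs = train_bigram(new_corpus)

# Count bigrams present in both models whose probability moved.
changed = sum(
    1 for bg in old_probs
    if abs(old_probs[bg] - new_probs[bg]) > 1e-9
)

print(f"discount: {old_discount:.4f} -> {new_discount:.4f}")
print(f"changed {changed} of {len(old_probs)} shared bigram probabilities")
```

In this toy setup, one extra sentence changes the count-of-counts, which changes the discount, which in turn changes the probability of every bigram the two models share, including ones like "the cat" whose own counts did not change. If the same effect holds at scale, it would explain why a generic byte-level diff finds little to reuse between two builds.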