
language model estimation

Open smassung opened this issue 9 years ago • 2 comments

`language_model` needs to be able to estimate a model directly from a corpus instead of requiring a pre-built .arpa file.

smassung, Sep 11 '15 00:09

We should consider this formulation (scroll down for the actual paper): http://homepages.inf.ed.ac.uk/s0562315/progs/#pldlm

Kenneth even mentions it in his thesis as future work. It doesn't look much harder to implement than modified interpolated Kneser-Ney, and it gives better perplexity.
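For reference, here is a rough sketch of the baseline being compared against, reduced to the simplest case: an interpolated Kneser-Ney bigram model estimated straight from tokenized sentences. This is not the linked formulation and not the "modified" variant (it uses a single fixed discount rather than count-dependent discounts), and it is not MeTA code. The function name `train_kn_bigram`, the `<s>`/`</s>` markers, and the default discount of 0.75 are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_kn_bigram(sentences, discount=0.75):
    """Estimate an interpolated Kneser-Ney bigram model (single discount).

    `sentences` is an iterable of token lists. Returns a function
    prob(w, u) giving P(w | u). Hypothetical helper, for illustration only.
    """
    bigram_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        bigram_counts.update(zip(padded, padded[1:]))

    context_totals = Counter()    # c(u, *): total count of bigrams starting with u
    followers = defaultdict(set)  # distinct words observed after u
    preceders = defaultdict(set)  # distinct words observed before w
    for (u, w), count in bigram_counts.items():
        context_totals[u] += count
        followers[u].add(w)
        preceders[w].add(u)

    num_bigram_types = len(bigram_counts)

    def prob(w, u):
        # Continuation probability: the fraction of distinct bigram types
        # that end in w. Words never seen in training get probability 0 here.
        p_cont = len(preceders.get(w, ())) / num_bigram_types
        total = context_totals.get(u, 0)
        if total == 0:
            # Unseen context: fall back entirely to the lower-order estimate.
            return p_cont
        discounted = max(bigram_counts.get((u, w), 0) - discount, 0.0) / total
        # Mass removed by discounting, redistributed via p_cont.
        backoff_weight = discount * len(followers[u]) / total
        return discounted + backoff_weight * p_cont

    return prob
```

Calling `prob = train_kn_bigram([["a", "b"], ["a", "c"]])` and then `prob("b", "a")` gives the smoothed estimate of P(b | a). A real implementation would extend this to higher orders and would need some handling for out-of-vocabulary words, which this sketch simply assigns zero probability.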

skystrife, Sep 18 '15 04:09

Once we implement this, we should have some way of saving the tokenization setup to ensure queries using the LM are tokenized the same way.
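One possible shape for this, as a minimal sketch: write the tokenizer settings to a small JSON file next to the model, and reload them before tokenizing queries. Nothing here reflects MeTA's actual configuration format; the function names and the `lowercase` / `sentence_markers` keys are hypothetical, and the whitespace tokenizer is only a stand-in.

```python
import json
from pathlib import Path


def save_tokenizer_config(path, config):
    """Write the tokenizer settings used at training time (hypothetical format)."""
    text = json.dumps(config, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")


def load_tokenizer_config(path):
    """Read the settings back so queries can be tokenized identically."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def tokenize(text, config):
    """Stand-in whitespace tokenizer driven entirely by the saved settings."""
    if config.get("lowercase", False):
        text = text.lower()
    tokens = text.split()
    if config.get("sentence_markers", False):
        tokens = ["<s>"] + tokens + ["</s>"]
    return tokens
```

The idea is that the LM loader would call `load_tokenizer_config` on the file saved during estimation and pass the result to the same `tokenize` used for training, so the two can't drift apart.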

smassung, Nov 20 '15 17:11