kenlm

Easiest way to calculate p(v | context) for all v in the vocabulary using the Python API?

Open · arvieFrydenlund opened this issue 5 years ago · 1 comment

At every step I'd like to look at the probability of every possible continuation. Do I just call `model.BaseScore(state, v, state2)` for every `v` in the vocabulary?

Also, I'm confused about what `state` and `state2` are doing in this API. Do I just keep alternating them as I move through the sentence, as in this example?

```python
accum += model.BaseScore(state, "a", state2)
accum += model.BaseScore(state2, "sentence", state)
```

Thanks.

arvieFrydenlund, Jun 19 '20

There is no fast path for scoring the entire vocabulary in a given context; a forward trie would be better suited to that. KenLM implements a reverse trie to optimize the speed of individual queries.
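Without a bulk call, the brute-force route is one `BaseScore` call per vocabulary word, so each context costs |V| lookups. The sketch below assumes the vocabulary has to be obtained outside the Python API, here by reading the `\1-grams:` section of the ARPA file the model was built from. `read_arpa_vocab` and `next_word_probs` are hypothetical helper names, not part of kenlm. `BaseScore` returns a log10 probability, so each score is exponentiated.

```python
def read_arpa_vocab(path):
    """Return the words listed in the \\1-grams: section of an ARPA file (assumed source of the vocabulary)."""
    vocab = []
    in_unigrams = False
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if line == "\\1-grams:":
                in_unigrams = True
            elif in_unigrams:
                if not line or line.startswith("\\"):
                    break  # a blank line or the next section header ends the unigrams
                # Entry layout: log10(p) word [log10(backoff)]
                vocab.append(line.split()[1])
    return vocab


def next_word_probs(model, context_state, vocab, state_cls):
    """Brute force: one BaseScore call per vocabulary word, O(|V|) lookups per context."""
    out_state = state_cls()  # scratch output state, overwritten on every call
    probs = {}
    for word in vocab:
        if word == "<s>":
            continue  # <s> only ever appears as context, never as a continuation
        log10_p = model.BaseScore(context_state, word, out_state)
        probs[word] = 10.0 ** log10_p
    return probs
```

With a real model this would be called as `state = kenlm.State(); model.BeginSentenceWrite(state); probs = next_word_probs(model, state, read_arpa_vocab(arpa_path), kenlm.State)`, where `arpa_path` is whatever ARPA file the model came from. For a properly normalized backoff model the returned values should sum to roughly 1.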

You can keep alternating the states as you move through the sentence. It's just an optimization to avoid copying a state or churning through new objects.
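The alternating pattern from the question can be wrapped in a small helper. This is a sketch assuming the kenlm Python calls used in the question and the library's Python example (`State`, `BeginSentenceWrite`, `NullContextWrite`, `BaseScore`); `score_words` is a hypothetical name. After each word the two `State` objects swap roles, so the output state of one step becomes the input state of the next and nothing is copied.

```python
def score_words(model, words, state_cls, bos=True, eos=True):
    """Sum log10 p(w | context) over words, ping-ponging between two State objects."""
    state, next_state = state_cls(), state_cls()
    if bos:
        model.BeginSentenceWrite(state)  # context starts with <s>
    else:
        model.NullContextWrite(state)  # empty context, no <s>
    total = 0.0
    for word in list(words) + (["</s>"] if eos else []):
        # BaseScore reads `state` and writes the extended context into `next_state`
        total += model.BaseScore(state, word, next_state)
        # Swap the names instead of copying: next_state now holds the current context
        state, next_state = next_state, state
    return total
```

With a real model, `score_words(model, "a sentence".split(), kenlm.State)` should agree with `model.score("a sentence")` up to floating-point error, since `score` adds `<s>` and `</s>` by default.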

kpu, Jun 19 '20