Why is there a constant score for OOV?

Open ankitmundada opened this issue 7 years ago • 1 comment

This line assigns a score of -1000 (which is declared here) to any n-gram that contains an OOV word. Is this the right way to approach it? Wouldn't it be possible to get the score for <unk> tokens from the LM and use that instead of a hardcoded score?

Apr 10 '18 13:04 ankitmundada

You can get rid of the if statement here: https://github.com/parlance/ctcdecode/blob/cef6739f7370762229cf7e115e4afcc319a4f805/ctcdecode/src/scorer.cpp#L83. Without that check, OOV words are assigned the <UNK> probability from the LM instead of the constant score.

May 25 '18 14:05 joemathai
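
To make the difference between the two behaviours discussed above concrete, here is a minimal, self-contained sketch. It is not the ctcdecode code: `ToyLM`, `kOovScore`, `score_with_constant`, and `score_with_unk` are hypothetical names, and the "language model" is just a table of unigram log probabilities that is assumed to contain a `<unk>` entry. A real n-gram LM also conditions on the preceding words, which this toy table does not.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a language model: word -> log probability.
// Assumption: the table always contains an "<unk>" entry.
using ToyLM = std::unordered_map<std::string, double>;

// Hypothetical name for the hardcoded penalty discussed in this issue.
constexpr double kOovScore = -1000.0;

// Policy A: return a fixed penalty as soon as any word is out of vocabulary.
double score_with_constant(const ToyLM& lm,
                           const std::vector<std::string>& words) {
  assert(!words.empty());
  double last = 0.0;
  for (const auto& w : words) {
    auto it = lm.find(w);
    if (it == lm.end()) {
      return kOovScore;  // early exit, the LM is never consulted
    }
    last = it->second;
  }
  return last;  // score of the final word in the n-gram
}

// Policy B: no early exit; an OOV word is scored with the "<unk>" entry.
double score_with_unk(const ToyLM& lm,
                      const std::vector<std::string>& words) {
  assert(!words.empty());
  double last = 0.0;
  for (const auto& w : words) {
    auto it = lm.find(w);
    // at() throws std::out_of_range if the table has no "<unk>" entry.
    last = (it != lm.end()) ? it->second : lm.at("<unk>");
  }
  return last;
}
```

With a table such as `{{"the", -1.0}, {"cat", -2.0}, {"<unk>", -5.0}}`, both functions agree on in-vocabulary n-grams. For an n-gram ending in an unknown word, the first returns the constant and the second returns whatever the model assigns to `<unk>`. Which behaviour is preferable depends on how much probability mass the LM gives `<unk>`, so it may be worth checking decoding results both ways.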