ctcdecode
Why is there a constant score for OOV?
This line assigns a score of -1000 (which is declared here) to any n-gram that contains an OOV word. Is this the right way to approach it? Wouldn't it be possible to get the score for the <unk> token from the LM and use that instead of a hardcoded score?
You can get rid of the if statement here: https://github.com/parlance/ctcdecode/blob/cef6739f7370762229cf7e115e4afcc319a4f805/ctcdecode/src/scorer.cpp#L83. With that check removed, OOV words would be assigned the <UNK> probability from the LM instead of the constant score.
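To make the difference concrete, here is a minimal, self-contained sketch of the two behaviours. It does not use the ctcdecode or KenLM code: `ToyLM`, `make_toy_lm`, `score_last_word`, and the probabilities are all hypothetical stand-ins, and the toy model only stores unigram log10 scores. The only thing it is meant to show is what changes when the early return on OOV is dropped.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical constant, mirroring the -1000 mentioned in the question.
const double kOovScore = -1000.0;

// Hypothetical toy language model: unigram log10 probabilities only.
struct ToyLM {
  std::unordered_map<std::string, double> log10_prob;

  bool contains(const std::string& word) const {
    return log10_prob.count(word) > 0;
  }

  // Score of `word`; unknown words fall back to the "<unk>" entry.
  // If the model has no "<unk>" entry at all, fall back to kOovScore.
  double score(const std::string& word) const {
    auto it = log10_prob.find(word);
    if (it == log10_prob.end()) it = log10_prob.find("<unk>");
    return it == log10_prob.end() ? kOovScore : it->second;
  }
};

// Made-up vocabulary and scores, for illustration only.
ToyLM make_toy_lm() {
  ToyLM lm;
  lm.log10_prob = {{"hello", -1.0}, {"world", -2.0}, {"<unk>", -4.0}};
  return lm;
}

// Walks the words in order and returns the score of the last one.
// With constant_oov == true, any OOV word short-circuits to kOovScore
// (the behaviour asked about). With constant_oov == false, the check is
// skipped and OOV words receive the model's "<unk>" score instead.
double score_last_word(const ToyLM& lm,
                       const std::vector<std::string>& words,
                       bool constant_oov) {
  double last = 0.0;
  for (const auto& word : words) {
    if (constant_oov && !lm.contains(word)) {
      return kOovScore;
    }
    last = lm.score(word);
  }
  return last;
}
```

In this toy setup, scoring `{"hello", "zzz"}` with `constant_oov` set to `true` returns the hardcoded constant, while setting it to `false` returns whatever the model stores for `<unk>`. Whether the `<unk>` score of a real LM is a good penalty depends on how that LM was trained, so it is worth checking decoding results after making the change.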