Something confusing me in lexiconDecoder for ctc model

Open qzfnihao opened this issue 5 years ago • 1 comments

While debugging the CTC model in the decoder, in recipes/streaming_convnets, I found something confusing. The lexicon decoder appears to implement a plain beam search, not the prefix beam search used for CTC. In CTC, a frame-level output such as "hhee-l-lo" (here "-" represents blank) should collapse to "hello". In this beam search, however, "hh" seems to be treated as two sequential tokens: searching the first 'h' leads to state 1, and searching the second 'h' from state 1 leads to state 2. If the lexicon contains a word "hh", we may then output the word "hh". In CTC this result is not legal, which means LexiconDecoder may produce bad results for a CTC model.
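To make the collapse rule in the question concrete, here is a minimal sketch of it in C++. The function name `collapseCtcPath` and the use of a `char` for each token are assumptions for illustration only; nothing here is taken from wav2letter.

```cpp
#include <string>

// Hypothetical helper, not part of wav2letter: collapse a frame-level CTC
// path into its label sequence. `blank` is the character standing for the
// CTC blank symbol ('-' in the example above).
std::string collapseCtcPath(const std::string& path, char blank = '-') {
  std::string out;
  char prev = blank; // treat the start as if it were preceded by a blank
  for (char c : path) {
    // Emit a symbol only when it is not blank and differs from the previous
    // frame. A blank in between resets `prev`, so "l-l" keeps both l's.
    if (c != blank && c != prev) {
      out.push_back(c);
    }
    prev = c;
  }
  return out;
}
```

With this rule, `collapseCtcPath("hhee-l-lo")` yields "hello", `collapseCtcPath("hh")` yields "h", and only a path with a blank between the repeats, such as "h-h", yields "hh".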

    // We eat-up a new token
    // ---- In my understanding, this prevents "hh" in the token search.
    if (opt_.criterionType != CriterionType::CTC || prevHyp.prevBlank ||
        n != prevIdx) {
      if (!lex->children.empty()) {
        if (!isLmToken_) {
          lmState = prevHyp.lmState;
          lmScore = lex->maxScore - lexMaxScore;
        }
        candidatesAdd(
            candidates_,
            candidatesBestScore_,
            opt_.beamThreshold,
            score + opt_.lmWeight * lmScore,
            lmState,
            lex.get(),
            &prevHyp,
            n,
            -1,
            false, // prevBlank
            prevHyp.amScore + amScore,
            prevHyp.lmScore + lmScore);
      }
    }

    // If we got a true word
    // ---- But here, if the new state is a word, we add it. This means we
    // will add "hh".
    for (auto label : lex->labels) {
      if (!isLmToken_) {
        auto lmStateScorePair = lm_->score(prevHyp.lmState, label);
        lmState = lmStateScorePair.first;
        lmScore = lmStateScorePair.second - lexMaxScore;
      }
      candidatesAdd(
          candidates_,
          candidatesBestScore_,
          opt_.beamThreshold,
          score + opt_.lmWeight * lmScore + opt_.wordScore,
          lmState,
          lexicon_->getRoot(),
          &prevHyp,
          n,
          label,
          false, // prevBlank
          prevHyp.amScore + amScore,
          prevHyp.lmScore + lmScore);
    }

qzfnihao avatar Jun 30 '20 11:06 qzfnihao

The lexicon decoder for CTC is a prefix beam search decoder.

In the branch you pointed to, we try to extend the prefix with a token that differs from the last token of the prefix. In this case we may, for example, switch to another LM state.

The branch for the case you mentioned, where we have "hhh" and stay in the same LM state (which means we just squeeze the repeated tokens together to form one token), is here: https://github.com/facebookresearch/wav2letter/blob/be863bb941108e95545b94fdf192722699295c63/src/libraries/decoder/LexiconDecoder.cpp#L153. Check the hypothesis we add in this case: https://github.com/facebookresearch/wav2letter/blob/be863bb941108e95545b94fdf192722699295c63/src/libraries/decoder/LexiconDecoder.cpp#L164.
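The branches described above can be illustrated with a toy per-frame update. This is only a sketch under simplifying assumptions, not the LexiconDecoder code: `ToyHyp`, `extendToyHyp`, `runToyPath`, and `kToyBlank` are hypothetical names, tokens are single characters, and the trie nodes, LM states, and scores that the real decoder tracks are left out.

```cpp
#include <string>

constexpr char kToyBlank = '-'; // hypothetical blank symbol for this sketch

// Hypothetical, heavily simplified hypothesis: just enough state to show
// how a repeated token is merged unless a blank was seen in between.
struct ToyHyp {
  std::string prefix;     // collapsed token sequence so far
  char lastToken = 0;     // last non-blank token consumed (0 = none yet)
  bool prevBlank = false; // whether the previous frame was a blank
};

// Consume one frame-level token `n` and return the extended hypothesis.
ToyHyp extendToyHyp(const ToyHyp& prev, char n) {
  ToyHyp next = prev;
  if (n == kToyBlank) {
    // Blank branch: the prefix is unchanged, but we remember the blank.
    next.prevBlank = true;
    return next;
  }
  if (prev.prevBlank || n != prev.lastToken) {
    // "New token" branch: the token differs from the last one, or a blank
    // separated the repeat, so the prefix really grows.
    next.prefix.push_back(n);
  }
  // Otherwise this is the "same token" branch: the repeat is squeezed into
  // the token already in the prefix, and the prefix does not change.
  next.lastToken = n;
  next.prevBlank = false;
  return next;
}

// Feed a whole frame-level path through the toy update.
std::string runToyPath(const std::string& frames) {
  ToyHyp hyp;
  for (char n : frames) {
    hyp = extendToyHyp(hyp, n);
  }
  return hyp.prefix;
}
```

In this toy model, `runToyPath("hh")` gives "h" because the second 'h' takes the "same token" branch, while `runToyPath("h-h")` gives "hh" because the blank sets `prevBlank` before the second 'h' arrives.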

tlikhomanenko avatar Jul 14 '20 06:07 tlikhomanenko