TensorFlowASR
Question: Predict network in RNNT model
Hi, I have a question about the prediction network in the code. For example, suppose the training labels are the ones below and the vocabulary has 261 entries (just as an example, the subwords might be: accept, call, reject, phone):
- "accept call"
- "reject phone"
The test case is "reject call". With the current RNNT transducer model, I can't get "reject call". My understanding is that this case isn't covered by the prediction network. Is that right? How can I apply an LM to the prediction network? Please point out anything I have wrong. Thanks.
Actually, the Transducer structure is said to behave like an LM, but an external LM can still be used in beam search. However, we don't support an LM in beam search yet (and the RNNT beam search itself is not really optimized). I think the transducer is able to recognize unseen data, which in your case is "reject call". The "reject call" signal is fed into the encoder, so the model will predict "reject" and then either "call" or "phone"; because the encoded input is now different, it should predict something closer to "call" than to "phone". Your way of thinking might still be right, though, because these models need lots of training data: the more data, the better the model. If the data covered every possible case, the model would have no cases left to fail on.
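To make the "LM in beam search" idea a bit more concrete, here is a minimal sketch of shallow fusion, which is one common way to combine an external LM with a transducer at decoding time. It is not something TensorFlowASR provides, as noted above. The function name `shallow_fusion_scores`, the `lm_weight` parameter, and all probabilities are made up for illustration. They only show how an LM that has seen "reject call" in text could tip one expansion step of the beam search.

```python
import math

BLANK = "<blank>"  # hypothetical name for the transducer blank symbol


def shallow_fusion_scores(rnnt_log_probs, lm_log_probs, lm_weight=0.5):
    """Fuse transducer and LM scores for one beam-search expansion step.

    rnnt_log_probs: token -> log P_rnnt(token | audio, label history)
    lm_log_probs:   token -> log P_lm(token | label history)
    The blank symbol emits no label, so the LM is not applied to it.
    """
    floor = math.log(1e-10)  # fallback score for tokens the LM does not know
    fused = {}
    for token, rnnt_lp in rnnt_log_probs.items():
        if token == BLANK:
            fused[token] = rnnt_lp
        else:
            fused[token] = rnnt_lp + lm_weight * lm_log_probs.get(token, floor)
    return fused


# Toy numbers (assumed, not measured): after emitting "reject", the transducer
# slightly prefers "phone" because that is the only continuation it was trained on.
rnnt_log_probs = {
    "call": math.log(0.45),
    "phone": math.log(0.50),
    BLANK: math.log(0.05),
}

# A hypothetical external LM trained on extra text where "reject call" occurs.
lm_log_probs = {"call": math.log(0.6), "phone": math.log(0.4)}

fused = shallow_fusion_scores(rnnt_log_probs, lm_log_probs, lm_weight=0.5)
best = max(fused, key=fused.get)
print(best)
```

In a real decoder this fusion would be applied to every hypothesis in the beam at every expansion, and `lm_weight` would be tuned on held-out data.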