openspeech
Maybe there is a logic bug in the transformer-transducer decoding part
❓ Questions & Help
Thanks for providing such an excellent codebase. I have a question about the transformer-transducer decoding part. I am not sure whether there is an implicit bug or whether I don't fully understand the transducer framework.
Details
In the `greedy_decode` method in the `transformer_transducer/model.py` module, a `decoder_output` and an `encoder_output` are concatenated at each time step, and the fused vector is then fed to the joint network.
To the best of my knowledge, when the joint network emits the blank symbol, the transducer should only advance to the next acoustic feature embedding while keeping `decoder_output` fixed; the prediction network is updated only when a non-blank label is emitted. Why is `decoder_output` also updated on blank here?
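For reference, the standard transducer greedy decoding loop I have in mind looks roughly like the sketch below. This is not the openspeech API — `decoder_step` and `joint` are hypothetical stand-ins for the prediction network and joint network, and the loop is simplified to emit at most one label per encoder frame:

```python
BLANK = 0  # index of the blank symbol (assumption for this sketch)

def greedy_decode(encoder_outputs, decoder_step, joint):
    """Simplified transducer greedy decoding.

    encoder_outputs: iterable of per-frame encoder representations.
    decoder_step(label): hypothetical prediction-network step; returns
        the decoder output after consuming `label` (None = start token).
    joint(enc, dec): hypothetical joint network; returns the argmax label.
    """
    hypothesis = []
    decoder_output = decoder_step(None)   # start-of-sequence state
    for enc_t in encoder_outputs:         # always advance one frame per step
        label = joint(enc_t, decoder_output)
        if label != BLANK:
            hypothesis.append(label)
            # update the prediction network ONLY on a non-blank emission
            decoder_output = decoder_step(label)
        # on blank: move to the next frame, decoder_output stays fixed
    return hypothesis
```

The point of contention is the `if label != BLANK` guard: in this formulation the decoder state is refreshed only inside that branch, never on the blank path.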
Thanks for taking the time to answer my question.
@hasangchun
@YuXI-Chn I think I missed it. Thanks for letting me know. I'll fix it as soon as possible.
happy to help!