unsuper_tts_asr
Question about DAE loss
Hi, I have read your paper, Almost Unsupervised Text to Speech and Automatic Speech Recognition, and I really like its idea. I have a quick question about the DAE loss (L_dae). When computing L_dae, do you feed C(x) to the Decoder, compute cross-attention over the Encoder's outputs, and then use the Decoder's outputs to compute L_dae? Or do you compute L_dae directly from the Encoder's outputs, without involving the Decoder at all, as in a traditional masked language modeling (MLM) task?
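To make the two readings concrete, here is a hypothetical plain-Python sketch, not the paper's code. The neural networks are abstracted away: `decoder_probs` and `encoder_probs` stand in for per-position output distributions, and `MASK`, `corrupt`, and the loss helpers are names invented for illustration. The point is only which module produces the predictions and which positions are scored under each interpretation.

```python
import math
import random

MASK = 0  # hypothetical mask token id used by the corruption C(x)


def corrupt(tokens, mask_prob, rng):
    """Toy C(x): replace each token with MASK with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else t for t in tokens]


def nll(prob_rows, targets, positions):
    """Mean negative log-likelihood of targets at the given positions."""
    positions = list(positions)
    if not positions:
        return 0.0  # nothing to score
    losses = [-math.log(prob_rows[i][targets[i]]) for i in positions]
    return sum(losses) / len(losses)


def dae_loss_seq2seq(decoder_probs, x):
    # Interpretation 1 (assumed): the Decoder, attending to the Encoder's
    # outputs, produces a distribution for every position of the clean x,
    # and the loss covers the whole sequence.
    return nll(decoder_probs, x, range(len(x)))


def dae_loss_mlm(encoder_probs, x, cx):
    # Interpretation 2 (assumed): the Encoder's outputs alone predict the
    # original tokens, and only the masked positions are scored, as in MLM.
    masked = [i for i, t in enumerate(cx) if t == MASK and x[i] != MASK]
    return nll(encoder_probs, x, masked)
```

With a uniform distribution over a vocabulary of size V, both losses reduce to log V; they differ once the predictions at unmasked positions are no longer uninformative, since only the first interpretation penalizes them.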