
The mechanism of alignment between text encoder output and audio_seq2seq output

Open inconnu11 opened this issue 4 years ago • 1 comments

Hi Zhang, could you please explain how the text encoder output and the recognition encoder output are aligned? Your paper states: "The recognition encoder Er is a seq2seq neural network which aligns the acoustic and phoneme sequences automatically." I couldn't figure out how the code works. Thank you in advance!

inconnu11 · May 11 '20 10:05

Hi, by that I mean the recognition encoder is a seq2seq model with an attention module. Its definition is here: https://github.com/jxzhanggg/nonparaSeq2seqVC_code/blob/4c03a6be3bc76207b7cf8222c985dc85c7018cde/pre-train/model/layers.py#L216-L456

jxzhanggg · May 11 '20 16:05
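
For readers wondering how a seq2seq model with attention can align acoustic frames to phoneme steps at all, here is a minimal sketch of the general idea. It is not the repository's code: it assumes plain dot-product attention, and the names `attend`, `frames`, and `queries` as well as the toy numbers are hypothetical. The linked `layers.py` may use a different attention variant. Each decoder step scores every acoustic frame, and the softmax of those scores is a soft alignment row. Because the decoder emits one output per step, the result has one vector per phoneme step rather than one per frame.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, frames):
    """One decoder step: return (weights over frames, context vector)."""
    # Dot-product score between the decoder query and each acoustic frame.
    scores = [sum(q * f for q, f in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Context is the attention-weighted average of the frame vectors.
    context = [sum(w * frame[d] for w, frame in zip(weights, frames))
               for d in range(dim)]
    return weights, context

# Toy input (hypothetical): 4 acoustic frames, 2 decoder (phoneme) steps.
frames = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]

results = [attend(q, frames) for q in queries]
alignment = [weights for weights, _ in results]  # shape: steps x frames
contexts = [context for _, context in results]   # one vector per step

for step, row in enumerate(alignment):
    print(step, [round(w, 3) for w in row])
```

In this toy case the first step puts most of its weight on frames 0 and 1 and the second step on frames 2 and 3, so the attention matrix acts as the alignment. In a trained model the queries come from the decoder's recurrent state instead of being fixed by hand.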