GST-Tacotron Reference Encoder Padding

Open · its-sandy opened this issue 5 years ago · 1 comment

How do we ensure that the padding of the reference mel spectrogram is taken into account when the reference encoder is applied to a batch of mels?
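One piece of any answer is knowing how many time steps of each utterance are still real after the reference encoder's strided convolutions. The sketch below is only an illustration, assuming a stack of six convolutions with kernel size 3, stride 2 and padding 1 along the time axis (check these values against the actual model). The function names `conv_out_length`, `reference_lengths` and `time_mask` are hypothetical and not part of the repository.

```python
def conv_out_length(length, kernel_size=3, stride=2, padding=1):
    """Frames left after one strided convolution along the time axis."""
    return (length + 2 * padding - kernel_size) // stride + 1


def reference_lengths(mel_lengths, num_layers=6):
    """Valid (unpadded) time steps per utterance after the conv stack.

    Assumption: every layer uses the same kernel size, stride and padding.
    """
    out = []
    for length in mel_lengths:
        for _ in range(num_layers):
            length = conv_out_length(length)
        out.append(length)
    return out


def time_mask(lengths, max_length):
    """Per-utterance mask: True for real frames, False for padding."""
    return [[t < n for t in range(max_length)] for n in lengths]


mel_lengths = [800, 400, 130]   # hypothetical frame counts in one batch
ref_lengths = reference_lengths(mel_lengths)
mask = time_mask(ref_lengths, max(ref_lengths))
print(ref_lengths)              # [13, 7, 3]
```

The downsampled lengths can then be used to mask or pack the sequence before the recurrent layer, so that padded frames do not contribute to the reference embedding.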

Jun 23 '19 12:06 its-sandy

Did you come to any conclusion? I ran into this problem too. Because the GST reference encoder sees the zero padding, the network can infer the duration of the audio. On my dataset this meant that short lines were pronounced slowly and long ones fast.

I tried using one-dimensional convolutions and masking the zeros before the GRU layer, but that made the tokens work worse.
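For reference, one common way to keep padded frames out of a GRU is to pack the sequence. The sketch below assumes PyTorch and is not necessarily what was tried above, nor does it address the reported drop in token quality. `encode_reference` is a hypothetical helper, and the feature size of 128 is an arbitrary choice for the demo.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


def encode_reference(gru, features, lengths):
    """Return the GRU state at each sequence's last real frame.

    features: (batch, max_time, dim) conv output, zero-padded on the right.
    lengths:  (batch,) number of valid time steps per sequence.
    """
    packed = pack_padded_sequence(
        features, lengths.cpu(), batch_first=True, enforce_sorted=False
    )
    _, h_n = gru(packed)   # h_n: (num_layers, batch, hidden)
    return h_n[-1]         # (batch, hidden), in the original batch order


torch.manual_seed(0)
gru = nn.GRU(input_size=128, hidden_size=128, batch_first=True)
features = torch.randn(3, 13, 128)    # hypothetical batch of conv outputs
lengths = torch.tensor([13, 7, 3])    # valid frames per utterance
embedding = encode_reference(gru, features, lengths)
```

With packing, whatever values sit in the padded positions have no effect on the returned state, so the embedding no longer depends on how much padding an utterance received within the batch.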

Sep 23 '19 11:09 hadaev8
