GST-Tacotron

Reference Encoder Padding

Open its-sandy opened this issue 5 years ago • 1 comments

How do we ensure that the padding of the reference mel spectrogram is taken into account when the reference encoder is applied to a batch of mels?

its-sandy, Jun 23 '19 12:06

Did you come to any conclusion? I ran into this problem too. Because the GST encoder sees the zero padding, the network is able to take the duration of the audio into account. On my dataset this led to short lines being pronounced slowly and long lines being pronounced fast.

I tried using one-dimensional convolutions and masking the zeros before the GRU layer, but that made the tokens work worse.

hadaev8, Sep 23 '19 11:09
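
For readers following the thread, here is a minimal sketch of the bookkeeping that masking before the GRU requires: tracking how many frames of each reference mel are still valid after the strided convolutions, and building a mask from those lengths. It is not taken from this repository. The helper names are hypothetical, and the layer settings (6 layers, kernel 3, stride 2, padding 1) are assumptions that should be replaced with the values from your own reference encoder.

```python
def conv_out_length(length, kernel=3, stride=2, padding=1):
    """Number of output frames of one strided convolution along time.

    The default kernel/stride/padding are assumptions, not values
    read from this repository.
    """
    return (length + 2 * padding - kernel) // stride + 1


def downsampled_lengths(mel_lengths, num_layers=6):
    """Valid (unpadded) length of each batch item after the conv stack."""
    out = []
    for n in mel_lengths:
        for _ in range(num_layers):
            n = conv_out_length(n)
        out.append(n)
    return out


def padding_mask(lengths, max_len=None):
    """Row i holds 1 for valid frames and 0 for padded frames."""
    max_len = max(lengths) if max_len is None else max_len
    return [[1 if t < n else 0 for t in range(max_len)] for n in lengths]


def masked_mean(frames, mask_row):
    """Average feature vectors over the valid frames only."""
    valid = [f for f, m in zip(frames, mask_row) if m]
    dim = len(frames[0])
    return [sum(f[d] for f in valid) / len(valid) for d in range(dim)]


# Two reference mels of 100 and 400 frames, padded to 400 in the batch.
lengths = downsampled_lengths([100, 400])
mask = padding_mask(lengths)
```

In PyTorch, lengths computed this way could be passed to `torch.nn.utils.rnn.pack_padded_sequence` so that the GRU's final state comes from the last valid frame rather than from padding. As noted above, masking did not improve token quality in at least one experiment, so treat this as something to test rather than a known fix.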