Francesco
Hi, this is taken from the Tacotron paper. I believe it helps with "highlighting" the position information for the autoregressive predictions.
Hi, in my experiments the encoder alignments are rather optional, which is why I set it to a lower number of steps than the decoder. You can probably safely set it...
@luis-vera is this solved?
Hi, any further debug information? Does it always occur with the same samples? What do these samples look like?
Hi, are you still having this issue?
I have not tried it yet, sorry. In case you do, it would be great if you reported back here.
Hi, r is the reduction factor. It's a technique used to build up attention, which otherwise requires careful tuning and/or other, less effective, techniques. The idea behind it is that...
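
To illustrate, here is a minimal sketch (not the repository's actual code, just an assumption of the usual Tacotron-style setup) of what the reduction factor does: the decoder emits r mel frames per step, so a T-frame target becomes T / r decoder steps and the attention has far fewer steps to align.

```python
import numpy as np

def group_frames(mel, r):
    """Reshape (T, n_mels) mel frames into (T // r, r * n_mels) decoder targets."""
    T, n_mels = mel.shape
    T = (T // r) * r                      # drop the remainder (in practice you would pad instead)
    return mel[:T].reshape(T // r, r * n_mels)

mel = np.random.rand(100, 80)             # 100 frames, 80 mel bins
targets = group_frames(mel, r=5)
print(targets.shape)                      # (20, 400): 5x fewer decoder steps
```

With r = 5 a 100-frame target collapses to 20 decoder steps, which is why alignment tends to appear much earlier in training.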
Hi, I struggled to find any literature on this myself; you should find some in the Tacotron paper. All the rest, like scheduling, are rules of thumb, I'm afraid.
Hi, that's great! The collapse is a known issue of the autoregressive model. I'm not sure about the cover field; are you using convolutions or dense layers after the attention mechanism?...
What is voice control?