Alexey322
Hey. I trained the Tacotron 2 synthesizer from Rayhane-mamah; the synthesized spectrograms sound good when reconstructed with the Griffin-Lim algorithm. Unfortunately, the vocoder in his repository learns with an...
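For reference, a minimal sketch of listening to a predicted mel spectrogram via Griffin-Lim with librosa; the file name and STFT parameters are assumptions and must match whatever the synthesizer was trained with:

```python
# Hypothetical check: reconstruct audio from a predicted mel with Griffin-Lim.
# The mel is assumed to be on a linear power scale (mel_to_audio's default);
# log-mels would need to be exponentiated first.
import numpy as np
import librosa
import soundfile as sf

mel = np.load("predicted_mel.npy")   # assumed shape: (n_mels, frames)
wav = librosa.feature.inverse.mel_to_audio(
    M=mel,
    sr=22050,        # sample rate used during training (assumed)
    n_fft=1024,
    hop_length=256,
    win_length=1024,
    n_iter=60,       # more Griffin-Lim iterations -> fewer phase artifacts
)
sf.write("griffin_lim.wav", wav, 22050)
```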
Why do we need to pad the audio fragment when computing its mel spectrogram? `y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect')`
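A small sketch of what this padding achieves, with assumed hyperparameter values: reflection-padding by (n_fft - hop_size) / 2 on each side before an uncentered STFT keeps the number of frames equal to len(y) // hop_size, so the mel spectrogram stays sample-aligned with the audio:

```python
# Hypothetical values; the point is only the frame-count bookkeeping.
import torch

n_fft, hop_size, win_size = 1024, 256, 1024
y = torch.randn(1, 8192)  # dummy mono audio, batch of 1

pad = (n_fft - hop_size) // 2
y_padded = torch.nn.functional.pad(y.unsqueeze(1), (pad, pad), mode="reflect").squeeze(1)

spec = torch.stft(
    y_padded,
    n_fft=n_fft,
    hop_length=hop_size,
    win_length=win_size,
    window=torch.hann_window(win_size),
    center=False,
    return_complex=True,
)
print(spec.shape[-1], y.shape[-1] // hop_size)  # frame counts match: 32 32
```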
@jik876 Hi. I would like to know why you are not using the same parameters (for the V1 configuration) as indicated in the paper. Your code sets the following parameters: "resblock_kernel_sizes":...
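As a hedged aside, one way to compare is to dump the relevant fields from the shipped config and check them against the paper; the file name and key names below are assumptions based on the repo layout:

```python
# Print the ResBlock/upsampling fields of the V1 config for a side-by-side
# comparison with the values reported in the paper.
import json

with open("config_v1.json") as f:
    h = json.load(f)

for key in ("resblock", "resblock_kernel_sizes", "resblock_dilation_sizes",
            "upsample_rates", "upsample_kernel_sizes", "upsample_initial_channel"):
    print(f"{key}: {h.get(key)}")
```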
Hi, @jik876. Can you give some advice on how to correctly change the model for a 44100 Hz sample rate? I don't mean the hyperparameters in the config. For example, how did you...
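A sketch of the one structural constraint involved, with illustrative numbers rather than values from the repo: the product of the generator's upsample rates has to equal the hop size, since each mel frame is expanded into hop_size audio samples:

```python
# Hypothetical 44.1 kHz setup; only the multiplicative constraint is the point.
import math

sampling_rate = 44100
hop_size = 512                   # e.g. ~11.6 ms frames at 44.1 kHz (assumed)
upsample_rates = [8, 8, 4, 2]    # hypothetical; 8 * 8 * 4 * 2 = 512

assert math.prod(upsample_rates) == hop_size, \
    "total upsampling must match hop_size, otherwise output length is wrong"

# kernel sizes are commonly chosen as twice the corresponding upsample rate
upsample_kernel_sizes = [2 * r for r in upsample_rates]
print(upsample_kernel_sizes)     # [16, 16, 8, 4]
```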
Hi. I started training the model from scratch and found that the optimizer uses a dynamically decaying learning rate. If I train the model for 2.5 million steps, then according to...
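Assuming the schedule is a per-epoch exponential decay (lr_epoch = lr_0 * lr_decay ** epoch), a back-of-the-envelope sketch of where the learning rate ends up after long training; the concrete numbers below are illustrative, not taken from the config:

```python
# Illustrative values only; the real epoch count depends on dataset and batch size.
lr_0 = 2e-4
lr_decay = 0.999
steps_per_epoch = 1000
total_steps = 2_500_000

epochs = total_steps // steps_per_epoch
lr_final = lr_0 * lr_decay ** epochs
print(f"after {epochs} epochs the learning rate is ~{lr_final:.3e}")
```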
Hi. I trained Flowtron on two speakers, 50 hours in total, 25 for each. After that, I wanted to train the model on 10 speakers with 20-30...
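One possible (hypothetical) way to warm-start the 10-speaker run from the 2-speaker checkpoint is to copy all matching weights and resize only the speaker embedding table; the checkpoint layout and the key name used below are assumptions and may differ in the actual model:

```python
# Expand a 2-speaker embedding table to 10 speakers before warm-starting.
import torch

ckpt = torch.load("flowtron_2spk.pt", map_location="cpu")
state = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

old_emb = state["speaker_embedding.weight"]        # assumed shape: (2, emb_dim)
n_new_speakers, emb_dim = 10, old_emb.shape[1]

new_emb = torch.randn(n_new_speakers, emb_dim) * old_emb.std()
new_emb[: old_emb.shape[0]] = old_emb              # keep the two trained speakers
state["speaker_embedding.weight"] = new_emb

torch.save({"state_dict": state}, "flowtron_10spk_warmstart.pt")
```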
Hello. Why doesn't the attention use speaker embeddings to find the alignment between text and mel spectrograms? The alignment can vary greatly between speakers speaking in different styles.
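A schematic sketch (all module and variable names are hypothetical) of the modification being suggested: concatenating a speaker embedding to every encoder timestep so the attention memory becomes speaker-aware:

```python
# Make the attention memory speaker-conditioned by broadcasting the speaker
# embedding over the text axis and projecting back to the original width.
import torch
import torch.nn as nn

class SpeakerAwareMemory(nn.Module):
    def __init__(self, text_dim=512, spk_dim=128):
        super().__init__()
        self.proj = nn.Linear(text_dim + spk_dim, text_dim)

    def forward(self, encoder_outputs, speaker_emb):
        # encoder_outputs: (B, T_text, text_dim), speaker_emb: (B, spk_dim)
        spk = speaker_emb.unsqueeze(1).expand(-1, encoder_outputs.size(1), -1)
        return self.proj(torch.cat([encoder_outputs, spk], dim=-1))

memory = SpeakerAwareMemory()(torch.randn(2, 40, 512), torch.randn(2, 128))
print(memory.shape)  # torch.Size([2, 40, 512])
```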