Exp 1. YourTTS-EN(VCTK) + SCL (speaker encoder layers are not initialized)
I tried to run an experiment similar to Exp 1. YourTTS-EN(VCTK) + SCL, setting use_speaker_encoder_as_loss=true, speaker_encoder_loss_alpha=9.0, and speaker_encoder_config_path / speaker_encoder_model_path (downloaded from your Google Drive).
My config file is almost identical to the one you have for the experiment (I don't have fine_tuning_mode=0, but I checked and 0 means disabled, so it shouldn't affect anything. Also, use_speaker_embedding=false, otherwise it complains that the vectors are already initialized).
My problem is that when I print out the model weight keys of your model and mine, the speaker encoder layers are missing from my checkpoint. They are not initialized for some reason. Unfortunately, I don't have any ideas why this could be happening :( Could you maybe point me in a direction and tell me what I could check?
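For reference, this is roughly how I compare the weight keys of the two checkpoints; a minimal sketch, assuming both were saved by Coqui TTS with the weights nested under a "model" key (the paths are placeholders):

```python
import torch

# Placeholder paths; point these at the reference and the locally trained checkpoint.
ref = torch.load("reference_checkpoint.pth.tar", map_location="cpu")
mine = torch.load("my_checkpoint.pth.tar", map_location="cpu")

# Coqui TTS checkpoints usually nest the weights under a "model" key;
# fall back to the raw dict otherwise.
ref_keys = set(ref.get("model", ref).keys())
my_keys = set(mine.get("model", mine).keys())

missing = sorted(ref_keys - my_keys)
print(f"{len(missing)} keys present in the reference but not in mine:")
for k in missing:
    print(" ", k)
```

In my case all the missing keys are the speaker encoder layers.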
"use_sdp": true,
"noise_scale": 1.0,
"inference_noise_scale": 0.667,
"length_scale": 1,
"noise_scale_dp": 1.0,
"inference_noise_scale_dp": 0.8,
"max_inference_len": null,
"init_discriminator": true,
"use_spectral_norm_disriminator": false,
"use_speaker_embedding": true,
"num_speakers": 97,
"speakers_file": null,
"d_vector_file": "../speaker_embeddings/new-SE/VCTK+TTS-PT+MAILABS-FR/speakers.json",
"speaker_embedding_channels": 512,
"use_d_vector_file": true,
"d_vector_dim": 512,
"detach_dp_input": true,
"use_language_embedding": false,
"embedded_language_dim": 4,
"num_languages": 0,
"use_speaker_encoder_as_loss": true,
"speaker_encoder_config_path": "../checkpoints/Speaker_Encoder/Resnet-original-paper/config.json",
"speaker_encoder_model_path": "../checkpoints/Speaker_Encoder/Resnet-original-paper/converted_checkpoint.pth.tar",
"fine_tuning_mode": 0,
"freeze_encoder": false,
"freeze_DP": false,
"freeze_PE": false,
"freeze_flow_decoder": false,
"freeze_waveform_decoder": false
Hi, sorry for the delay. Did you find the problem? Which version of :frog: TTS are you using?
@stalevna Hi! Did you manage to replicate the results of the first experiment?
Recently, we created a recipe that replicates the first experiment proposed in the YourTTS paper. The recipe covers the single-language training on the VCTK dataset (it downloads, resamples, and extracts the speaker embeddings automatically :)). If you are interested in multilingual training instead, the VitsArgs instance in the recipe contains commented-out parameters that should be enabled for multilingual training: https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py
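For reference, a rough sketch of what enabling those multilingual switches looks like; the field names follow VitsArgs as used in the recipe, but the values here are illustrative only:

```python
from TTS.tts.models.vits import VitsArgs

# Illustrative values; the single-language recipe ships the language
# settings commented out, and they need to be enabled for multilingual runs.
model_args = VitsArgs(
    d_vector_dim=512,
    use_d_vector_file=True,
    use_speaker_encoder_as_loss=True,
    use_language_embedding=True,  # enable language embeddings
    embedded_language_dim=4,      # size of the language embedding
)
```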