SC-CNN
About additional loss
Hello, nice work. I have a question: how about adding an extra loss at the end of generation to match the spk_enc embeddings of the reference wav and the generated wav? I ask because I do not see Meta-StyleSpeech's discriminator being used here (or am I missing it somewhere?).
....
s_ref = self.spk_enc(y.transpose(1,2), (y_mask==0).squeeze(1))
....
## freeze spk_enc
s_out = self.spk_enc(y_out.transpose(1,2), (y_out_mask==0).squeeze(1))
# then cosine distance between s_ref and s_out
## unfreeze spk_enc
Thanks.
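To make the proposal above concrete, here is a minimal PyTorch sketch of such a loss. It is not code from this repo: `speaker_consistency_loss` and `ToySpkEnc` are hypothetical names, and the sketch assumes that `spk_enc(mel, pad_mask)` returns one embedding per utterance with shape `[B, D]`, that `y` and `y_out` are `[B, n_mels, T]`, and that the masks are `[B, 1, T]` with 1 on valid frames. Computing the reference embedding without gradients is also an assumption of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToySpkEnc(nn.Module):
    """Hypothetical stand-in for spk_enc: linear projection + masked mean pooling."""

    def __init__(self, n_mels=80, dim=16):
        super().__init__()
        self.proj = nn.Linear(n_mels, dim)

    def forward(self, x, pad_mask):
        # x: [B, T, n_mels]; pad_mask: [B, T], True where the frame is padding
        h = self.proj(x)
        valid = (~pad_mask).unsqueeze(-1).to(h.dtype)
        return (h * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1.0)


def speaker_consistency_loss(spk_enc, y, y_mask, y_out, y_out_mask):
    """Mean cosine distance between reference and generated speaker embeddings."""
    # Reference embedding is treated as a fixed target (assumption of this sketch).
    with torch.no_grad():
        s_ref = spk_enc(y.transpose(1, 2), (y_mask == 0).squeeze(1))

    # Freeze spk_enc so this loss only sends gradients back to the generator.
    flags = [p.requires_grad for p in spk_enc.parameters()]
    for p in spk_enc.parameters():
        p.requires_grad_(False)
    try:
        s_out = spk_enc(y_out.transpose(1, 2), (y_out_mask == 0).squeeze(1))
    finally:
        # Unfreeze: restore the original requires_grad flags.
        for p, flag in zip(spk_enc.parameters(), flags):
            p.requires_grad_(flag)

    # Cosine distance = 1 - cosine similarity, averaged over the batch.
    return (1.0 - F.cosine_similarity(s_ref, s_out, dim=-1)).mean()
```

In a training step, the returned value would be added to the generator loss with some weight; that weight is a hyperparameter this sketch does not prescribe.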
This repo is built on pure VITS and StyleSpeech in order to verify the SC-CNN technique. Do you mean the SCL (speaker consistency loss) in YourTTS?
Is there any StyleSpeech loss in SC-CNN?
Loss terms related to meta-learning are excluded in this repo.