Kaizhi Qian
Sounds correct, but you don't need to remove the validation part.
Thanks. The code is correct. 2 seconds.
Neither affects the performance nor the training speed.
Either way works. We did not exceed 3 seconds, so that it fits into memory.
You can use one-hot embeddings if you are not doing zero-shot conversion. I implemented my own speaker encoder, which has not been released. Resemblyzer is just a similar implementation...
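A minimal sketch of the two options (the speaker count, speaker ID, and file path are placeholders, and the released models may expect a different embedding size):

```python
import torch
import torch.nn.functional as F
from resemblyzer import VoiceEncoder, preprocess_wav  # pip install resemblyzer

# Option 1: one-hot speaker embedding (seen speakers only, no zero-shot).
# N_SPEAKERS and speaker_id are illustrative values, not from the repo.
N_SPEAKERS = 4
speaker_id = 2
spk_emb_onehot = F.one_hot(torch.tensor(speaker_id), num_classes=N_SPEAKERS).float()

# Option 2: a d-vector style speaker encoder such as Resemblyzer (256-dim),
# which also works for speakers unseen during training (zero-shot).
wav = preprocess_wav("target_speaker.wav")  # hypothetical file path
spk_emb_dvec = torch.from_numpy(VoiceEncoder().embed_utterance(wav)).float()
```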
No fundamental difference
By conditioning on the speaker embedding, the model changes the rhythm and timbre at the same time.
```python
with torch.no_grad():
    spect_output, len_spect = P.infer_onmt(cep_real_A.transpose(2, 1)[:, :14, :], real_mask_A, len_real_A, spk_emb_B)
```
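Reading the argument names, `cep_real_A`, `real_mask_A`, and `len_real_A` appear to be the source utterance's cepstral features, padding mask, and length, while `spk_emb_B` is the target speaker's embedding, so the output spectrogram should follow speaker B's rhythm and timbre.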
So that the encoder does not need to learn that information from the spectrogram.
It is the content encoder. Without the speaker embedding, it is harder for the encoder to learn that information from the spectrogram. Since you already have that info, just give it to the...
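A minimal sketch of this idea, with illustrative module names and dimensions rather than the released code: the target speaker embedding is broadcast along time and concatenated with the content codes, so the decoder gets speaker identity directly and the content encoder does not have to carry it.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Toy decoder consuming content codes plus a broadcast speaker embedding."""
    def __init__(self, content_dim=8, spk_dim=256, out_dim=80):
        super().__init__()
        self.rnn = nn.LSTM(content_dim + spk_dim, 512, batch_first=True)
        self.proj = nn.Linear(512, out_dim)

    def forward(self, content_codes, spk_emb):
        # content_codes: (B, T, content_dim); spk_emb: (B, spk_dim)
        spk = spk_emb.unsqueeze(1).expand(-1, content_codes.size(1), -1)
        x = torch.cat([content_codes, spk], dim=-1)  # speaker info fed straight to the decoder
        h, _ = self.rnn(x)
        return self.proj(h)  # predicted spectrogram frames

# usage: 2 utterances, 100 frames of content codes, 256-dim speaker embeddings
mel = Decoder()(torch.randn(2, 100, 8), torch.randn(2, 256))  # -> (2, 100, 80)
```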