Kaizhi Qian

196 comments by Kaizhi Qian

Either way works. We did not exceed 3 seconds, so that it fits into memory.
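
A minimal preprocessing sketch of that constraint, assuming mel-spectrogram features at 16 kHz with a 256-sample hop (both assumptions, not from the repo); `crop_segment` is a hypothetical helper:

```python
import numpy as np

# Hypothetical helper: cap each training segment at ~3 seconds so batches fit
# into GPU memory. The frame count assumes 16 kHz audio with a 256-sample hop
# (~62.5 frames/s); adjust to your feature extractor.
MAX_FRAMES = int(3.0 * 16000 / 256)

def crop_segment(mel: np.ndarray) -> np.ndarray:
    """Randomly crop a (time, n_mels) spectrogram to at most MAX_FRAMES frames."""
    if mel.shape[0] <= MAX_FRAMES:
        return mel
    start = np.random.randint(0, mel.shape[0] - MAX_FRAMES + 1)
    return mel[start:start + MAX_FRAMES]
```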

You can use one-hot embeddings if you are not doing zero-shot conversion. I implemented my own speaker encoder, which has not been released. The Resemblyzer is just a similar implementation...
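A rough sketch of the two options: one-hot codes only cover speakers seen in training, while an utterance-level speaker encoder (Resemblyzer is one such implementation) is what allows zero-shot conversion. The speaker count, embedding size, and file name below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# Assume a fixed training inventory of N_SPEAKERS (illustrative numbers).
N_SPEAKERS, EMB_DIM = 100, 256

# Option 1: one-hot speaker codes -- only valid for speakers seen in training,
# so this rules out zero-shot conversion.
def one_hot_speaker(speaker_id: int) -> torch.Tensor:
    return F.one_hot(torch.tensor(speaker_id), num_classes=N_SPEAKERS).float()

# Option 2: a speaker encoder maps any utterance to a fixed-size embedding,
# which is what enables zero-shot conversion. With Resemblyzer, for example:
# from resemblyzer import VoiceEncoder, preprocess_wav
# encoder = VoiceEncoder()
# spk_emb = torch.from_numpy(encoder.embed_utterance(preprocess_wav("target.wav")))
```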

By conditioning on the speaker embedding, the model changes the rhythm and timbre at the same time.

```python
with torch.no_grad():
    # keep the first 14 cepstral coefficients and condition on speaker B's embedding
    spect_output, len_spect = P.infer_onmt(
        cep_real_A.transpose(2, 1)[:, :14, :], real_mask_A, len_real_A, spk_emb_B)
```

So that the encoder does not need to learn that information from the spectrogram.

Without the speaker embedding, it is harder for the content encoder to learn that information from the spectrogram. Since you already have that info, just give it to the...
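
A minimal sketch of what that amounts to, concatenating the speaker embedding to the encoder input so the encoder does not have to infer speaker identity from the spectrogram; the module name and all dimensions are illustrative, not the repo's:

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    def __init__(self, n_mels=80, spk_dim=256, hidden=512):
        super().__init__()
        self.rnn = nn.LSTM(n_mels + spk_dim, hidden, batch_first=True)

    def forward(self, mel, spk_emb):
        # mel: (batch, time, n_mels), spk_emb: (batch, spk_dim)
        # broadcast the speaker embedding along time and feed it in directly
        spk = spk_emb.unsqueeze(1).expand(-1, mel.size(1), -1)
        out, _ = self.rnn(torch.cat([mel, spk], dim=-1))
        return out
```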