Brendan O'Connor
@liveroomand @nkcdy Did either of you finally figure out what the secret sauce is for training a version that converges to 0.0001 and yields audio of a similar quality to...
> Recently, I was trying to improve the original AutoVC by using F0 information. Using 256-dimensional one-hot vectors in the original AutoVC seems to perform well. But in the process of...
@Jungwon-Chang Please correct me if I'm wrong (as I'm dying to know why my model won't produce good quality speech), but I think the paper describes that for many-to-many conversion,...
It is the utterances from each speaker that are split 9:1. Let's say there are 800 utterances per speaker; then the model would be trained on 720 utterances...
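To make the per-speaker split concrete, here is a minimal sketch in plain Python. The function name `split_per_speaker`, the speaker label `speaker_a`, and the file names are hypothetical and only serve the illustration; the paper's actual data pipeline may differ.

```python
import random

def split_per_speaker(utterances_by_speaker, train_parts=9, total_parts=10, seed=0):
    """Split each speaker's utterances into train/test at train_parts:total_parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = {}, {}
    for speaker, utterances in utterances_by_speaker.items():
        shuffled = list(utterances)
        rng.shuffle(shuffled)
        # Integer arithmetic avoids floating-point rounding at the cut point.
        cut = len(shuffled) * train_parts // total_parts
        train[speaker] = shuffled[:cut]
        test[speaker] = shuffled[cut:]
    return train, test

# Hypothetical speaker with 800 utterances, as in the example above.
data = {"speaker_a": [f"speaker_a_{i:03d}.wav" for i in range(800)]}
train, test = split_per_speaker(data)
print(len(train["speaker_a"]), len(test["speaker_a"]))  # 720 80
```

Every speaker appears in both sets; only the utterances are held out, which is what distinguishes this from a held-out-speaker (zero-shot) split.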
The code is a proof-of-concept of the zero-shot method. You would have to write the many-to-many yourself using one-hot encodings instead of speaker embeddings. On Thu, Jan 7, 2021 at...
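As a rough sketch of what swapping speaker embeddings for one-hot encodings could look like, the snippet below builds one vector per known speaker. The helper names `build_speaker_index` and `one_hot` are made up for this example and are not part of the AutoVC repository; how the vector is fed to the model is left out.

```python
def build_speaker_index(speaker_ids):
    """Map each distinct speaker ID to a stable integer position."""
    return {speaker: i for i, speaker in enumerate(sorted(set(speaker_ids)))}

def one_hot(speaker, index):
    """Return a one-hot list with a 1.0 at the speaker's position."""
    vector = [0.0] * len(index)
    vector[index[speaker]] = 1.0
    return vector

# Hypothetical speaker IDs, for illustration only.
index = build_speaker_index(["spk_b", "spk_a", "spk_c", "spk_a"])
print(one_hot("spk_b", index))  # [0.0, 1.0, 0.0]
```

Note that a one-hot code only covers speakers seen during training, which is why this setup suits many-to-many conversion rather than zero-shot conversion.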
@billy800413 did you figure this out in the end? I vaguely recall that when I trained for 100k iterations on the original test data as described in the paper, it does actually...
Hi @CODEJIN. I have read the AutoVC and Tacotron papers. However, neither seems to provide much information about why a postnet is used in the first place. Where can I...
Do you know where I could learn more about postnet implementation? It's a tricky thing to just google. Thanks for replying so quickly! On Sun, Dec 13, 2020 at 3:16...
I was able to produce audio that consisted of 'ghostly' voices after 100k iterations. There was, however, a lot of noise. Have either of you @WeiLi233 @xuexidi been able to...
Check that the tensors are the same shape before computing their loss? On Wed, Jan 27, 2021 at 11:15 AM JohnHerry wrote: > I am using AISHELL-3 Mandarin corpus to...
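A small sketch of why the shape check matters, using NumPy as a stand-in for the training framework (an assumption; the shapes and the helper `checked_mse` are hypothetical). When two arrays differ only by a singleton dimension, the subtraction broadcasts silently and the loss is averaged over the wrong set of pairs instead of raising an error.

```python
import numpy as np

def checked_mse(pred, target):
    """Mean squared error that refuses to broadcast mismatched shapes."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    return float(np.mean((pred - target) ** 2))

# Hypothetical shapes: the prediction carries an extra singleton channel axis.
pred = np.ones((2, 1, 4, 3))
target = np.zeros((2, 4, 3))

# Without a check, broadcasting compares every item with every other item.
print((pred - target).shape)  # (2, 2, 4, 3)

# Dropping the singleton axis makes the shapes match, so the check passes.
print(checked_mse(pred.squeeze(1), target))  # 1.0
```

Calling `checked_mse(pred, target)` on the unsqueezed arrays raises a `ValueError` rather than returning a misleading number.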