Brendan O'Connor comments




The output encoder

@liveroomand @nkcdy Did either of you finally figure out what the secret sauce is for training a version that converges to 0.0001 and yields audio of a similar quality to...

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

> Recently, I was trying to improve the original AutoVC by using F0 information. Using 256-dimensional one-hot vectors in the original AutoVC seems to perform well. But in the process of...

Issues with conversion of VCTK speakers using pre-trained model

@Jungwon-Chang Please correct me if I'm wrong (I'm dying to know why my model won't produce good-quality speech), but I think the paper states that for many-to-many conversion,...

Issues with conversion of VCTK speakers using pre-trained model

It is the utterances from each speaker that are split 9:1. Let's say there are 800 utterances per speaker; then the model would be trained on 720 utterances...
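A minimal sketch of that per-speaker 9:1 split, assuming utterances are grouped in a dict keyed by speaker ID. The function name `split_per_speaker`, the seed, and the file names are hypothetical, not taken from the AutoVC code:

```python
import random


def split_per_speaker(utterances_by_speaker, train_ratio=0.9, seed=0):
    """Split each speaker's utterances into train/test sets independently."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = {}, {}
    for speaker, utts in utterances_by_speaker.items():
        shuffled = list(utts)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_ratio)
        train[speaker] = shuffled[:n_train]  # e.g. 720 of 800
        test[speaker] = shuffled[n_train:]   # the remaining 80
    return train, test


# Hypothetical example: 800 utterances for a single speaker.
data = {"p225": [f"p225_{i:03d}.wav" for i in range(800)]}
train, test = split_per_speaker(data)
print(len(train["p225"]), len(test["p225"]))  # 720 80
```

Because the split happens inside each speaker, every speaker appears in both sets; only the individual utterances are held out.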

Issues with conversion of VCTK speakers using pre-trained model

The code is a proof of concept of the zero-shot method. You would have to write the many-to-many version yourself, using one-hot encodings instead of speaker embeddings. On Thu, Jan 7, 2021 at...
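A rough sketch of what replacing the speaker embedding with a one-hot code could look like, assuming a fixed, ordered list of training speakers. `one_hot_speaker_code` and its optional `dim` padding are hypothetical helpers, not part of the AutoVC repo; the padding only matters if you want to keep the decoder's existing input width:

```python
def one_hot_speaker_code(speaker_id, speaker_ids, dim=None):
    """Return a one-hot vector marking speaker_id's position in speaker_ids.

    dim (hypothetical) pads the vector with zeros, e.g. to match the width
    the decoder was built for; it defaults to the number of speakers.
    """
    dim = len(speaker_ids) if dim is None else dim
    if len(speaker_ids) > dim:
        raise ValueError("dim must be at least the number of speakers")
    index = speaker_ids.index(speaker_id)  # raises ValueError for unseen speakers
    return [1.0 if i == index else 0.0 for i in range(dim)]


speakers = ["p225", "p226", "p227"]
print(one_hot_speaker_code("p226", speakers))  # [0.0, 1.0, 0.0]
```

A one-hot code can only identify speakers seen during training, which is why this route gives many-to-many conversion rather than zero-shot conversion.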

reconstruction loss won't decrease

@billy800413 Did you figure this out in the end? I vaguely recall that when I trained for 100k iterations on the original test data as described in the paper, it does actually...

confusion with speaker encoder and loss func

Hi @CODEJIN. I have read the AutoVC and Tacotron papers. However, neither seems to provide much information about why a postnet is used in the first place. Where can I...
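For what it's worth, my understanding from the Tacotron 2 paper is that the postnet is a small stack of convolutions that predicts a *residual* added to the decoder's first mel-spectrogram estimate, and that the training loss penalizes both the estimate before the postnet and the refined one after it. As I read the AutoVC paper, its loss has that same shape plus a content-code term. The notation below is mine, not copied from either paper: $X$ is the target mel-spectrogram, $\tilde{X}$ the decoder output before the postnet, $\hat{X}$ the output after it, $E_c$ the content encoder, $C$ the content code of the input, and $\mu$, $\lambda$ are weights.

$$
\hat{X} = \tilde{X} + \mathrm{Postnet}(\tilde{X})
$$

$$
\mathcal{L} = \lVert \hat{X} - X \rVert_2^2 + \mu \, \lVert \tilde{X} - X \rVert_2^2 + \lambda \, \lVert E_c(\hat{X}) - C \rVert_1
$$

Keeping the pre-postnet term means the decoder alone is still pushed toward a good reconstruction, so the postnet only has to learn the fine-detail correction.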

confusion with speaker encoder and loss func

Do you know where I could learn more about postnet implementation? It's a tricky thing to just google. Thanks for replying so quickly! On Sun, Dec 13, 2020 at 3:16...

Does anyone reproduce the sound quality in the demo page?

I was able to produce audio consisting of 'ghostly' voices after 100k iterations. There was, however, a lot of noise. Has either of you (@WeiLi233, @xuexidi) been able to...

Does anyone reproduce the sound quality in the demo page?

Check that the tensors are the same shape before computing their loss? On Wed, Jan 27, 2021 at 11:15 AM JohnHerry wrote: > I am using the AISHELL-3 Mandarin corpus to...
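A minimal sketch of that check, in plain Python so it runs without any framework; `shape`, `flatten`, and `mse_loss_checked` are hypothetical helpers. The point is to fail loudly: as far as I know, PyTorch's `torch.nn.functional.mse_loss` only warns when the input and target sizes differ and then broadcasts, so a hypothetical mismatch such as a `(batch, 1, frames, 80)` output against a `(batch, frames, 80)` target can silently produce a meaningless loss.

```python
def shape(x):
    """Shape of a rectangular nested list, e.g. (frames, mel_bins)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None  # assumes every sub-list has the same length
    return tuple(dims)


def flatten(x):
    """Yield the scalar entries of a nested list in row-major order."""
    if isinstance(x, list):
        for item in x:
            yield from flatten(item)
    else:
        yield x


def mse_loss_checked(prediction, target):
    """Mean squared error that raises instead of broadcasting on a shape mismatch."""
    if shape(prediction) != shape(target):
        raise ValueError(
            f"shape mismatch: prediction {shape(prediction)} vs target {shape(target)}"
        )
    squared = [(p - t) ** 2 for p, t in zip(flatten(prediction), flatten(target))]
    return sum(squared) / len(squared)


prediction = [[0.0, 1.0], [2.0, 3.0]]
target = [[0.0, 1.0], [2.0, 5.0]]
print(mse_loss_checked(prediction, target))  # 1.0
```

In a real training loop the equivalent is a one-line assertion comparing the two tensors' shapes right before the loss call.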