acoustic-model MultiSpeaker setup

Have you tried this in a multi-speaker setup?

Aug 31 '22 12:08 rishikksh20

@bshall Could we also replace the Encoder and Decoder with Transformers?

Oct 26 '22 08:10 rishikksh20

Hi @rishikksh20, sorry about the delay on this. I only noticed this issue now.

I have tried a multi-speaker setup (about 10 speakers) using one-hot codes for each speaker. It works pretty well, but I think there is a small degradation compared to the single-speaker model. In my experience, fine-tuning the acoustic model on a small amount of target data seems to work better. I haven't experimented with using speaker embeddings for a zero-shot model though, so I can't comment on how well it performs in that setting.
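To illustrate the one-hot idea, here is a minimal, framework-free sketch. It is not code from the repository: the function names `one_hot` and `add_speaker_code` are hypothetical, and appending the code to every frame of the soft units is only one possible way to condition on the speaker. A real training setup would do the same thing on tensors.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError(f"speaker index {index} out of range for {size} speakers")
    return [1.0 if i == index else 0.0 for i in range(size)]


def add_speaker_code(units, speaker_index, num_speakers):
    """Append the same one-hot speaker code to every frame of `units`.

    `units` is a list of frames, each frame a list of floats (the soft units).
    Appending per frame is an assumption made for this sketch.
    """
    code = one_hot(speaker_index, num_speakers)
    return [list(frame) + code for frame in units]


# Two frames of 4-dimensional toy "units", speaker 2 out of 10.
units = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
conditioned = add_speaker_code(units, speaker_index=2, num_speakers=10)
print(len(conditioned), len(conditioned[0]))  # 2 14
```

Each frame grows by the number of speakers, so the first layer of the model has to accept the wider input.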

I'd imagine that using Transformers would be fine. I don't think such heavy machinery is required though. I have done some experiments training HiFi-GAN directly on the soft units (augmented with the pitch contours) and this seems to work well. It also simplifies the pipeline, since it makes the acoustic model unnecessary.
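The thread does not say how the pitch contours were combined with the soft units, so the following is only a sketch of one possibility: append a log-F0 value and a voiced/unvoiced flag to each frame. The function name `add_pitch`, the use of log-F0, and the convention that 0 Hz marks an unvoiced frame are all assumptions made for this example.

```python
import math


def add_pitch(units, f0_hz):
    """Append [log-F0, voiced flag] to every frame of `units`.

    `units` is a list of frames (lists of floats) and `f0_hz` holds one pitch
    value in Hz per frame. A value of 0 is treated as unvoiced (an assumption
    of this sketch).
    """
    if len(units) != len(f0_hz):
        raise ValueError("units and f0_hz must have the same number of frames")
    augmented = []
    for frame, f0 in zip(units, f0_hz):
        voiced = f0 > 0.0
        log_f0 = math.log(f0) if voiced else 0.0
        augmented.append(list(frame) + [log_f0, 1.0 if voiced else 0.0])
    return augmented


# Two toy frames: the first voiced at 220 Hz, the second unvoiced.
augmented = add_pitch([[0.1, 0.2], [0.3, 0.4]], [220.0, 0.0])
print(len(augmented[0]))  # 4
```

Concatenation keeps the units untouched and only widens the vocoder input by two channels. Adding a learned pitch embedding to the units would be another option, but that is not what this sketch shows.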

Oct 27 '22 09:10 bshall

@bshall Could you please tell us more about the HuBERT-to-HiFi-GAN experiments? Which HiFi-GAN parameters should be changed? Did you use 256 dimensions, as in HuBERT, or did you retrain HuBERT with 128 dimensions? How did you augment the soft units with pitch contours: somewhere in the DataLoader, or in the Generator or Discriminator, with the pitch passed through nn.Embedding? Did you concatenate the pitch contours with the soft units or add them?

Nov 08 '22 12:11 juliakorovsky

@rishikksh20 Hi, did you try soft units in a multi-speaker setup for any-to-many voice conversion? If so, did you succeed? I'm currently trying a multi-speaker setup using just one-hot codes, but I'm suffering from speaker identity degradation, even though the resulting speech is quite audible.

Feb 08 '23 05:02 seastar105

Yes, I see the same thing in my training.

Feb 08 '23 08:02 rishikksh20

@rishikksh20 @seastar105 Have you tried VITS/YourTTS as the acoustic model + vocoder in the multi-speaker setting?

Feb 08 '23 08:02 MuruganR96
