StarGANv2-VC
How to disentangle style and speaker information?
I would like to transfer the speech style of one speaker to another speaker while preserving the target speaker's identity.
Do you have any advice on how to use it for emotional cross-speaker style transfer? I thought about adding an additional discriminator to classify speaker ID, but how should the domains be defined in that case?
Thanks
You can define the domains in terms of emotions instead of speakers. This way you can preserve the speakers but only convert emotions.
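For anyone trying this: switching the domains from speakers to emotions mostly comes down to relabeling the training list so the domain column holds an emotion index (and setting the number of domains to the number of emotions in the config). Here is a minimal sketch, assuming the repo's `path|domain` line format and ESD-style paths where the emotion appears as a folder name; the exact paths and label set are placeholders:

```python
# Hypothetical sketch: rewrite a StarGANv2-VC train list so the domain
# column is an emotion ID instead of a speaker ID. Assumes each line is
# "path|domain" and the emotion is encoded in the file path, e.g.
# ESD-style folders like .../0011/Angry/0011_000351.wav (assumption).

EMOTIONS = ["Angry", "Happy", "Neutral", "Sad", "Surprise"]  # assumed label set

def emotion_domain(path: str) -> int:
    """Map a wav path to its emotion domain index via its folder name."""
    for idx, emo in enumerate(EMOTIONS):
        if f"/{emo}/" in path:
            return idx
    raise ValueError(f"no emotion folder found in {path}")

def relabel(lines):
    """Rewrite 'path|speaker_id' lines as 'path|emotion_id'."""
    out = []
    for line in lines:
        path, _speaker = line.strip().split("|")
        out.append(f"{path}|{emotion_domain(path)}")
    return out

sample = [
    "Data/ESD/0011/Angry/0011_000351.wav|0",
    "Data/ESD/0012/Happy/0012_000702.wav|1",
]
relabeled = relabel(sample)
print(relabeled)
```

If you go this route, remember to also set `num_domains` in the config to `len(EMOTIONS)`, since the mapping network and discriminator branch per domain.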
Thanks. Defining the domains as emotions instead of speakers worked, but it sometimes corrupted speaker identity for specific emotion domains. I found an interesting paper on EVC based on StarGANv2-VC by Sony Research India: https://arxiv.org/pdf/2302.10536.pdf. They add a second encoder and a classifier for the speaker domain to get better disentanglement.
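One common way to implement that kind of speaker disentanglement (a sketch of the general idea, not necessarily the paper's exact architecture) is an adversarial speaker classifier on the style code with a gradient-reversal layer: the classifier tries to predict the speaker, while the reversed gradient pushes the style encoder to drop speaker information. The dimensions and layer sizes below are placeholders:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None

class SpeakerAdversary(nn.Module):
    """Predicts the speaker from the style code through gradient reversal,
    so minimizing its loss removes speaker cues from the style encoder."""
    def __init__(self, style_dim: int, n_speakers: int, lamb: float = 1.0):
        super().__init__()
        self.lamb = lamb
        self.net = nn.Sequential(
            nn.Linear(style_dim, 128), nn.ReLU(),
            nn.Linear(128, n_speakers),
        )

    def forward(self, style_code):
        return self.net(GradReverse.apply(style_code, self.lamb))

# Usage sketch (style_dim=64, n_speakers=10, batch of 8 are placeholders):
adv = SpeakerAdversary(style_dim=64, n_speakers=10)
style = torch.randn(8, 64, requires_grad=True)
logits = adv(style)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (8,)))
loss.backward()
print(logits.shape)  # torch.Size([8, 10])
```

In training, this loss would be added to the style encoder's objective alongside the existing StarGANv2-VC losses; the paper's second-encoder approach differs in detail, so treat this purely as a starting point.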
Maybe that's because the same speaker utters too many sentences with the same emotion?
@yl4579 Hi, thanks for this project. Should each emotion domain contain utterances from a single speaker or from many speakers?
> You can define the domains in terms of emotions instead of speakers. This way you can preserve the speakers but only convert emotions.
@CONGLUONG12 The domains should contain multiple speakers. You can refer to https://arxiv.org/pdf/2302.10536.pdf for more details. This is a good example of how to modify StarGANv2-VC for emotion conversion.
@yl4579 Thank you very much. In your demo, you chose a speaker with a specific emotion. If I instead pick another speaker from the training set (call them speaker A) as the target, will the output carry that emotion with speaker A's timbre?
@CONGLUONG12 Probably yes, if speaker A has samples with similar emotions in the training set; otherwise it might not work.
Hey there! I made something similar for my MSc degree in AI, starting from @yl4579's great implementation. Take a look Here for some hints.