SpeechSplit
How to check speaker disentanglement during training?
What I have done: during testing, I purposely fed a zero-valued speaker embedding vector and inspected both the image representation and a loss measure (MSE; I assume higher is better).
As a result, I can clearly observe a significant MSE (around 33) after a few days of training. However, when performing actual voice conversion (from one speaker to another), the model only achieves reconstruction without converting the voice.
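For reference, the zero-embedding probe described above can be sketched numerically. This is only a toy illustration with a hypothetical linear "generator" standing in for the trained decoder; SpeechSplit's actual generator signature and feature shapes differ:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two mel-spectrogram-like arrays."""
    return float(np.mean((a - b) ** 2))

# Hypothetical stand-in for the trained generator: maps
# (content features, speaker embedding) -> mel-spectrogram.
rng = np.random.default_rng(0)
W_c = rng.normal(size=(80, 80))   # content projection (hypothetical)
W_s = rng.normal(size=(80, 16))   # speaker projection (hypothetical)

def generator(content, spk_emb):
    return content @ W_c.T + spk_emb @ W_s.T

content = rng.normal(size=(100, 80))   # 100 frames of content features
spk_emb = rng.normal(size=(16,))       # a speaker embedding (d-vector)

out_true = generator(content, spk_emb)
out_zero = generator(content, np.zeros(16))  # zeroed speaker embedding

# If the decoder actually uses the speaker embedding, zeroing it should
# change the output noticeably; a near-zero gap would suggest the speaker
# branch is ignored and timbre leaks through the content path instead.
gap = mse(out_true, out_zero)
print(gap > 0.0)
```

A large gap only shows the embedding influences the output; it does not by itself prove disentanglement, which may explain why a high MSE can coexist with failed conversion.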
If possible, I would really appreciate knowing whether there are other ways to test voice conversion during training.
Many thanks.
Sorry, I could not understand your question. For example, what is the image representation?