Kaizhi Qian comments

Results: 198 comments by Kaizhi Qian

Why need original speaker embeddings concatenated with original speaker spectrogram?

There must be papers describing this technique, but I don't know any of them. It should be very simple and intuitive. For example, you need to solve A and B,...

Why need original speaker embeddings concatenated with original speaker spectrogram?

My explanation is for the encoder, and I perfectly understood your question. Without feeding emb_org, the encoder can learn to disentangle the content and identity, but it will be easier...
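
To illustrate what "concatenated" means here, the following is a minimal sketch that appends the same speaker embedding to every spectrogram frame before the result goes to the encoder. Plain Python lists stand in for tensors, and the function name `concat_speaker_embedding` and the toy sizes are assumptions for illustration, not code from the repository.

```python
def concat_speaker_embedding(spect, emb):
    """Append one speaker embedding to every spectrogram frame.

    spect: list of frames, each a list of mel-bin values (length x dim).
    emb:   list of floats, the speaker embedding (hypothetical size).
    Returns a new list of frames with shape length x (dim + len(emb)).
    """
    return [list(frame) + list(emb) for frame in spect]


# Toy example: 2 frames with 3 mel bins each, and a 2-dimensional embedding.
spect = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
emb_org = [1.0, -1.0]

encoder_input = concat_speaker_embedding(spect, emb_org)
for frame in encoder_input:
    print(frame)
```

Every frame carries an identical copy of the embedding, so the encoder sees the speaker identity at each time step alongside the acoustic features.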

How to check speaker disentanglement during training?

Sorry, I could not understand your question. For example, what do you mean by image representation?

An error on attempt to train model

Please check if your input shape is compatible with the neural network's required input shape.
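
One way to run such a check is to compare the shape of the input against the shape the network expects, axis by axis. The sketch below is generic: the helper `shape_mismatches` and the example shapes (a batch of spectrograms with 80 feature bins) are assumptions for illustration and are not taken from the issue.

```python
def shape_mismatches(actual, expected):
    """List the axes where `actual` disagrees with `expected`.

    Both arguments are tuples of ints; a None entry in `expected`
    means that axis may have any size (e.g. batch or time length).
    """
    if len(actual) != len(expected):
        return [f"rank {len(actual)} != expected rank {len(expected)}"]
    return [
        f"axis {i}: got {a}, expected {e}"
        for i, (a, e) in enumerate(zip(actual, expected))
        if e is not None and a != e
    ]


# Hypothetical expectation: (batch, length, 80 feature bins).
expected = (None, None, 80)

print(shape_mismatches((2, 128, 80), expected))  # compatible input
print(shape_mismatches((2, 80, 128), expected))  # length and dim swapped
```

With a real tensor, the `actual` tuple would come from its `.shape` attribute, printed right before the call that raises the error.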

How to build the validation data?

What is the "G identity mapping loss" step? I guess one of the tensors needs to be transposed because dim and length mean different things.
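
To make the guess about transposing concrete, here is a small sketch in which one array is stored as (length, dim) and the other as (dim, length), so the second has to be transposed before an element-wise loss is meaningful. The helper names and the use of mean squared error are assumptions for illustration; the loss in the actual training code may differ.

```python
def transpose(x):
    """Swap the two axes of a 2-D nested list."""
    return [list(col) for col in zip(*x)]


def mse(a, b):
    """Mean squared error between two 2-D nested lists of equal shape."""
    assert len(a) == len(b), "row counts differ"
    assert all(len(ra) == len(rb) for ra, rb in zip(a, b)), "row lengths differ"
    squared = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(squared) / len(squared)


target = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # (length=2, dim=3)
output = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]  # (dim=3, length=2)

# Comparing `output` with `target` directly would trip the shape assertions;
# transposing first lines the axes up.
loss = mse(transpose(output), target)
print(loss)
```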

How to test on my own data?

You can follow the code in conversion.ipynb

How to test on my own data?

Each metadata entry is a list of the form [filename, speaker embedding, spectrogram].
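
As a rough illustration of that layout, the sketch below builds two such entries and unpacks them. The filenames, the embedding size, and the spectrogram sizes are made-up placeholders, and nested lists stand in for the arrays used in practice.

```python
# Each entry: [filename, speaker embedding, spectrogram].
# All values below are hypothetical toy data.
metadata = [
    ["speaker_a/utt_001", [0.1, 0.2], [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]]],
    ["speaker_b/utt_007", [0.9, 0.8], [[0.5, 0.4, 0.3]]],
]


def summarize(entries):
    """Return (filename, embedding size, (frames, bins)) for each entry."""
    summary = []
    for filename, emb, spect in entries:
        summary.append((filename, len(emb), (len(spect), len(spect[0]))))
    return summary


for row in summarize(metadata):
    print(row)
```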

Differences in Architecture Between Code and Paper

1. The speaker emb is also concatenated with the encoder output before feeding into the decoder.
2. Yes, the speaker emb is extracted from the same speaker but most likely...

KeyError when run prepare_train_data.py

No. You can check by printing all the keys.
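
A minimal sketch of that check, assuming the object raising the KeyError is an ordinary dictionary. The dictionary contents and the helper name `describe_key` are hypothetical; the point is to print the available keys next to the one being looked up.

```python
def describe_key(mapping, key):
    """Report whether `key` is present, listing all keys if it is not."""
    if key in mapping:
        return f"{key!r} found"
    return f"{key!r} missing; available keys: {sorted(mapping)}"


# Hypothetical lookup table keyed by speaker ID.
speakers = {"p225": 0, "p226": 1}

print(describe_key(speakers, "p225"))
print(describe_key(speakers, "p999"))
```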

confusion with speaker encoder and loss func

Please refer to #24 for the speaker encoder. You don't need a speaker encoder if you don't do zero-shot conversion. "During training, reconstruction loss is applied to both the initial and final...