Kaizhi Qian

198 comments by Kaizhi Qian

There must be papers describing this technique, but I don't know any of them. It should be very simple and intuitive. For example, you need to solve A and B,...

My explanation is for the encoder and I perfectly understood your question. Without feeding emb_org the encoder can learn to disentangle the content and identity, but it will be easier...

Sorry, I could not understand your question. For example, what is an image representation?

Please check if your input shape is compatible with the neural network's required input shape.

What is the "G identity mapping loss" step? I guess one of the tensors needs to be transposed because dim and length mean different things.
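A minimal sketch of the kind of axis check suggested above, using plain nested lists so it runs without extra libraries. The names `shape_2d`, `transpose_2d`, and `expected_dim`, and the sizes used, are hypothetical; the actual tensors and layout in the question are not shown in the comment.

```python
def shape_2d(x):
    """Return (rows, cols) of a rectangular nested list."""
    return (len(x), len(x[0]) if x else 0)


def transpose_2d(x):
    """Swap the two axes of a rectangular nested list."""
    return [list(col) for col in zip(*x)]


# Hypothetical spectrogram stored as 5 frames (length) x 3 feature bins (dim).
spect = [[float(t * 3 + d) for d in range(3)] for t in range(5)]

# Assumption: the consumer expects the feature axis first, i.e. (dim, length).
expected_dim = 3

if shape_2d(spect)[0] != expected_dim:
    # The first axis is length, not dim, so swap the axes.
    spect = transpose_2d(spect)

print(shape_2d(spect))
```

With real tensors the same check would normally be done by comparing the tensor's shape against the layer's expected input layout before transposing.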

You can follow the code in conversion.ipynb

Each metadata entry is a list of [filename, speaker embedding, spectrogram].
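A rough illustration of that layout, assuming the metadata is stored as a pickled Python list. The filenames, embedding sizes, and values below are made up for the example, and plain lists stand in for what would presumably be arrays.

```python
import pickle

# Hypothetical entries, each [filename, speaker embedding, spectrogram].
metadata = [
    ["p001_001.wav", [0.1, 0.2, 0.3], [[0.0, 0.5], [0.25, 0.75]]],
    ["p002_001.wav", [0.4, 0.5, 0.6], [[0.1, 0.6], [0.35, 0.85]]],
]

# Round-trip through pickle in memory instead of touching a real file.
blob = pickle.dumps(metadata)
loaded = pickle.loads(blob)

for filename, speaker_emb, spect in loaded:
    # Print the filename, embedding size, and spectrogram shape.
    print(filename, len(speaker_emb), len(spect), len(spect[0]))
```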

1. The speaker emb is also concatenated with the encoder output before feeding into the decoder. 2. Yes, the speaker emb is extracted from the same speaker but most likely...
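Point 1 can be sketched as follows. This assumes the same speaker embedding is appended to every time frame of the encoder output, which is a common way to do it but is not spelled out in the comment; `concat_speaker_emb` and all sizes are hypothetical.

```python
def concat_speaker_emb(content_frames, speaker_emb):
    """Append the same speaker embedding to every content frame."""
    return [list(frame) + list(speaker_emb) for frame in content_frames]


# Hypothetical encoder output: 3 frames x 2 content dimensions.
content = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
# Hypothetical speaker embedding with 4 dimensions.
emb = [1.0, 0.0, 0.0, 1.0]

# Each decoder input frame now has 2 + 4 = 6 dimensions.
decoder_input = concat_speaker_emb(content, emb)
print(len(decoder_input), len(decoder_input[0]))
```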

No. You can check by printing all the keys.
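For instance, assuming the object in question is a dictionary (possibly nested), its keys could be listed like this. The `checkpoint` contents and the `list_keys` helper are hypothetical; the comment does not say which object or key names are involved.

```python
# Hypothetical checkpoint-like dictionary; the real key names are not given.
checkpoint = {
    "model": {"encoder.weight": [0.0], "decoder.weight": [0.0]},
    "step": 1000,
}


def list_keys(d, prefix=""):
    """Collect all keys of a nested dict, joining levels with '/'."""
    keys = []
    for k, v in d.items():
        name = f"{prefix}{k}"
        keys.append(name)
        if isinstance(v, dict):
            keys.extend(list_keys(v, prefix=name + "/"))
    return keys


for key in list_keys(checkpoint):
    print(key)
```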

Please refer to #24 for speaker encoder. You don't need speaker encoder if you don't do zero-shot conversion. "During training, reconstruction loss is applied to both the initial and final...