Kaizhi Qian comments

Results: 196 comments of Kaizhi Qian

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

There is no need to split the audio. The post-processing length is the same within the batch. Just index into the spectrogram. For example, [0, 0.5, 1, 1.5] and [0, 2,...

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

@Miralan Yes. If you have N speakers, just use N-dimensional one-hot embedding.
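A minimal sketch of the N-dimensional one-hot embedding mentioned above, in plain Python. The speaker names and the helper `one_hot` are hypothetical and only illustrate the idea; they are not taken from the repository.

```python
def one_hot(speaker_index, num_speakers):
    """Return an N-dimensional one-hot list for the given speaker index."""
    if not 0 <= speaker_index < num_speakers:
        raise ValueError("speaker_index must be in [0, num_speakers)")
    vec = [0.0] * num_speakers
    vec[speaker_index] = 1.0
    return vec


# Hypothetical speaker IDs, used only for illustration.
speakers = ["spk_a", "spk_b", "spk_c"]

# Map each speaker to its own one-hot vector of length N = len(speakers).
embeddings = {name: one_hot(i, len(speakers)) for i, name in enumerate(speakers)}
```

With N speakers, each speaker gets a vector of length N with a single 1.0 at its own index, so the embedding only works for speakers seen during training.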

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

Yes

Question: Using a pretrained encoder for getting the speaker embedding.

@terbed Yes. Unless you retrain the model.

Is voice activity detection necessary for wav preprocessing?

No.

How do I solve this error when executing the last cell?

The WaveNet vocoder has its own hparams. Please refer to the vocoder part of AutoVC for details.

differences in mel-spectrogram

Your frequency axis and time axis are swapped.
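A rough sketch of how swapped axes could be detected and fixed, in plain Python. It assumes the expected layout is (time frames, mel bins); that layout, the helper name `to_time_major`, and the toy values are assumptions for illustration, not confirmed by the thread.

```python
def to_time_major(mel, n_mels):
    """Return mel as a list of frames, each holding n_mels values.

    Assumes the target layout is (time, n_mels). If the input is laid out
    as (n_mels, time), it is transposed. The check is ambiguous when the
    number of frames equals n_mels.
    """
    rows, cols = len(mel), len(mel[0])
    if cols == n_mels:
        return [list(frame) for frame in mel]    # already (time, n_mels)
    if rows == n_mels:
        return [list(frame) for frame in zip(*mel)]  # transpose (n_mels, time)
    raise ValueError("neither axis matches n_mels")


# Toy spectrogram with 2 mel bins and 5 frames, stored as (n_mels, time).
mel = [[1, 2, 3, 4, 5],
       [6, 7, 8, 9, 10]]
fixed = to_time_major(mel, n_mels=2)
```

If the spectrogram is a NumPy array, the same transpose is simply `mel.T`.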

differences in mel-spectrogram

They should only differ by the amount of silence before and after. Please confirm if this is true.

What is the format of the metadata?

For Chinese audio, you need to retrain the model and retune the hyperparameters.

What is the format of the metadata?

The metadata differs depending on the use case. It is simply a nested list. You can easily make your own by looking into one of...