Kaizhi Qian

Results 196 comments of Kaizhi Qian

There is no need to split the audio. The post-processing length is the same within the batch. Just index from the spectrogram. For example, [0, 0.5, 1, 1.5] and [0, 2,...

@Miralan Yes. If you have N speakers, just use an N-dimensional one-hot embedding.
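
A minimal sketch of what an N-dimensional one-hot speaker embedding could look like. The function name `one_hot_speaker_embedding` and the speaker indexing are assumptions made for illustration, not code from the repository.

```python
from typing import List


def one_hot_speaker_embedding(speaker_index: int, num_speakers: int) -> List[float]:
    """Return a vector of length num_speakers with a single 1.0 at speaker_index.

    Hypothetical helper: speakers are assumed to be numbered 0..num_speakers-1.
    """
    if not 0 <= speaker_index < num_speakers:
        raise ValueError("speaker_index must be in [0, num_speakers)")
    embedding = [0.0] * num_speakers
    embedding[speaker_index] = 1.0
    return embedding


# Example: the third of four speakers (index 2).
example_embedding = one_hot_speaker_embedding(2, 4)
```

Each speaker gets a distinct axis, so the embedding size grows linearly with the number of speakers.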

The wavenet vocoder has its own hparams. Please refer to the vocoder part in Autovc for details.

Your frequency axis and time axis are swapped.
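
One way to check for and correct swapped axes, sketched with NumPy. This assumes the expected layout is (time, frequency) with 80 mel bins; both the layout and the bin count are assumptions here, so adjust them to your own setup. The helper name `to_time_major` is hypothetical.

```python
import numpy as np

N_MELS = 80  # assumed number of frequency bins; change to match your features


def to_time_major(spec: np.ndarray, n_mels: int = N_MELS) -> np.ndarray:
    """Return the spectrogram with shape (time, n_mels), transposing if needed."""
    spec = np.asarray(spec)
    if spec.ndim != 2:
        raise ValueError("expected a 2-D spectrogram")
    if spec.shape[1] == n_mels:
        # Already (time, frequency). Also taken when both axes equal n_mels,
        # in which case the layout cannot be inferred from the shape alone.
        return spec
    if spec.shape[0] == n_mels:
        # Stored as (frequency, time): swap the two axes.
        return spec.T
    raise ValueError(f"neither axis has length {n_mels}")


# Example: a (frequency, time) array becomes (time, frequency).
swapped = np.zeros((N_MELS, 123), dtype=np.float32)
fixed = to_time_major(swapped)
```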

They should only differ by the amount of silence before and after. Please confirm if this is true.

For Chinese audio, you need to retrain the model and retune the hyperparameters.

The metadata is all different depending on the use case. It is nothing but some sort of nested list. You can easily make your own by looking into one of...