Kaizhi Qian comments

Results: 196 comments of Kaizhi Qian

mel spectrogram normalization range

The spectrogram should be between 0 and 1. Anyway, the fast vocoder is released. See README.
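
For readers who want to check their own features against that range, here is a minimal sketch of mapping a dB-scale mel spectrogram onto [0, 1]. The function name, the -100 dB floor, and the 0 dB reference are assumptions chosen for illustration, not necessarily the values used in the repository.

```python
import numpy as np

def normalize_db_to_unit(mel_db, floor_db=-100.0, ref_db=0.0):
    """Linearly map a dB-scale mel spectrogram onto [0, 1], clipping outliers.

    floor_db and ref_db are assumed values; adjust them to your own pipeline.
    """
    mel_db = np.asarray(mel_db, dtype=np.float64)
    scaled = (mel_db - floor_db) / (ref_db - floor_db)
    # Anything below the floor becomes 0, anything above the reference becomes 1.
    return np.clip(scaled, 0.0, 1.0)

# Toy input: 2 frames x 3 mel bins, in dB.
mel_db = np.array([[-120.0, -100.0, -50.0],
                   [-25.0, 0.0, 10.0]])
mel_unit = normalize_db_to_unit(mel_db)
```

A quick sanity check on real features is to confirm that `mel_unit.min() >= 0` and `mel_unit.max() <= 1` before feeding them to the vocoder.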

Downsampling for VCTK corpus

No, but this shouldn't matter.

How to compute GPE FFE

I don't have access to that code, but it is very simple to implement by looking up the formulae in this paper: http://www.seas.ucla.edu/spapl/paper/chu_icassp_09.pdf
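
As a rough illustration, the two metrics can be sketched as follows, using their usual definitions: GPE is the fraction of frames voiced in both tracks whose relative pitch error exceeds a threshold, and FFE is the fraction of all frames with either a gross pitch error or a voicing decision error. The function name, the 20% threshold, and the convention that 0 marks an unvoiced frame are assumptions; this is not the code used for the paper.

```python
import numpy as np

def gpe_ffe(f0_ref, f0_est, tol=0.2):
    """Return (GPE, FFE) for two frame-aligned F0 tracks.

    Assumes a value of 0 marks an unvoiced frame in both tracks.
    """
    f0_ref = np.asarray(f0_ref, dtype=np.float64)
    f0_est = np.asarray(f0_est, dtype=np.float64)
    if f0_ref.shape != f0_est.shape or f0_ref.size == 0:
        raise ValueError("tracks must be non-empty and frame-aligned")

    ref_voiced = f0_ref > 0
    est_voiced = f0_est > 0
    both_voiced = ref_voiced & est_voiced

    # Relative pitch error, evaluated only where both tracks are voiced.
    rel_err = np.zeros_like(f0_ref)
    rel_err[both_voiced] = (
        np.abs(f0_est[both_voiced] - f0_ref[both_voiced]) / f0_ref[both_voiced]
    )
    n_gross = int(np.sum(both_voiced & (rel_err > tol)))

    # Voicing decision errors: the two tracks disagree on voiced/unvoiced.
    n_voicing = int(np.sum(ref_voiced != est_voiced))

    n_both = int(both_voiced.sum())
    gpe = n_gross / n_both if n_both > 0 else float("nan")
    ffe = (n_gross + n_voicing) / f0_ref.size
    return gpe, ffe

# Toy tracks: one gross error (frame 2) and two voicing errors (frames 3, 4).
ref = [0.0, 100.0, 100.0, 200.0, 0.0]
est = [0.0, 100.0, 130.0, 0.0, 150.0]
gpe, ffe = gpe_ffe(ref, est)
```

Both tracks need the same frame rate and alignment before comparison; otherwise voicing errors will dominate the result.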

How to get the same mel feature in "metadata.pkl"?

No, I didn't.

How to get the same mel feature in "metadata.pkl"?

The first dim is the number of frames. There is no dimension reduction. It should be around 90 for that utterance. Please double-check your code.

How to get the same mel feature in "metadata.pkl"?

The sampling rate should be 16 kHz instead of 48 kHz.
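
One possible way to convert 48 kHz audio to 16 kHz is polyphase resampling with SciPy, sketched below. The helper name and the synthetic test tone are hypothetical, and this is only one of several reasonable downsampling procedures.

```python
import numpy as np
from scipy.signal import resample_poly

def downsample_48k_to_16k(wav_48k):
    """Resample a 48 kHz waveform to 16 kHz (a factor of 1/3).

    resample_poly applies an anti-aliasing low-pass filter before decimating.
    """
    return resample_poly(wav_48k, up=1, down=3)

# One second of a 440 Hz tone at 48 kHz, standing in for a real recording.
sr_in = 48000
t = np.arange(sr_in) / sr_in
wav_48k = 0.5 * np.sin(2 * np.pi * 440.0 * t)
wav_16k = downsample_48k_to_16k(wav_48k)
```

After resampling, one second of audio should hold 16000 samples, so the number of spectrogram frames drops accordingly.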

How to get the same mel feature in "metadata.pkl"?

The length does not have to be 90. As long as the sampling frequency is correct, it should be fine.

How to get the same mel feature in "metadata.pkl"?

No, and the downsampling procedure should not make a big difference.

How to get the same mel feature in "metadata.pkl"?

OK. That explains it. I trimmed the silence off by hand.

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

The answer is 2. You will need µ and σ for inference. However, for unseen speakers, you can normalize using their own µ and σ, which is not a bad...
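
A minimal sketch of per-speaker normalization with µ and σ, assuming the statistics are taken over the log-F0 of voiced frames and that 0 marks an unvoiced frame. Both function names and both conventions are assumptions for illustration, not a description of the paper's exact pipeline.

```python
import numpy as np

def speaker_f0_stats(f0_tracks):
    """Mean and standard deviation of log-F0 over all voiced frames of a speaker."""
    voiced = np.concatenate(
        [np.asarray(f, dtype=np.float64)[np.asarray(f) > 0] for f in f0_tracks]
    )
    if voiced.size == 0:
        raise ValueError("no voiced frames found")
    log_f0 = np.log(voiced)
    return log_f0.mean(), log_f0.std()

def normalize_f0(f0, mu, sigma):
    """Z-normalize log-F0 on voiced frames; also return the voiced mask."""
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    out[voiced] = (np.log(f0[voiced]) - mu) / sigma
    return out, voiced

# For an unseen speaker, the statistics come from that speaker's own utterances.
utterance = [0.0, 100.0, 200.0, 0.0, 400.0]
mu, sigma = speaker_f0_stats([utterance])
f0_norm, voiced_mask = normalize_f0(utterance, mu, sigma)
```

The voiced mask is returned separately because a normalized value of 0 on its own cannot distinguish an unvoiced frame from a voiced frame sitting exactly at the speaker mean.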