Kaizhi Qian comments

196 comments by Kaizhi Qian

Has anyone reproduced the sound quality on the demo page?

The pre-trained model is for demonstration purposes only. The model should perform well after careful re-training. As far as I know, someone has made a voice conversion phone app for...

Hyperparameters for generating mel spectrogram from training .wav files

See the last few comments in #4.

How can I solve the "repeats has to be Long tensor" problem?

Please make sure num_rep is a Long tensor (dtype torch.long).
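This error message is typically raised by `torch.repeat_interleave` when its `repeats` argument is a floating-point tensor. A minimal sketch, assuming `num_rep` is passed as `repeats` and holds per-frame repeat counts that came out as floats; the `frames` values below are hypothetical placeholders, not taken from the original code:

```python
import torch

# Hypothetical per-frame repeat counts that ended up as floats
# (e.g. produced by some arithmetic or a predictor).
num_rep = torch.tensor([1.0, 3.0, 2.0])
frames = torch.tensor([[0.1], [0.2], [0.3]])  # shape: (num_frames, feat_dim)

# repeat_interleave requires integer repeats; round, then cast to torch.long.
num_rep = num_rep.round().long()

# Repeat each frame along the time axis according to num_rep.
expanded = torch.repeat_interleave(frames, num_rep, dim=0)
print(expanded.shape)
```

Rounding before `.long()` avoids silently truncating values such as `2.9999` down to `2`.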

details about the noisy_train.mat and target_train.mat

NUM_TOKENS is batch_size. The enhancement model is single-channel, so you can use any speech enhancement method as a drop-in replacement. 24570 is the length of the noisy signal based...

details about the noisy_train.mat and target_train.mat

1. The enhancement model is pretrained independently on single-channel signals simulated under different SNRs and room configurations. It can be any general enhancement model.
2. See above.
3. The...
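Simulating single-channel training data "under different SNRs" usually means scaling a noise signal so that the clean-to-noise power ratio hits a target value in dB. The sketch below shows only that SNR-mixing step; the random placeholder signals, the 16 kHz length, and the SNR grid are assumptions for illustration, and room simulation is not covered:

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB.

    Both inputs are single-channel 1-D arrays of the same length.
    """
    if clean.shape != noise.shape:
        raise ValueError("clean and noise must have the same shape")
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Pick scale so that 10*log10(clean_power / (scale**2 * noise_power)) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)  # placeholder "speech": 1 s at 16 kHz
noise = rng.standard_normal(16000)  # placeholder noise

for snr in (0, 5, 10):  # hypothetical SNR grid
    noisy = mix_at_snr(clean, noise, snr)
    # Measure the SNR actually achieved to confirm the scaling.
    measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(snr, round(float(measured), 2))
```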

details about the noisy_train.mat and target_train.mat

I was using TensorFlow 1.3, but I think it will probably work with 1.4 or higher. Since the beamforming part is not implemented in TensorFlow, you could virtually use any...

Loading the model for inference

It is best to post this question in the fairseq repository. For feature extraction, please refer to the README under HuBERT; you can find step-by-step instructions there.

How much time does it take to train this model?

1 day for the prior model using 4 GPUs, and 5 days for the likelihood model.

How much time does it take to train this model?

The sampling rate is 16000 Hz. The receptive field size in this case means the length of the history or future dependencies, in samples: 20477 = 16384 + 4093.
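To make the receptive-field figure concrete, the sample counts can be converted to time at the stated 16000 Hz rate. This is plain arithmetic; the comment does not say which of the two terms is the history part and which is the future part, so the sketch labels them only by size:

```python
SAMPLE_RATE_HZ = 16000  # sampling rate stated in the comment


def samples_to_ms(num_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert a length in samples to milliseconds."""
    return 1000.0 * num_samples / sample_rate_hz


receptive_field = 16384 + 4093  # 20477 samples, as quoted in the comment

for name, n in [("16384-sample term", 16384),
                ("4093-sample term", 4093),
                ("total receptive field", receptive_field)]:
    print(f"{name}: {n} samples = {samples_to_ms(n):.2f} ms")
```

So the full receptive field spans roughly 1.28 s of audio.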