Kaizhi Qian

Results 196 comments of Kaizhi Qian

The pre-trained model is for demonstration purposes only. The model should perform well after careful re-training. As far as I know, someone has made a voice conversion phone app for...

NUM_TOKENS is batch_size. The enhancement model is single-channel, so you can use any speech enhancement method as a drop-in replacement. 24570 is the length of the noisy signal based...

1. The enhancement model is pretrained independently on single-channel signals simulated under different SNR and room configurations. It can be any general enhancement model. 2. See above. 3. The...

I was using TensorFlow 1.3, but I think it will probably work with 1.4 or higher. Since the beamforming part is not implemented in TensorFlow, you could virtually use any...

It is best to post this question under fairseq. For feature extraction, please refer to the readme under Hubert. You can find step-by-step instructions there.

1 day for the prior model using 4 GPUs, and 5 days for the likelihood model.

The sampling rate is 16000 Hz. The receptive field size in this case means the length of history or future dependencies. 20477 = 16384 + 4093.
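As a quick sanity check on these numbers, here is a minimal sketch that adds the two terms and converts each sample count to a duration at 16000 Hz. The variable and function names are illustrative assumptions, not taken from the original code, and the sketch does not say what each term represents in the model.

```python
# Illustrative names only; these are not identifiers from the original code.
SAMPLE_RATE_HZ = 16000  # sampling rate stated in the comment


def samples_to_seconds(num_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a length in samples to a duration in seconds."""
    return num_samples / sample_rate_hz


# The two terms quoted in the comment (their meaning is not assumed here).
first_term = 16384
second_term = 4093
total_samples = first_term + second_term

for label, n in [("first term", first_term),
                 ("second term", second_term),
                 ("total", total_samples)]:
    print(f"{label}: {n} samples = {samples_to_seconds(n):.4f} s")
```

At 16000 Hz the total of 20477 samples corresponds to roughly 1.28 seconds of audio.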