Yi-Chiao WU

23 comments by Yi-Chiao WU

Hi @zhanghuiyu123, I would recommend you mention that you "reimplement AudioDec based on the open-source repo" in your paper to avoid any concerns from the reviewers although I think the...

Hi, it is normal for the vq_loss to increase during training, since the encoder usually outputs white-noise-like latents in the beginning. When the encoder starts to learn something meaningful will...

> because [2,3,4,5] means the downsampling ratio=120, 9600/120=80 > 64(codebook_dim)

Hi, the downsampling is along the temporal axis, so the code rate should be 48000 Hz (48 kHz) / 120 = 400 Hz, which is different...
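The arithmetic in this reply can be checked with a short sketch. The stride list [2,3,4,5], the 48 kHz sampling rate, and the 9600-sample length come from the thread; the function name `code_rate_hz` is hypothetical, and reading 9600 as a segment length in samples is an assumption.

```python
from math import prod


def code_rate_hz(sample_rate_hz, strides):
    """Return (temporal downsampling ratio, code frames per second)."""
    ratio = prod(strides)  # 2 * 3 * 4 * 5 = 120
    return ratio, sample_rate_hz / ratio


ratio, rate = code_rate_hz(48000, [2, 3, 4, 5])
print(ratio, rate)  # 120 400.0

# Assumption: 9600 is a segment length in samples. Dividing by the ratio
# gives the number of code frames along the temporal axis for that segment,
# which is a length in time, not a feature dimension such as codebook_dim.
frames_per_segment = 9600 // ratio
print(frames_per_segment)  # 80
```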

Yes, the batch_length is mainly related to GPU usage, and the only requirement is that it is divisible by the downsampling rate. I actually found that the longer...
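A minimal sketch of the divisibility requirement mentioned above. The helper `check_batch_length` is hypothetical and not part of the AudioDec repo; the strides [2,3,4,5] are taken from a neighbouring comment and are only an example value here.

```python
from math import prod


def check_batch_length(batch_length, strides):
    """Return the number of code frames, or raise if batch_length is invalid.

    Hypothetical helper: batch_length (in samples) must be a multiple of the
    total downsampling rate, i.e. the product of the strides.
    """
    rate = prod(strides)
    if batch_length % rate != 0:
        raise ValueError(
            f"batch_length={batch_length} is not divisible by "
            f"the downsampling rate {rate}"
        )
    return batch_length // rate


print(check_batch_length(9600, [2, 3, 4, 5]))  # 80
```

With these example strides, any multiple of 120 samples passes the check, so a longer batch_length can be chosen freely as long as it fits in GPU memory.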

Hi, Thanks for the interesting experiments! I think it is reasonable to add more nonlinearity to the model to enhance its modeling ability as long as the training remains stable. If...

Hi, The evaluation code is mostly from the [sprocket-vc](https://github.com/k2kobayashi/sprocket/tree/master) repo. We used their [feature extractor](https://github.com/k2kobayashi/sprocket/blob/master/sprocket/speech/feature_extractor.py) to extract f0, u/v segments, and mcep. We used their [melcd function](https://github.com/k2kobayashi/sprocket/blob/master/sprocket/util/distance.py) to calculate MCD....
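For readers unfamiliar with MCD, the following is a sketch of the commonly used mel-cepstral distortion definition in plain Python. It is not copied from sprocket and may differ from its melcd function in details. The function name `mel_cepstral_distortion` is hypothetical, and the inputs are assumed to be already time-aligned mcep frames with the power (0th) coefficient removed.

```python
import math


def mel_cepstral_distortion(ref_frames, gen_frames):
    """Mean MCD in dB over paired frames (commonly used definition).

    Assumptions: both inputs are equal-length lists of equal-dimension
    mcep vectors that are already aligned frame by frame.
    """
    if len(ref_frames) != len(gen_frames) or not ref_frames:
        raise ValueError("inputs must be non-empty and have the same length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, gen in zip(ref_frames, gen_frames):
        # Squared Euclidean distance between the two cepstral vectors.
        sq_dist = sum((r - g) ** 2 for r, g in zip(ref, gen))
        total += scale * math.sqrt(2.0 * sq_dist)
    return total / len(ref_frames)


# Identical frames give zero distortion.
print(mel_cepstral_distortion([[0.1, 0.2]], [[0.1, 0.2]]))  # 0.0
```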

Hi @BridgetteSong,
- Thanks for the great investigation effort! I will check the results on the 48kHz VCTK corpus.
- Do you have any plan to write a paper about...

Hi, according to the figures, the correct f0 search range of P232 should be around 70-240 Hz, and that of P257 should be 140-340 Hz. (The underlying theory is that...

Hi, Thanks for your investigation! According to our internal experiments, we reached some conclusions. 1. Adding more activation functions, as in HiFiGAN, will slightly increase the robustness to unseen data. However, it...

Hi, I just checked my code and sorry that I forgot to mention some details. 1. I extract “mcep” and “f0” features before doing any VAD and DTW. The settings...