Yi-Chiao WU comments

23 comments by Yi-Chiao WU

VCTK dataset

Hi @zhanghuiyu123, I would recommend mentioning in your paper that you "reimplement AudioDec based on the open-source repo" to avoid any concerns from the reviewers, although I think the...

vq_loss increase, not converge

Hi, it is normal for the vq_loss to increase during training, since the encoder usually outputs white-noise-like latents at the beginning. When the encoder starts to learn something meaningful will...
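For context on what vq_loss measures, here is a minimal sketch assuming the standard VQ-VAE formulation (a codebook term plus a β-weighted commitment term). This is not AudioDec's code; its actual loss, weighting, and residual-VQ structure may differ, and the names `vq_loss` and `beta` are illustrative. The point is that the value depends on how far the encoder outputs sit from their nearest codewords, so it follows the latent distribution as the encoder changes rather than decreasing monotonically like a reconstruction loss.

```python
import numpy as np

def vq_loss(z, codebook, beta=0.25):
    """Sketch of a VQ-VAE-style loss (assumed formulation, not AudioDec's code).

    z:        (T, D) encoder outputs, one D-dim latent per frame
    codebook: (K, D) codewords
    beta:     commitment weight (0.25 follows the original VQ-VAE paper; assumed)
    """
    # Squared Euclidean distance from every frame to every codeword: shape (T, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)   # index of the nearest codeword for each frame
    q = codebook[idx]        # quantized latents, shape (T, D)
    mse = np.mean((z - q) ** 2)
    # With stop-gradients, the codebook and commitment terms share this forward
    # value; they differ only in which tensor receives the gradient.
    codebook_term = mse
    commitment_term = beta * mse
    return codebook_term + commitment_term, idx
```

For example, a latent `[2, 2]` with codebook `[[0, 0], [1, 1]]` maps to `[1, 1]` and gives a loss of 1.0 + 0.25 = 1.25.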

vq_loss increase, not converge

> because [2,3,4,5] means the downsampling ratio=120, 9600/120=80 > 64(codebook_dim)

Hi, the downsampling is along the temporal axis, so the codes should be at 48000 (48 kHz)/120 = 400 Hz, which is different...
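The arithmetic in this exchange can be checked with a short sketch. It assumes, as the quoted question does, that the list `[2,3,4,5]` holds per-layer temporal strides whose product is the total downsampling ratio; the function names are hypothetical.

```python
from math import prod

def code_rate(sample_rate, strides):
    """Code frames per second after temporal downsampling by prod(strides)."""
    hop = prod(strides)  # total downsampling ratio, e.g. 2*3*4*5 = 120
    return sample_rate / hop

def num_frames(num_samples, strides):
    """Number of code frames produced from a segment of num_samples samples."""
    hop = prod(strides)
    if num_samples % hop:
        raise ValueError(f"{num_samples} samples is not divisible by hop {hop}")
    return num_samples // hop
```

With these, `code_rate(48000, [2, 3, 4, 5])` gives 400.0 and `num_frames(9600, [2, 3, 4, 5])` gives 80. In a typical VQ codec, 80 counts frames along time, while codebook_dim is the width of each code vector along the channel axis, so the two numbers live on different axes and are not compared.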

vq_loss increase, not converge

Yes, the batch_length is more related to GPU usage, and the only requirement is that it be divisible by the downsampling rate. I actually found that the longer...
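The divisibility requirement can be enforced with a small helper like the one below. This is a hypothetical sketch: the constraint itself comes from the comment, but the choice to round down (rather than reject or round up) is an assumption, and `strides` stands in for however the downsampling factors are specified in the config.

```python
from math import prod

def valid_batch_length(batch_length, strides):
    """Round batch_length down to the nearest multiple of the downsampling ratio.

    Hypothetical helper: divisibility is the stated requirement; the
    round-down policy is an assumption.
    """
    hop = prod(strides)  # total downsampling ratio
    adjusted = (batch_length // hop) * hop
    if adjusted == 0:
        raise ValueError(f"batch_length {batch_length} is shorter than hop {hop}")
    return adjusted
```

For a ratio of 120, `valid_batch_length(9600, [2, 3, 4, 5])` returns 9600 unchanged, while `valid_batch_length(10000, [2, 3, 4, 5])` returns 9960.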

Is it missing some activation functions between some layers?

Hi, thanks for the interesting experiments! I think it is reasonable to add more nonlinearity to the model to enhance its modeling ability, as long as training remains stable. If...

The test results are different from those in the paper

Hi, the evaluation code is mostly from the [sprocket-vc](https://github.com/k2kobayashi/sprocket/tree/master) repo. We used their [feature extractor](https://github.com/k2kobayashi/sprocket/blob/master/sprocket/speech/feature_extractor.py) to extract f0, u/v segments, and mcep, and their [melcd function](https://github.com/k2kobayashi/sprocket/blob/master/sprocket/util/distance.py) to calculate MCD....
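The comment points to sprocket's `melcd` as the function actually used. As a standalone illustration only (not copied from sprocket), here is a sketch of the commonly used MCD definition, (10 / ln 10) · sqrt(2 Σ_d (c_d − ĉ_d)²) averaged over frames. It assumes the two mel-cepstral sequences are already time-aligned (for example by DTW); whether the 0th (energy) coefficient was excluded in the paper's evaluation is not stated here, so that choice is left to the caller.

```python
import numpy as np

def melcd(mcep1, mcep2):
    """Frame-averaged mel-cepstral distortion in dB (common definition; a sketch).

    mcep1, mcep2: (T, D) time-aligned mel-cepstra of equal shape.
    """
    if mcep1.shape != mcep2.shape:
        raise ValueError("mel-cepstra must be time-aligned with equal shapes")
    diff = mcep1 - mcep2
    # Per-frame distortion, then the mean over frames, scaled to dB
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return 10.0 / np.log(10.0) * np.mean(per_frame)
```

Identical inputs give 0 dB; a constant difference of 1 in a single coefficient on every frame gives (10 / ln 10) · √2 ≈ 6.14 dB.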