Yi-Chiao WU
Hi Solomid and bobbdunn, sorry for the late reply. "spk_name.f0" contains the upper and lower bounds of that speaker's f0 for the feature extraction. "spk_name.pow" contains the power threshold in...
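For illustration, here is a minimal sketch of how such speaker-dependent files could be read. It assumes "spk_name.f0" holds two whitespace-separated numbers (lower and upper f0 bound in Hz) and "spk_name.pow" holds a single dB threshold; the paths and file format are assumptions for the example, not the repo's documented layout.

```python
# Sketch only: the file format and paths below are assumptions, not the
# repo's documented convention. Check the actual egs configuration files.
from pathlib import Path


def load_f0_range(f0_file):
    """Return (minf0, maxf0) parsed from a speaker f0 file (assumed format)."""
    minf0, maxf0 = Path(f0_file).read_text().split()[:2]
    return float(minf0), float(maxf0)


def load_pow_threshold(pow_file):
    """Return the power threshold in dB from a speaker power file (assumed format)."""
    return float(Path(pow_file).read_text().split()[0])


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual directory layout.
    minf0, maxf0 = load_f0_range("conf/SF1.f0")
    pow_th = load_pow_threshold("conf/SF1.pow")
    print(f"f0 range: {minf0}-{maxf0} Hz, power threshold: {pow_th} dB")
```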
Hi Yang, thanks for your question. There are several possible reasons. First, since the provided pretrained model was trained on the very limited data of VCC2018 only, the mismatches between new...
I think you also get a figure plotting the distribution of power (npow histogram), right? The figure may have a peak higher than 0 dB corresponding to most of the speech frames...
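As a rough sketch of what such a histogram looks like (this is not the repo's extraction script; the file name and frame settings are placeholders), one can compute per-frame power in dB and plot its distribution, then pick a threshold between the silence peak and the speech peak:

```python
# Minimal sketch: per-frame power (dB) histogram for choosing a power threshold.
# Input file, frame length, and hop size are assumed values for illustration.
import numpy as np
import soundfile as sf
import matplotlib.pyplot as plt

x, sr = sf.read("sample.wav")          # hypothetical input waveform
frame_len, hop = 1024, 256             # assumed analysis settings
n_frames = 1 + (len(x) - frame_len) // hop
pow_db = np.array([
    10.0 * np.log10(np.mean(x[i * hop:i * hop + frame_len] ** 2) + 1e-10)
    for i in range(n_frames)
])

plt.hist(pow_db, bins=100)
plt.xlabel("frame power [dB]")
plt.ylabel("count")
plt.title("npow histogram (sketch)")
plt.show()
```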
Hi Unilight, thank you very much!! I will merge the modification into the current repo.
> 1) Is the acoustic feature in the generator 1d or 2d?
> What is it, a mel spectrogram extracted from natural speech? Or a text-to-mel spectrogram (from another framework)?
> I noticed there is...
For the forward function of each class, I have provided the input/output tensor information. For example, in the "AdaptiveBlock" class in "residual_block.py", I have provided the following...
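The sketch below only illustrates the style of documenting input/output tensor shapes in a forward method; the layers are placeholders and this is not the actual AdaptiveBlock code, so please refer to residual_block.py in the repo for the real implementation.

```python
# Illustrative toy block showing shape annotations in a forward docstring.
import torch
import torch.nn as nn


class ToyResidualBlock(nn.Module):
    def __init__(self, residual_channels=64, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(
            residual_channels, residual_channels, kernel_size,
            padding=(kernel_size - 1) // 2,
        )

    def forward(self, x):
        """Forward pass.

        Args:
            x (Tensor): Input tensor (B, residual_channels, T).

        Returns:
            Tensor: Output tensor (B, residual_channels, T).
        """
        return x + torch.tanh(self.conv(x))
```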
Yes, (B, residual_channels, T) denotes a 3-D tensor, where B is the mini-batch size and T is the data length. Taking the training process as an example, according to the config file (https://github.com/bigpon/QPPWG/blob/master/egs/vcc18/conf/vcc18.QPPWGaf_20.yaml), we know that the batch...
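To make the shape concrete, here is a tiny sketch; the numeric values below are placeholders for illustration only, not the values from the linked config file.

```python
# Shape illustration only; B, residual_channels, and T are placeholder values.
import torch

B = 6                    # mini-batch size ("batch_size" in the config)
residual_channels = 64   # channel dimension of the residual blocks
T = 16000                # data length in samples per training segment

x = torch.randn(B, residual_channels, T)
print(x.shape)           # torch.Size([6, 64, 16000])
```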
I randomly selected 172 files from the 28-speaker and 56-speaker training sets to form the validation set.
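A minimal sketch of how such a random split could be drawn is shown below; the directories, seed, and procedure are assumptions for illustration, not the exact steps used to produce the list in the next comment.

```python
# Sketch of drawing a 172-file validation split from the training wav files.
# Directory names and the seed are placeholders.
import random
from pathlib import Path

random.seed(0)  # arbitrary seed for reproducibility
train_wavs = sorted(Path("wav/clean_trainset_28spk_wav").glob("*.wav")) \
           + sorted(Path("wav/clean_trainset_56spk_wav").glob("*.wav"))
valid_wavs = random.sample(train_wavs, 172)
valid_set = set(valid_wavs)
train_wavs = [f for f in train_wavs if f not in valid_set]
print(len(valid_wavs), "validation files,", len(train_wavs), "training files")
```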
Here are the selected files for the validation set. p282_239.wav p314_058.wav p339_266.wav p323_362.wav p275_161.wav p299_340.wav p326_324.wav p283_156.wav p301_222.wav p233_017.wav p267_060.wav p301_041.wav p258_383.wav p241_157.wav p276_431.wav p244_006.wav p306_237.wav p326_227.wav p231_367.wav p256_320.wav p247_229.wav...
Hi, I used clean_trainset_28spk_wav.zip and clean_trainset_56spk_wav.zip from https://datashare.ed.ac.uk/handle/10283/2791?show=full. Both stage 1 and stage 2 use the same dataset.