Yi-Chiao WU
Hi Solomid and bobbdunn, sorry for the late reply. "spk_name.f0" contains the upper and lower bounds of that speaker's f0 for the feature extraction. "spk_name.pow" contains the power threshold in...
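For illustration, here is a minimal sketch of how such speaker-dependent files could be read. It assumes "spk_name.f0" holds two whitespace-separated numbers (lower and upper f0 bound in Hz) and "spk_name.pow" holds a single dB threshold; the paths and file format are assumptions for the example, not the repo's documented layout.

```python
# Sketch only: the file format and paths below are assumptions, not the
# repo's documented convention. Check the actual egs configuration files.
from pathlib import Path


def load_f0_range(f0_file):
    """Return (minf0, maxf0) parsed from a speaker f0 file (assumed format)."""
    minf0, maxf0 = Path(f0_file).read_text().split()[:2]
    return float(minf0), float(maxf0)


def load_pow_threshold(pow_file):
    """Return the power threshold in dB from a speaker power file (assumed format)."""
    return float(Path(pow_file).read_text().split()[0])


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual directory layout.
    minf0, maxf0 = load_f0_range("conf/SF1.f0")
    pow_th = load_pow_threshold("conf/SF1.pow")
    print(f"f0 range: {minf0}-{maxf0} Hz, power threshold: {pow_th} dB")
```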
Hi Yang, thanks for your question. There are several possible reasons. First, since the provided pretrained model was trained on the very limited data of VCC2018 only, the mismatches between new...
I think you also get a figure plotting the distribution of power (npow histogram), right? The figure may have a peak higher than 0 dB corresponding to most of the speech frames...
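As a rough sketch of what such a histogram looks like (this is not the repo's extraction script; the file name and frame settings are placeholders), one can compute per-frame power in dB and plot its distribution, then pick a threshold between the silence peak and the speech peak:

```python
# Minimal sketch: per-frame power (dB) histogram for choosing a power threshold.
# Input file, frame length, and hop size are assumed values for illustration.
import numpy as np
import soundfile as sf
import matplotlib.pyplot as plt

x, sr = sf.read("sample.wav")          # hypothetical input waveform
frame_len, hop = 1024, 256             # assumed analysis settings
n_frames = 1 + (len(x) - frame_len) // hop
pow_db = np.array([
    10.0 * np.log10(np.mean(x[i * hop:i * hop + frame_len] ** 2) + 1e-10)
    for i in range(n_frames)
])

plt.hist(pow_db, bins=100)
plt.xlabel("frame power [dB]")
plt.ylabel("count")
plt.title("npow histogram (sketch)")
plt.show()
```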
Hi Unilight, thank you very much!! I will merge the modification into the current repo.
> 1) Is the acoustic feature in the generator 1d or 2d?
> What is it, a mel spectrogram extracted from natural speech? Or a text-to-mel spectrogram (from another framework)?
> I noticed there is...
For the forward function of each class, I have provided the input/output tensor information. For example, in the "AdaptiveBlock" class in "residual_block.py", I have provided the following...
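The sketch below only illustrates the style of documenting input/output tensor shapes in a forward method; the layers are placeholders and this is not the actual AdaptiveBlock code, so please refer to residual_block.py in the repo for the real implementation.

```python
# Illustrative toy block showing shape annotations in a forward docstring.
import torch
import torch.nn as nn


class ToyResidualBlock(nn.Module):
    def __init__(self, residual_channels=64, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(
            residual_channels, residual_channels, kernel_size,
            padding=(kernel_size - 1) // 2,
        )

    def forward(self, x):
        """Forward pass.

        Args:
            x (Tensor): Input tensor (B, residual_channels, T).

        Returns:
            Tensor: Output tensor (B, residual_channels, T).
        """
        return x + torch.tanh(self.conv(x))
```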
Yes, (B, residual_channels, T) denotes a 3-D tensor, where B is the mini-batch size and T is the data length. Taking the training process as an example, according to the config file (https://github.com/bigpon/QPPWG/blob/master/egs/vcc18/conf/vcc18.QPPWGaf_20.yaml), we know that the batch...
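To make the shape concrete, here is a tiny sketch; the numeric values below are placeholders for illustration only, not the values from the linked config file.

```python
# Shape illustration only; B, residual_channels, and T are placeholder values.
import torch

B = 6                    # mini-batch size ("batch_size" in the config)
residual_channels = 64   # channel dimension of the residual blocks
T = 16000                # data length in samples per training segment

x = torch.randn(B, residual_channels, T)
print(x.shape)           # torch.Size([6, 64, 16000])
```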
I randomly selected 172 files from the 28-speaker and 56-speaker training sets to form the validation set.
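A minimal sketch of how such a random split could be drawn is shown below; the directories, seed, and procedure are assumptions for illustration, not the exact steps used to produce the list in the next comment.

```python
# Sketch of drawing a 172-file validation split from the training wav files.
# Directory names and the seed are placeholders.
import random
from pathlib import Path

random.seed(0)  # arbitrary seed for reproducibility
train_wavs = sorted(Path("wav/clean_trainset_28spk_wav").glob("*.wav")) \
           + sorted(Path("wav/clean_trainset_56spk_wav").glob("*.wav"))
valid_wavs = random.sample(train_wavs, 172)
valid_set = set(valid_wavs)
train_wavs = [f for f in train_wavs if f not in valid_set]
print(len(valid_wavs), "validation files,", len(train_wavs), "training files")
```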
Here are the selected files for the validation set. p282_239.wav p314_058.wav p339_266.wav p323_362.wav p275_161.wav p299_340.wav p326_324.wav p283_156.wav p301_222.wav p233_017.wav p267_060.wav p301_041.wav p258_383.wav p241_157.wav p276_431.wav p244_006.wav p306_237.wav p326_227.wav p231_367.wav p256_320.wav p247_229.wav...
Hi, I used clean_trainset_28spk_wav.zip and clean_trainset_56spk_wav.zip from https://datashare.ed.ac.uk/handle/10283/2791?show=full. Both stage 1 and stage 2 use the same dataset.