yiwei0730

Results: 12 issues from yiwei0730

I have a problem with hparams. Following an answer from a solved issue, I changed `import hparams as hp` to `from utils.hparams import HParam; hp = HParam("configs/default.yaml")`, and I found...
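For reference, a minimal sketch of that change, assuming the repository ships `utils/hparams.py` with an `HParam` class and a `configs/default.yaml` (both quoted in the issue); the `sample_rate` key at the end is only an illustration:

```python
# Old style (fails once the module-level hparams.py is gone):
# import hparams as hp

# Suggested replacement: load the YAML config through HParam instead.
from utils.hparams import HParam

hp = HParam("configs/default.yaml")

# Config values are then read off the loaded object, e.g. (hypothetical key):
print(hp.audio.sample_rate)
```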

![1](https://user-images.githubusercontent.com/62238248/151505977-5ee52bef-84e0-47b9-9adc-a69d0456f6df.png) Looking at the audio in TensorBoard, I saw that if the audio is longer than 11 seconds, it is cut down to 11 seconds during synthesis. I want to know...
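A cap like this typically comes from a hard length limit somewhere in the synthesis or logging path; a hedged sketch of the pattern (the function name and the 11-second default are assumptions, not the repo's actual code):

```python
import numpy as np

def clip_audio_for_logging(wav: np.ndarray, sample_rate: int,
                           max_seconds: float = 11.0) -> np.ndarray:
    """Truncate a waveform to max_seconds before writing it out.
    If the synthesis or TensorBoard-logging path applies a cap like this,
    any utterance longer than max_seconds will appear shortened."""
    max_samples = int(max_seconds * sample_rate)
    return wav[:max_samples]
```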

Is there any voice conversion using ?

When I was using ABX tests to compare audio files, I found that every time an audio file was played, its first part would be repeated. How...

I have noticed some testing and demo issues regarding voice editing that I would like to ask you about. When you edit the last part of the text, for example: https://youtu.be/PJ2qSjycLcw?t=353, ...

The current implementation is not trained in a semi-supervised way because of the small dataset size, but semi-supervised training can easily be activated by specifying target speakers and passing no emotion...
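A hypothetical sketch of that switch, purely to illustrate the described mechanism (the types, speaker IDs, and helper below are assumptions, not the project's actual API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Utterance:
    speaker: str
    emotion: Optional[str]  # None -> no label -> unsupervised branch

# Hypothetical target speakers whose utterances carry no emotion annotation.
target_speakers = {"speaker_a", "speaker_b"}

def emotion_label(u: Utterance) -> Optional[str]:
    """Drop the emotion label for target speakers so their utterances
    are consumed by the unsupervised part of the objective."""
    return None if u.speaker in target_speakers else u.emotion

batch = [Utterance("speaker_a", "happy"), Utterance("speaker_c", "sad")]
print([emotion_label(u) for u in batch])  # [None, 'sad']
```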

We first train the audio codec on 8 NVIDIA Tesla V100 16 GB GPUs with a batch size of 200 audio clips per GPU (an effective batch of 8 × 200 = 1,600 clips per optimizer step) for 440K steps. We follow the implementation and...

I would like to ask you a few questions. 1. What is the difference between the diff-vits project and ns2 tts-v2? From a rough look now, and from what I saw earlier, it seems the main model was changed to VITS while the NaturalSpeech architecture was kept? 2. I tested the tts-v2 model on a training set with 1500+ speakers and 600+ hours of audio; most out-of-set test samples still do not sound very similar. Is it true, as the paper's experiments suggest, that a much larger dataset is needed for out-of-set generalization? Roughly how many hours, and how many speakers, do you think are needed for good results? 3. In your view, what is the difference between the ground-truth durations produced by MFA and the durations predicted by MAS? You seem to prefer the MAS-based system.

I saw the solution in a closed issue: `python -m torch.distributed.launch --nproc_per_node 2 -m vall_e.train yaml=config/your_data/ar.yml`. With this command I can use two GPUs, but the speed isn't any faster than the one...
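One likely explanation, sketched below under the assumption that the yaml's batch size is applied per process: DDP keeps per-step wall time roughly constant and instead doubles the data consumed per step, so steps per second look unchanged while samples per second double.

```python
# Illustrative arithmetic only (the numbers are placeholders, not the repo's config).
per_gpu_batch = 8        # batch size the yaml applies to each process
n_gpus = 2

effective_batch = per_gpu_batch * n_gpus   # 16 samples consumed per step
# Each step still runs one forward/backward per GPU (plus gradient all-reduce),
# so time-per-step stays close to single-GPU training; the speed-up shows up
# as data throughput, i.e. fewer steps needed to cover an epoch.
throughput_gain = effective_batch / per_gpu_batch
print(throughput_gain)  # 2.0x samples/second, not 2x steps/second
```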

I'd like to ask about my training results. I combined the AISHELL3 and aidata corpora with another Chinese dataset, totaling 600 hours of training data. Although the three sets of audio files are not...