Jaehyeon Kim

16 comments by Jaehyeon Kim

Hi @snakers4. We reported inference speed tests on a GPU server rather than in CPU-only environments, as that is the representative setting for speed comparison in many papers. I think...
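
For reference, here's a minimal sketch of how such a GPU-side timing can be done; `model` and `inputs` are placeholders for your network and its input tensors, not names from this repo:

```python
import time
import torch

def measure_latency(model, inputs, n_warmup=10, n_runs=100):
    """Rough average GPU inference latency; `model`/`inputs` are placeholders."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):       # warm up kernels and cuDNN autotuning
            model(*inputs)
        torch.cuda.synchronize()        # drain pending kernels before timing
        start = time.perf_counter()
        for _ in range(n_runs):
            model(*inputs)
        torch.cuda.synchronize()        # wait for all queued runs to finish
    return (time.perf_counter() - start) / n_runs
```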

Definitely, yes! But you may need a text-to-phoneme converter such as [Phonemizer](https://github.com/bootphon/phonemizer) to convert Chinese text into phonemes. This model takes phonemes as input rather than characters.
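
For example, a minimal sketch with Phonemizer's espeak backend (assuming espeak-ng is installed; 'cmn' is its code for Mandarin, so adjust to your setup):

```python
from phonemizer import phonemize

text = "你好，世界"
# Convert raw Chinese text into a phoneme string; the result still has to be
# mapped to the model's symbol IDs before inference.
phonemes = phonemize(text, language='cmn', backend='espeak', strip=True)
print(phonemes)
```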

@LG-SS Now the paper is available: https://arxiv.org/abs/2106.06103

> > Definitely, yes! But you may need a text-to-phoneme converter such as [Phonemizer](https://github.com/bootphon/phonemizer) to convert Chinese text into phonemes.
> > This...

> @jaywalnut310 Hi, may I ask one last question: how does the latency compare with Tacotron 2 (I mean end-to-end latency; Tacotron 2 may also need a vocoder, which counts in), is vits...

> @jaywalnut310 Is this model autoregressive or non-autoregressive?

Hi @leminhnguyen, this model is non-autoregressive.

@leminhnguyen Well, VITS provides controllability to some extent. You can control and change the duration manually, and you can control energy and pitch by manipulating the latent representation...
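
As a rough sketch (assuming the `SynthesizerTrn.infer` interface used in this repo's inference notebook; the exact scale values here are just illustrative):

```python
import torch

with torch.no_grad():
    audio = net_g.infer(
        x_tst, x_tst_lengths,
        noise_scale=0.667,   # variance of the latent z: varies prosody, pitch, energy
        noise_scale_w=0.8,   # variance of the stochastic duration predictor
        length_scale=1.2,    # >1 slows speech down, <1 speeds it up (duration control)
    )[0][0, 0].cpu().numpy()
```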

Though my English is poor, I'll answer in English for other people. Yes, line 127 of train.py doesn't consider the number of GPUs, which may cause misunderstanding about training...
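
A hypothetical illustration of the pitfall (not the repo's actual code; `train_loader` and `batch_size_per_gpu` are placeholders): under DistributedDataParallel each process' sampler only sees its shard of the data, so counts derived from the local loader must be scaled by the number of GPUs.

```python
import torch.distributed as dist

world_size = dist.get_world_size() if dist.is_initialized() else 1
steps_per_epoch_local = len(train_loader)            # what a single GPU iterates over
samples_per_epoch_total = len(train_loader.dataset)  # the full dataset across all GPUs
effective_batch_size = batch_size_per_gpu * world_size
```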

Sorry for the dense calculation of the MLE loss... I'll let you know when I clean up the clutter in the code. For now, I'll explain the loss term by term....
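
As a stopgap, here is a lightly annotated sketch of the Gaussian MLE loss in Glow-TTS-style code (the negative log-likelihood of z under N(m, exp(logs)²), plus the flow's change-of-variables term); the variable names follow that convention and may differ slightly from the code here:

```python
import math
import torch

def mle_loss(z, m, logs, logdet, mask):
    # Gaussian NLL without the constant: sum(log sigma) + 0.5 * (z - mu)^2 / sigma^2
    l = torch.sum(logs) + 0.5 * torch.sum(torch.exp(-2 * logs) * (z - m) ** 2)
    l = l - torch.sum(logdet)                     # log-determinant from the flow
    l = l / torch.sum(torch.ones_like(z) * mask)  # average over valid (unmasked) elements
    l = l + 0.5 * math.log(2 * math.pi)           # constant term: kept only so the value
    return l                                      # is the exact NLL; it has no gradient
```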

Yes, the constant term is ignored in backpropagation; I just left it in for the exact calculation of the log-likelihood. And I saw AlignTTS, which also proposes an alignment search algorithm similar...

So your situation is: 1) you have your own multi-speaker dataset, and its total duration is only one hour, and 2) you trained the model with the LJ...
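
In that case, a hypothetical warm-start sketch could look like the following (the checkpoint path is illustrative; `net_g` is a freshly built multi-speaker model, and weights whose shapes don't match, e.g. a new speaker embedding, are simply skipped):

```python
import torch

ckpt = torch.load("pretrained_ljs.pth", map_location="cpu")
pretrained = ckpt["model"] if "model" in ckpt else ckpt
own_state = net_g.state_dict()
# Keep only the tensors that exist in the new model with identical shapes.
filtered = {k: v for k, v in pretrained.items()
            if k in own_state and v.shape == own_state[k].shape}
own_state.update(filtered)
net_g.load_state_dict(own_state)
```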