StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Results 39 StyleTTS2 issues

Can we add checkpointing to the StyleTTS2 model so that finetuning can resume from the last epoch we trained? As I am using the free version of Colab, I can...

Fixes #104 (as far as I can tell).

Hi everyone, I'm considering putting some effort into training StyleTTS in Portuguese. I have a good-quality dataset for this task; however, I was unsure whether it would be better...

help wanted

The readme makes it sound very simple: "Replace bert with xphonebert". Looking a bit closer, it looks like it's quite a feat to make StyleTTS2 talk in non-English languages (https://github.com/yl4579/StyleTTS2/issues/28). StyleTTS2...

help wanted

I tried to do finetuning on a small dataset with 2 speakers. I set `epochs=25`, `diff_epoch=8`, `joint_epoch=15`. The Style Diffusion training started as expected, but SLM Adversarial Training never started...

Hi @yl4579, thank you for your awesome work. I ran into some problems when training my model. When I train the ASR model with my phoneme symbols, I get a negative CTC loss...

Hello there, devs of StyleTTS2. It's a great model, and you really did a good job. I mainly use it on the HF demo, but there are some issues: Firstly,...

Hi everyone, I'm wondering whether LJSpeech or LibriTTS is the proper candidate for finetuning a single-speaker voice. I've seen that there is a multispeaker...

I've trained a model from scratch with batch size 8 and window size 500 on 4x A10 GPUs. On entering the second phase of training, I'm getting the following error: `torch.cuda.OutOfMemoryError: CUDA...

Here we see that the ground truth for the denoiser is the `(acoustic_styles, prosodic_styles)`: https://github.com/yl4579/StyleTTS2/blob/main/train_second.py#L307 But here we see that the output from the sampler is parsed as `(prosodic_styles, acoustic_styles)`...