StyleTTS2 issues

Results 39 StyleTTS2 issues

HELP WANTED!!!!!!!!!!!

Can we add checkpointing to the StyleTTS2 model so that finetuning can resume from the last epoch we trained? As I am using the free version of Colab, I can...

21sK1p

Fix batch size 1 by specifying squeeze dims

Fixes #104 (as far as I can tell).

Sobsz
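
The pull request's diff is not shown in this listing, so the following is only a minimal sketch of the general pitfall its title points at. It assumes the problem is the documented behavior of `torch.squeeze`: called without a `dim`, it removes every size-1 dimension, including a batch dimension of size 1. The helper `squeeze_shape` is hypothetical and works on plain shape tuples so that it runs without PyTorch.

```python
def squeeze_shape(shape, dim=None):
    """Return the shape left after squeezing, mimicking torch.squeeze semantics.

    Hypothetical helper for illustration only; it operates on shape tuples,
    not on real tensors.
    """
    if dim is None:
        # No dim given: every size-1 dimension is dropped.
        return tuple(s for s in shape if s != 1)
    dim = dim % len(shape)  # allow negative indices such as -1
    # Dim given: only that dimension is dropped, and only if its size is 1.
    return tuple(s for i, s in enumerate(shape) if not (i == dim and s == 1))


batched = (4, 80, 1)  # batch size 4, assumed (batch, features, 1) layout
single = (1, 80, 1)   # batch size 1

print(squeeze_shape(batched))         # batch dimension survives
print(squeeze_shape(single))          # batch dimension is lost as well
print(squeeze_shape(single, dim=-1))  # batch dimension is kept
```

Naming the dimension keeps the output rank the same for every batch size, which is presumably why the fix specifies squeeze dims.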

Fine-tuning or training from scratch in a different language?

Hi everyone, I'm considering putting some effort into training StyleTTS in Portuguese. I have a good-quality dataset for this task, however, I was in doubt if it would be better...

paulovasconcellos-hotmart

help wanted

Awesome in English but no support for other languages - please add an example for another language (German, Italian, French etc)

The readme makes it sound very simple: "Replace bert with xphonebert". Looking a bit closer, it looks like it's quite a feat to make StyleTTS2 talk in non-English languages (https://github.com/yl4579/StyleTTS2/issues/28). StyleTTS2...

cmp-nct

help wanted

SLM Adversarial Training did not start when finetuning

I tried to do finetuning on a small dataset with 2 speakers. I set `epochs=25`, `diff_epoch=8`, `joint_epoch=15`. The Style Diffusion training started as expected, but SLM Adversarial Training never started...

godspirit00

asr negative loss

Hi @yl4579, thank you for your awesome work. I ran into some problems when training my model. When I train the ASR model with my phoneme symbols, I get a negative CTC loss...

yijingshihenxiule

Issue with improper pauses and random bursts of noise

Hello there, devs of Style TTS2, it's a great model, you really did a good job. I mainly use it on the hf demo, but there are some issues: Firstly,...

king-dahmanus

Better LJSpeech or LibriTTS for finetuning a single speaker voice? Or training from scratch with not so much data?

Hi everyone, I'm wondering whether LJSpeech or LibriTTS is the better candidate for finetuning a single person's voice. I've seen that there is a multispeaker...

Sweetapocalyps3

Second stage training with smaller window size

I've trained a model from scratch with a batch size of 8 and a window size of 500 on 4x A10 GPUs. Entering the second phase of training, I'm getting the following error: `torch.cuda.OutOfMemoryError: CUDA...

meng2468

Possible Bug in Style Diffusion Inference Code

Here we see that the ground truth for the denoiser is the `(acoustic_styles, prosodic_styles)`: https://github.com/yl4579/StyleTTS2/blob/main/train_second.py#L307 But here we see that the output from the sampler is parsed as `(prosodic_styles, acoustic_styles)`...

brthor
