GUUser91 comments

Results: 35 comments by GUUser91

Error Message: RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (1024, 1024) at dimension 2 of input [1, 65621, 2]

@yl4579 I just converted a wav file from this YouTube video down to a 1-second 24000 Hz wav file: https://www.youtube.com/watch?v=GQt4SY_6-w4. Here is the converted wav file: https://files.catbox.moe/2jfn9u.wav I also get...
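The shape in the error message, [1, 65621, 2], would be consistent with a two-channel (stereo) waveform laid out as (samples, channels), so that the padded dimension has size 2. That reading is an assumption, not something the thread confirms. Reflect-style padding needs the padding to be smaller than the dimension being padded, which the small sketch below checks in plain Python. The helper names `reflect_padding_is_valid` and `downmix_to_mono` are hypothetical and are not part of StyleTTS2 or PyTorch.

```python
def reflect_padding_is_valid(shape, pad, dim=-1):
    """Reflect padding requires each pad amount to be smaller than the padded dimension."""
    left, right = pad
    size = shape[dim]
    return left < size and right < size


def downmix_to_mono(frames):
    """Average the channels of a (samples, channels) nested list into a flat list of samples."""
    return [sum(frame) / len(frame) for frame in frames]


# Shape from the error message: the last dimension has size 2, but the padding is 1024.
stereo_shape = (1, 65621, 2)
print(reflect_padding_is_valid(stereo_shape, (1024, 1024)))  # False

# Assumed fix: a mono signal of the same length, so the padded dimension is the sample axis.
mono_shape = (1, 65621)
print(reflect_padding_is_valid(mono_shape, (1024, 1024)))  # True

# Tiny downmix example with two stereo frames.
print(downmix_to_mono([[0.5, -0.5], [1.0, 0.0]]))  # [0.0, 0.5]
```

If the file really is stereo, converting it to mono before preprocessing would be one thing to try.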

SLM Adversarial Training did not start when finetuning

@78Alpha I'm not sure if I'm doing this correctly, but does this image from my tensorboard folder mean that I was able to start SLM Adversarial Training? ![screencapture-localhost-6006-2024-05-12-17_32_05](https://github.com/yl4579/StyleTTS2/assets/77461922/a7e5a6fa-d08a-4d0e-8936-ae2d7dd7538f)

SLM Adversarial Training did not start when finetuning

I tinkered with the config_ft.yml file. I set Max_Len to 120, batch_percentage to 1, slmadv_params min_len to 100, and slmadv_params max_len to 120. Batch size...

SLM Adversarial Training did not start when finetuning

Another observation: I trained a model with a 6-minute dataset. My previous datasets contained a lot of 1-3 second audio files. This one contained a lot of files...

SLM Adversarial Training did not start when finetuning

@PriyamJha0124 Do you mean fine-tuning? I used the Vokan model for that: https://huggingface.co/ShoukanLabs/Vokan

Fix batch size 1 by specifying squeeze dims

@Sobsz I tried finetuning a model with your branch, but I got a "ZeroDivisionError: division by zero" error after 1 epoch. Batch size is set to...

Fix batch size 1 by specifying squeeze dims

@Sobsz Now I get the error "Dimension out of range (expected to be in range of [-1, 0], but got 1)". I'm using the ROCm 5.7 nightly PyTorch build if...
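For context on why batch size 1 can produce this message: squeezing without naming a dimension removes every size-1 axis, including the batch axis, and a 1-D result only accepts dimension indices in [-1, 0]. The sketch below models just the shape arithmetic in plain Python. The `(batch, channel, frames)` layout and the helper names are assumptions for illustration, not code from the branch under discussion.

```python
def squeeze_shape(shape, dim=None):
    """Shape after squeezing: drop all size-1 axes if dim is None, else only the given axis."""
    if dim is None:
        return tuple(s for s in shape if s != 1)
    dim = dim % len(shape)
    if shape[dim] != 1:
        return tuple(shape)  # squeezing an axis that is not size 1 is a no-op
    return tuple(s for i, s in enumerate(shape) if i != dim)


def dim_in_range(ndim, dim):
    """Valid dimension indices for an ndim-dimensional tensor are -ndim .. ndim - 1."""
    return -ndim <= dim < ndim


# Hypothetical (batch, channel, frames) shape with batch size 1.
batch_one = (1, 1, 300)

unspecified = squeeze_shape(batch_one)   # batch axis is removed as well
specified = squeeze_shape(batch_one, 1)  # only the channel axis is removed

print(unspecified, dim_in_range(len(unspecified), 1))  # (300,) False
print(specified, dim_in_range(len(specified), 1))      # (1, 300) True
```

With a larger batch, for example `(4, 1, 300)`, both variants give `(4, 300)`, which is why the problem only shows up at batch size 1.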

Fix batch size 1 by specifying squeeze dims

@Sobsz Here's a clearer version:
Traceback (most recent call last):
  File "/run/media/user/e1745494-af46-4749-9e1a-89d2b2289699/StyleTTS2/train_finetune.py", line 707, in <module>
    main()
  File "/run/media/user/e1745494-af46-4749-9e1a-89d2b2289699/StyleTTS2/venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/run/media/user/e1745494-af46-4749-9e1a-89d2b2289699/StyleTTS2/venv/lib/python3.10/site-packages/click/core.py", line 1078,...

Better LJSpeech or LibriTTS for finetuning a single speaker voice? Or training from scratch with not so much data?

You can use Vokan: https://huggingface.co/ShoukanLabs/Vokan

Query Regarding the Impact of Varied Acoustic Environments on Model Performance

A workaround that lessens the reverb effect for me when denoising is to overlay the audio with loud background music in Kdenlive. Then I denoise the Kdenlive audio output file...