so-vits-svc-fork
Trained... until around 3900 epochs 😮 Model is ruined?
Describe the bug
I let the training keep running after it had been fine for a while. Checking from time to time, at around 2400 epochs it was fine but of course needed more training (I wanted to get to 10000). After a day of training (slow PC) I checked at around 3900 epochs via the GUI app, and the result is: "Bleep... silence... Bleep"
When I test the SAME source (any voice) with other models, it works fine.
I only keep 3 backups (since they take up space), but I understand now that this was useless. Even the latest backup is ruined and produces BLEEPS... instead of the trained voice.
I've tried with "crepe" but also with any other model.
Screenshot example of SOURCE and RESULT:
- Is it a known issue?
- Is there a way to FIX / CORRECT the current BLEEP result?
Unfortunately, I don't know how to reproduce this, and I'm worried about wasting many hours of training only for the model to break again at some point.
Is there a CHANCE I can still save that model and continue training it? Or is that it... it's ruined for good now? 😮
Thanks for posting. We have not observed any problem that serious. I'm sorry but for all your lengthy posts, you don't seem to write much necessary information; please follow the Issue template and clearly indicate the version and the steps to reproduce the issue.
Thanks for the reply. Unfortunately, I don't have much more information to give besides the fact that I used the latest version.
I mentioned that I don't know how to reproduce this; that's why I asked my other questions, hoping it might be a known issue. I guess it's a new bug, so nobody is familiar with it yet.
For now I understand that I should train for FEWER than 3000 epochs if I don't want the model to be ruined, though I don't believe that's an accurate number; I'm just being careful.
Do not use the latest version; it does not infer correctly. I discovered this yesterday as well. I haven't written a bug report yet, but until then do not use anything beyond https://github.com/voicepaw/so-vits-svc-fork/releases/tag/v3.10.4 unless you want broken inference.
Do a git checkout cbd3896 and that should revert you to a working version
Filed a formal report here: https://github.com/voicepaw/so-vits-svc-fork/issues/516
That commit isn't related to my issue, since mine is not a quality problem. In my case, after training to around 3900 epochs the output lost its sound and became BLEEPS. Looking at the waveforms, it does "try" to follow the correct silent gaps of the source, but the real problem is that it isn't generating a normal waveform at all, just bleeps, as I showed and explained in the image above.
If I train from scratch it works fine; the problem is that I'm not sure at what point it may break or ruin the model and turn into "BLEEPS" instead of an actual voice again.
I've tried the earlier versions. I'm afraid whatever happened during training ruined the model and I can't fix it... unless there is a way to save that model so I can resume training from the same point, so feel free to share if you know anything about this.
I don't know what causes it; that's why I can't reproduce it.
Did you infer from the D_xxxx or G_xxxx pth? Sometimes I make the mistake of using the D file and get those bleeps as well, just a suggestion.
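To rule out the D/G mix-up mentioned above, a small helper can pick the generator file automatically. This is only a sketch: it assumes checkpoints sit in one folder and are named `G_<number>.pth` and `D_<number>.pth`, as the "G_xxxx" / "D_xxxx" wording in this thread suggests. The function name `latest_generator_checkpoint` and the `model_dir` parameter are made up for illustration and are not part of so-vits-svc-fork.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme, based on the "G_xxxx" / "D_xxxx" wording in this thread.
_GENERATOR_RE = re.compile(r"^G_(\d+)\.pth$")


def latest_generator_checkpoint(model_dir: str) -> Optional[Path]:
    """Return the G_<number>.pth file with the highest number, or None.

    Files starting with D_ (and anything else) are ignored, so a
    discriminator checkpoint can never be selected by accident.
    """
    best_path = None
    best_number = -1
    for path in Path(model_dir).iterdir():
        match = _GENERATOR_RE.match(path.name)
        if match and int(match.group(1)) > best_number:
            best_number = int(match.group(1))
            best_path = path
    return best_path
```

The returned path can then be selected as the model file for inference. Numbers are compared as integers, so `G_2400.pth` ranks above `G_800.pth` even though it sorts lower as text.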
I always use G by default. I know what you mean with D, it's some sort of "whistling blurry noises", but that's not the case here: it's exactly one "BEEP", then silence (following the original source, whatever file I try to convert).
If you look at the image in my first post you'll see why it doesn't make sense compared to the source, but at the same time... it keeps the positions where the "voices" are supposed to appear.
When I do the same with ANY of my other models (trained the same way) they work... so I don't know what happened to break this last model I trained... and I don't know how to reproduce the problem, I just trained as I usually did before 🤔
I forgot to mention that from 3.10.0 to 3.10.5, so-vits-svc style ContentVec processing code was wrong, which could cause pronunciation problems if updated, but I am not sure if the beeps are related to this (as they usually seem to cause pronunciation problems.) (There is no way to determine if the model was created during these times unfortunately.)
I didn't update while I was training this specific model. So it used the version before the current latest, which was the latest when I started training, meaning the version from 5-7 days ago I believe (sorry, I can't be very accurate as I don't know for sure), but I do know that it wasn't the latest from a few days ago, if that information helps a bit.
I guess I will have to risk a few more days to train from scratch, as this model is ruined and the backups are also ruined, so I can't save it and resume. I was hoping that this bug was known, but I understand it is not, so I'll have to see if it happens again... I hope it won't happen again around 3000 - 3900 epochs 🙏
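Since the three rotating backups were all overwritten after the model broke, one possible precaution for the retraining run is to copy a checkpoint that has been tested and sounds fine into a separate folder. The sketch below assumes the same `G_<number>.pth` / `D_<number>.pth` naming as mentioned in this thread; `archive_checkpoint`, `model_dir`, `archive_dir` and `number` are hypothetical names for illustration, not something so-vits-svc-fork provides.

```python
import shutil
from pathlib import Path
from typing import List


def archive_checkpoint(model_dir: str, archive_dir: str, number: int) -> List[Path]:
    """Copy G_<number>.pth and D_<number>.pth (whichever exist) to archive_dir.

    The file naming is an assumption taken from this thread. The originals
    are left in place; only copies are written to the archive folder.
    """
    target = Path(archive_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for prefix in ("G", "D"):
        source = Path(model_dir) / f"{prefix}_{number}.pth"
        if source.is_file():
            # copy2 also preserves the file's modification time
            copied.append(Path(shutil.copy2(source, target / source.name)))
    return copied
```

Running this after each successful listening test (for example at the checkpoint that was still fine around 2400 epochs) would keep one known-good G/D pair outside the rotation, at the cost of some extra disk space.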