Retrieval-based-Voice-Conversion-WebUI

Is it expected that v2 trained models underperform compared to v1 trained models on identical settings?

Open kalomaze opened this issue 1 year ago • 6 comments

v1_vs_v2_Examples.zip

5-minute dataset. I also used the Mangio fork, which adds 'crepe' as a training option, for both of these models. Maybe v2 outperforms on a model with a bigger dataset, like 40 minutes?

kalomaze avatar May 16 '23 16:05 kalomaze

I have been informed it is much easier to 'overtrain' v2 models. Both are 1000 epochs; I'll look into that first. But I'm also running into an issue where TensorBoard cuts out at 6k steps, when the full model should in theory have 20+ thousand steps, given a batch size of 4 and a dataset of ~5 minutes (roughly ~100 post-processed wavs).
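(For reference, the back-of-the-envelope math behind that expected step count, assuming ~100 slices after preprocessing and one optimizer step per batch:)

```python
import math

# Assumed values for this run; the slice count is an estimate
# from a ~5 minute dataset after preprocessing.
num_slices = 100
batch_size = 4
epochs = 1000

steps_per_epoch = math.ceil(num_slices / batch_size)  # 25 batches per pass
total_steps = steps_per_epoch * epochs                # 25 * 1000 = 25,000
print(total_steps)  # ~25k, far more than the 6k TensorBoard shows
```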

kalomaze avatar May 16 '23 20:05 kalomaze

Have you tried 20-epoch models of both? You can also try the latest v2 pretrained weights. The v2 model is currently experimental, so performance may decrease in some cases. We won't publish the release until it's been tested completely.

RVC-Boss avatar May 17 '23 02:05 RVC-Boss

The "v2" files here are using the experimental v2 weights. The "v1" example is the normally trained weights model.

According to TensorBoard, the model is at 6,000 steps. This may be inaccurate; perhaps it is reading the logs improperly? (1000 epochs, 5-minute dataset, batch size 4.) I am told that if /g/total starts rising again on the graph after reaching its minimum, it is a sign the model is overtraining. If that is correct, 300-400 epochs was the right stopping point, and 1000 epochs is far too many (for the v2 model only). I can see this being correct for such a small dataset. The same person also told me they saw this overtraining effect start at around 1000 epochs on a 40-minute dataset (again, only on v2).

[screenshot: TensorBoard graph of /g/total over training steps]
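A minimal sketch of how one could find that inflection point programmatically, assuming the metric is logged under a tag like `loss/g/total` (the exact tag name may differ between RVC versions; check `ea.Tags()["scalars"]`):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

LOG_DIR = "logs/my_experiment"  # hypothetical path to this run's event files
TAG = "loss/g/total"            # assumed tag name

ea = EventAccumulator(LOG_DIR, size_guidance={"scalars": 0})  # 0 = keep every point
ea.Reload()

events = ea.Scalars(TAG)
best = min(events, key=lambda e: e.value)
print(f"minimum {TAG} = {best.value:.4f} at step {best.step}")
# If the curve trends upward after this step, everything past it is a
# candidate for overtraining; pick the saved checkpoint nearest this step.
```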

With this in mind, the 350-epoch v2 model seems to beat the 1000-epoch v1 model based on my first attempt, but more testing is needed. It seems much easier to overtrain a v2 model.

v1_1000_EPOCHS_VS_v2_350_EPOCHS.zip

kalomaze avatar May 17 '23 04:05 kalomaze

The 350e_v2 version is significantly better than the 1000e_v2 version (from the first post).

RVC-Boss avatar May 17 '23 09:05 RVC-Boss

In the earliest version, the max total_epoch was 100; it was later raised to 200. Some people said more epochs might be better and wanted 1000, so now the max total_epoch is 1000. But I think needing fewer training epochs is an advantage of RVC.

RVC-Boss avatar May 17 '23 09:05 RVC-Boss

[screenshot: TensorBoard step count for the new model]

Something is definitely wrong with the step count on that model. I trained a new one today on a different (8-minute) dataset; at 600 epochs it shows 30k steps. 5k is absolutely not the real number for the other one. Do you know what might cause this? Also, this new '30k steps' model definitely gets beaten by the v1 equivalent, so I don't know if there's a better way to tell from TensorBoard statistics whether the model is overtraining.
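One way to cross-check what the TensorBoard UI is displaying would be to read the last step actually recorded in the event files (a sketch, assuming the same EventAccumulator setup and hypothetical log path as above):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

ea = EventAccumulator("logs/my_experiment", size_guidance={"scalars": 0})
ea.Reload()

# Highest global step recorded under any scalar tag. If this disagrees with
# epochs * steps_per_epoch, the logging (not the training) is the problem.
last_step = max(ea.Scalars(tag)[-1].step for tag in ea.Tags()["scalars"])
print(last_step)
```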

v1_vs_v2_Mord.zip

kalomaze avatar May 17 '23 13:05 kalomaze