
Validation loss divergence?

Open ghost opened this issue 1 year ago • 6 comments

[Screenshot: 2024-03-28, 10:49:47 AM]

Thanks for your great work! I'm training the 100M VoiceCraft model on LJSpeech plus custom data (maybe 32 hours?), but I've run into an issue with validation loss divergence.

I think the cause is the delayed stacking described in your paper, which changes the sequence every epoch. If the training accuracy of all 4 codebooks reaches 1, I would expect the validation loss to decrease.
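For reference, here is a minimal sketch of what the delayed-stacking pattern looks like, assuming codebook k is simply shifted right by k frames (EMPTY is a hypothetical placeholder id; this is illustrative, not VoiceCraft's actual implementation):

```python
import torch

EMPTY = -1  # hypothetical id for "no code yet" positions

def delay_stack(codes: torch.Tensor) -> torch.Tensor:
    """Delayed stacking: shift codebook k right by k frames.

    codes: [K, T] integer codes from K residual codebooks.
    returns: [K, T + K - 1], padded with EMPTY where no code exists.
    """
    K, T = codes.shape
    out = torch.full((K, T + K - 1), EMPTY, dtype=codes.dtype)
    for k in range(K):
        out[k, k : k + T] = codes[k]
    return out

# toy example: 4 codebooks, 6 frames
codes = torch.arange(24).reshape(4, 6)
print(delay_stack(codes))
```

The point of the delay is that at each decoding step the model predicts codebook k conditioned on the coarser codebooks from earlier frames.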

For this reason, I have two questions:

  • Could you tell me whether my training looks right (loss curve, analysis, etc.)?
  • Could you share your training and validation curves?

Best regards

Seung Woo Yu

ghost avatar Mar 28 '24 02:03 ghost

Thanks

Not sure about the validation loss.

The model is overfitting on your data (assuming there isn't a significant train/val domain mismatch): it's getting 95%+ training top10acc on codebook 1, but only ~30% on the validation set.
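For anyone reading along, top10acc here is per-codebook top-10 accuracy: the fraction of positions where the ground-truth code is among the model's 10 highest-scoring codes. A minimal sketch of how such a metric can be computed (illustrative, not the repo's exact code; the 1024-way vocabulary in the toy example is an assumption):

```python
import torch

def top10_acc(logits: torch.Tensor, targets: torch.Tensor, k: int = 10) -> float:
    """Fraction of positions where the target code is in the top-k logits.

    logits: [N, V] scores over one codebook's vocabulary, for N positions.
    targets: [N] ground-truth code indices.
    """
    topk = logits.topk(k, dim=-1).indices            # [N, k]
    hits = (topk == targets.unsqueeze(-1)).any(-1)   # [N] bool
    return hits.float().mean().item()

# toy example: 8 positions, 1024-way codebook
logits = torch.randn(8, 1024)
targets = torch.randint(0, 1024, (8,))
print(top10_acc(logits, targets))
```

Computed separately on train and validation batches for each codebook, a large gap (e.g. 95% train vs. 30% val) is the overfitting signature described above.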

I might no longer have the original curve on our server, but in similar experiments I've run recently, I get at best ~60% top10acc on codebook 1 on the training set, and similar on the validation set (a little lower than the training set because of the train/val mismatch in GigaSpeech).

Maybe try an even smaller model.

jasonppy avatar Mar 28 '24 02:03 jasonppy

@jasonppy Thanks a lot for your answers. Do you have a rough idea of what a good validation loss to aim for is? For example, roughly where you ended up on the 9k-hour GigaSpeech experiment you report in the paper. Thanks!

rlenain avatar Mar 28 '24 15:03 rlenain

I still have the results for a slightly different model, but they should be mostly the same:

  • cb1: train top10acc 0.5548 (0.5261), val top10acc 0.5002
  • cb2: train top10acc 0.4790 (0.4456), val top10acc 0.4425
  • cb3: train top10acc 0.4369 (0.3947), val top10acc 0.4082
  • cb4: train top10acc 0.3694 (0.3226), val top10acc 0.3555

jasonppy avatar Mar 28 '24 22:03 jasonppy

That's super useful, thanks very much!

rlenain avatar Mar 29 '24 09:03 rlenain

@yuseungwoo I'm seeing almost exactly the results you describe, i.e. great train loss but almost immediate overfitting on the val loss. Were you able to solve your problem? How big is your dataset?

peregilk avatar Apr 10 '24 17:04 peregilk

> I still have the results for a slightly different model, but they should be mostly the same: cb1–cb4 train top10acc 0.5548 / 0.4790 / 0.4369 / 0.3694; val top10acc 0.5002 / 0.4425 / 0.4082 / 0.3555

Could you share the training curve or the accuracy for the 830M model? I tried pretraining on my custom dataset, but it diverges in the middle of training 😂

stgzr avatar Jun 26 '24 10:06 stgzr