Training loss curve on MMC4 dataset?

Open tonylins opened this issue 2 years ago • 8 comments

Hi, thanks for the great work! I tried training on a subset of MMC4-core, but the LM loss does not go down much. Would it be possible to share the MMC4 loss curve for reference, so that I can tell whether this is expected or potentially a bug? Thanks so much!

tonylins avatar Jul 22 '23 08:07 tonylins

I'm hitting the same problem; the training loss on MMC4 seems hard to converge.

FingerRec avatar Jul 24 '23 02:07 FingerRec

[Image: loss plots from training runs (Screenshot 2023-07-23 at 9 09 57 PM)]

Here are the loss plots for some of our training runs. We also find that the loss on MMC4 decreases more slowly than the loss on LAION. We anticipate that this could be the case because we use a pre-trained language model, which is already a strong predictor of the next token.

anas-awadalla avatar Jul 24 '23 04:07 anas-awadalla

Thanks. What similarity threshold was used for this figure?

FingerRec avatar Jul 24 '23 15:07 FingerRec

These curves use a threshold of 0.24.

i-gao avatar Jul 24 '23 16:07 i-gao

Thanks. In addition, have you ever plotted the validation loss? When I pretrain on a subset of MMC4 (around 1M websites), the validation loss begins to rise after very few iterations.

Train: [image: training loss plot]

Val: [image: validation loss plot]

FingerRec avatar Jul 25 '23 02:07 FingerRec

Hmm, we haven't plotted such a validation loss before -- this behavior is pretty surprising to me! Do you know if your downstream performance on task benchmarks improves or degrades with training?

i-gao avatar Jul 26 '23 07:07 i-gao

The downstream performance is also unstable. A checkpoint from the middle of training is sometimes better than the final one. I guess it's because I trained the model with only 1M LAION and 1M MMC4 samples, so the data scale is too small. How many LAION and MMC4 samples did you use for the above figure? @i-gao

FingerRec avatar Jul 27 '23 02:07 FingerRec

Ah, okay! The x-axis in the training curve plots refers to the number of interleaved (MMC4) samples.

i-gao avatar Jul 27 '23 03:07 i-gao
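
For readers who see the same symptom described above (validation loss rising while a mid-training checkpoint beats the final one), one common workaround is to select the checkpoint by validation loss instead of taking the last one. The sketch below is only an illustration of that idea: `best_checkpoint`, `patience`, and `min_delta` are hypothetical names, nothing here is part of the OpenFlamingo codebase, and the loss values in the usage example are made up.

```python
from typing import Optional, Sequence, Tuple


def best_checkpoint(
    val_losses: Sequence[Tuple[int, float]],
    patience: int = 3,
    min_delta: float = 0.0,
) -> Tuple[Optional[int], float, Optional[int]]:
    """Pick the step with the lowest validation loss (hypothetical helper).

    val_losses: (step, validation_loss) pairs in training order.
    patience:   number of consecutive evaluations without improvement
                after which we stop scanning.
    min_delta:  minimum decrease that counts as an improvement.

    Returns (best_step, best_loss, stop_step); stop_step is None if the
    patience budget was never exhausted.
    """
    best_step: Optional[int] = None
    best_loss = float("inf")
    evals_without_improvement = 0

    for step, loss in val_losses:
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_step, best_loss = step, loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return best_step, best_loss, step

    return best_step, best_loss, None


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    history = [(1000, 2.90), (2000, 2.75), (3000, 2.71),
               (4000, 2.78), (5000, 2.85), (6000, 2.93)]
    print(best_checkpoint(history, patience=3))
```

With a small pretraining subset, validation loss is noisy, so a `patience` greater than 1 avoids stopping on a single bad evaluation; checking downstream benchmark scores at the selected step is still worthwhile, since loss and task performance need not move together.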