Training loss curve on MMC4 dataset?
Hi, thanks for the great work! I tried training on a subset of MMC4-core, but the LM loss does not decrease much. Would it be possible to share the MMC4 loss curve for reference, so I can tell whether this is expected (or potentially a bug)? Thanks so much!
I'm seeing the same problem; the training loss on MMC4 seems hard to converge.
Here are the loss plots for some of our training runs. We also find that the loss on MMC4 decreases more slowly than the loss on LAION. We suspect this is because we use a pre-trained language model, which is already a strong predictor of the next token.
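As a quick sanity check on that explanation, you can measure the frozen backbone's next-token loss on raw MMC4 text before any multimodal training; if that baseline is already low, the curve has little room to fall. A minimal sketch, using a small stand-in model rather than the actual OpenFlamingo backbone:

```python
# Sanity-check sketch (not from the OpenFlamingo codebase): measure the
# frozen LM's next-token loss on MMC4 text. "gpt2" is a small stand-in;
# substitute whichever language backbone you actually train with.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical stand-in backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Some interleaved MMC4 document text ..."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean next-token
    # cross-entropy, i.e. the same kind of LM loss plotted in the curves.
    out = model(**inputs, labels=inputs["input_ids"])

print(f"baseline LM loss: {out.loss.item():.3f}")
```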
Thanks. What CLIP similarity threshold was used for this figure?
These curves use a threshold of 0.24.
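For anyone reproducing this: the threshold is applied per image, against the CLIP similarity between the image and its matched sentence. A rough sketch of the filtering, assuming the released MMC4 JSON schema (each `image_info` entry carries a `matched_sim` score) and a hypothetical shard path:

```python
# Sketch of applying a similarity threshold when loading MMC4 documents.
# Field names follow the released MMC4 JSON schema as I understand it;
# verify against your own shards.
import json

SIM_THRESHOLD = 0.24  # the threshold mentioned above

def filter_doc(doc: dict) -> dict:
    """Drop images whose CLIP similarity to their matched sentence
    falls below the threshold; keep the document text unchanged."""
    doc["image_info"] = [
        img for img in doc.get("image_info", [])
        if img.get("matched_sim", 0.0) >= SIM_THRESHOLD
    ]
    return doc

with open("mmc4_shard.jsonl") as f:  # hypothetical shard path
    docs = [filter_doc(json.loads(line)) for line in f]

# Documents left with no images can be skipped for interleaved training.
docs = [d for d in docs if d["image_info"]]
```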
Thanks. Also, have you ever plotted the validation loss? When I pretrain on a subset of MMC4 (around 1M websites), the validation loss starts rising after very few iterations.
Train: [training loss curve]
Val: [validation loss curve]
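For context, a held-out loss curve like the "Val" one above can be computed by averaging the LM loss over a fixed set of validation batches with gradients disabled. A minimal sketch, with `model` and `val_loader` as placeholders for an OpenFlamingo model and a held-out MMC4 dataloader; the forward call mirrors the repo's training loop (loss returned first when labels are passed), so adjust it to your version:

```python
# Sketch of a held-out loss evaluation; batch keys are placeholders.
import torch

@torch.no_grad()
def validation_loss(model, val_loader, device="cuda"):
    model.eval()
    total, n = 0.0, 0
    for batch in val_loader:
        loss = model(
            vision_x=batch["images"].to(device),
            lang_x=batch["input_ids"].to(device),
            attention_mask=batch["attention_mask"].to(device),
            labels=batch["labels"].to(device),
        )[0]
        total += loss.item()
        n += 1
    model.train()
    return total / max(n, 1)
```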
Hmm, we haven't plotted such a validation loss before -- this behavior is pretty surprising to me! Do you know if your downstream performance on task benchmarks improves or degrades with training?
The downstream performance is also unstable: a mid-training checkpoint is sometimes better than the final one. I guess it's because I train the model with only 1M LAION and 1M MMC4 samples; the data scale is too small. How many LAION and MMC4 samples did you use for the figure above? @i-gao
Ah, okay! The x-axis in the training curve plots refers to the number of interleaved (MMC4) samples.