LAVIS
About ITM loss
Thank you for your code! When I reproduce the stage 1 training, I find that the ITM loss does not converge. Is this normal, or is there any trick? (Note: I replaced BERT with an XLM-R model.)
No trick here. You may try a lower learning rate, or use cleaner datasets.
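For reference, here is a minimal sketch of lowering the pretraining learning rate by editing a LAVIS-style run config programmatically. The config path and the exact key names (`run.init_lr`, `run.warmup_lr`, `run.min_lr`) are assumptions based on the usual LAVIS pretrain YAMLs, so check them against the config you actually use:

```python
# Sketch: lower the peak LR in a LAVIS-style pretrain config.
# Path and key names are assumptions; verify against your own YAML.
from omegaconf import OmegaConf

cfg = OmegaConf.load("lavis/projects/blip/train/pretrain_14m.yaml")  # hypothetical path

# Drop the peak LR by roughly 3-5x and keep a gentle warmup,
# so the ITM/LM heads do not diverge early in stage 1.
cfg.run.init_lr = 1e-5
cfg.run.warmup_lr = 1e-6
cfg.run.min_lr = 1e-6

OmegaConf.save(cfg, "pretrain_lower_lr.yaml")
```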
When I pretrain BLIP on a Chinese dataset, I also ran into this issue: the ITM and LM losses do not converge. Have you solved this problem? @qibao77
@qibao77 We use a customised implementation for the mixture of encoder-decoder (med.py) model. It has a different architecture from that of BERT, even though it is initialized from BERT weights. If XLM-R is used, a customised implementation is needed as well.
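To illustrate the point (this is a rough sketch, not the actual med.py code): the MED-style text block interleaves cross-attention to image features inside each layer, and the ITM head classifies the resulting multimodal [CLS] state. A vanilla BERT/XLM-R stack has no such cross-attention, so image information never reaches the ITM head. The class and dimensions below are illustrative only:

```python
# Rough sketch of an MED-style text block with image cross-attention
# and an ITM head on the multimodal [CLS] token. Not the LAVIS code.
import torch
import torch.nn as nn

class MEDStyleBlock(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # absent in plain BERT/XLM-R
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, text, image):
        text = self.norm1(text + self.self_attn(text, text, text)[0])
        text = self.norm2(text + self.cross_attn(text, image, image)[0])  # inject image features
        return self.norm3(text + self.ffn(text))

itm_head = nn.Linear(768, 2)          # binary match / no-match classifier

text = torch.randn(4, 30, 768)        # toy text hidden states
image = torch.randn(4, 197, 768)      # toy ViT patch features
fused = MEDStyleBlock()(text, image)
logits = itm_head(fused[:, 0])        # ITM logits from the multimodal [CLS]
```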
@chenyzh28 My ITM loss has converged, but the model still doesn't seem to work well, and I'm still looking into it.
@qibao77 It seems that the difference between Chinese and English BERT did cause my problem. I lowered the learning rate, and the ITM and LM losses are currently converging normally. I tested the model with the ITM score and it outperforms ITC significantly. I hope that helps you.
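For context, the usual way this comparison is done in BLIP-style retrieval is to rank all candidates by the cheap ITC similarity and then rerank the top-k with the (more expensive) ITM head. A minimal sketch, assuming normalized ITC embeddings and a placeholder `itm_score(text_idx)` that stands in for the real multimodal forward pass:

```python
# Sketch: ITC shortlisting followed by ITM re-ranking.
# `itm_score` is a placeholder for the model's matching logit.
import torch

def rerank_with_itm(image_feat, text_feats, itm_score, k=16):
    sims = text_feats @ image_feat                   # ITC cosine similarities, shape (num_texts,)
    topk = sims.topk(k).indices                      # cheap ITC shortlist
    itm = torch.stack([itm_score(i) for i in topk])  # expensive ITM pass only on the shortlist
    return topk[itm.argsort(descending=True)]        # final ranking by ITM score

# Toy usage with random features and a dummy ITM scorer.
img = torch.nn.functional.normalize(torch.randn(256), dim=0)
txts = torch.nn.functional.normalize(torch.randn(1000, 256), dim=-1)
dummy_itm = lambda idx: torch.randn(())              # stand-in for the real multimodal head
print(rerank_with_itm(img, txts, dummy_itm, k=8))
```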
Can you please share more details here, @chenyzh28?
By Chinese BERT, do you mean that you changed the vocab, or something else? What new learning rate did you use to make it converge?