Sry2016

Results: 5 comments by Sry2016

Hello! Have you solved this problem?

You are right. So I think we should add `trainer_E.zero_grad()` here, but without `trainer_E.step()`.
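A minimal sketch of what that suggestion amounts to, assuming a standard PyTorch setup where `trainer_E` is the optimizer for some module `E` (both names here are hypothetical stand-ins): zeroing the gradients before `backward()` clears stale accumulation, while omitting `step()` leaves `E`'s weights untouched.

```python
import torch

# Hypothetical module E and its optimizer, standing in for trainer_E
E = torch.nn.Linear(4, 4)
trainer_E = torch.optim.SGD(E.parameters(), lr=0.1)

w_before = E.weight.detach().clone()

x = torch.randn(2, 4)
loss = E(x).sum()

trainer_E.zero_grad()  # clear any stale gradients on E's parameters
loss.backward()        # gradients still flow into E's parameters
# no trainer_E.step(): E's weights are intentionally left unchanged
```

After this runs, `E.weight.grad` is populated but `E.weight` itself is identical to `w_before`, which is the behavior the comment is proposing.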

> I have the same question. It seems that 'multi_modal' is selected for the MLM loss, in which case all layers receive image features.

> Hi,
>
> It seems from Figure 1 of the paper that only the last 6 of the 12 layers in the BERT are used for the MLM loss....

epoch  loss     ppl      accuracy
0      1.27207  3.56825  73.010
1      0.43369  1.54293  91.913
2      0.27970  1.32274  95.256
3      0.21871  1.24447  96.444
4      0.19118  1.21067  97.124
5      0.16422  1.17847  97.475
6      0.15239  1.16461  97.674
7      0.14655  1.15784  97.768
8      0.15024  1.16211  97.772
9      0.14270  ...