Sry2016

Results: 5 comments by Sry2016

Hello! Have you solved this problem?

You are right. So I think we should add `trainer_E.zero_grad()` here, but without `trainer_E.step()`.
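A minimal sketch of what that suggestion amounts to, assuming a standard PyTorch setup where `trainer_E` is the optimizer for some module `E` (both names here are hypothetical stand-ins): zeroing the gradients before `backward()` clears stale accumulation, while omitting `step()` leaves `E`'s weights untouched.

```python
import torch

# Hypothetical module E and its optimizer, standing in for trainer_E
E = torch.nn.Linear(4, 4)
trainer_E = torch.optim.SGD(E.parameters(), lr=0.1)

w_before = E.weight.detach().clone()

x = torch.randn(2, 4)
loss = E(x).sum()

trainer_E.zero_grad()  # clear any stale gradients on E's parameters
loss.backward()        # gradients still flow into E's parameters
# no trainer_E.step(): E's weights are intentionally left unchanged
```

After this runs, `E.weight.grad` is populated but `E.weight` itself is identical to `w_before`, which is the behavior the comment is proposing.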

> I have the same question. It seems that 'multi_modal' is selected for the MLM loss, in which case all layers receive image features.

> Hi,
>
> It seems from Figure 1 of the paper that only the last 6 of the 12 layers in the BERT are used for the MLM loss....

epoch  loss     ppl      accuracy
0      1.27207  3.56825  73.010
1      0.43369  1.54293  91.913
2      0.27970  1.32274  95.256
3      0.21871  1.24447  96.444
4      0.19118  1.21067  97.124
5      0.16422  1.17847  97.475
6      0.15239  1.16461  97.674
7      0.14655  1.15784  97.768
8      0.15024  1.16211  97.772
9      0.14270  ...