
Question about training caption model

Open Naren00 opened this issue 5 years ago • 0 comments

Hello, I have a question about training the caption model (cap_model) in the end-to-end masked transformer.

In the code, the cap_model is trained with `window_mask = gate_scores * pred_bin_window_mask.view(B, T, 1)`. As I understand it, `pred_bin_window_mask` is derived from the model's own predictions.

Therefore, is the caption model (cap_model) trained on the learned proposals (not on the ground-truth labeled segments)? Is my understanding correct?

And if the cap model is trained on the learned proposals, it could be strongly affected by the initial quality of those proposals, which seems like it would make training unstable. If I have misunderstood anything, please point it out.
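To make the question concrete, here is a minimal sketch of the masking step quoted above. The shapes and the exact semantics of `gate_scores` and `pred_bin_window_mask` are assumptions on my part (B = batch size, T = number of temporal positions); the point is just that multiplying the hard predicted window by the soft gate lets the caption loss back-propagate into the proposal gate:

```python
import torch

B, T = 2, 8

# gate_scores: soft per-position confidence from the proposal head
# (assumed shape (B, T, 1); this is the differentiable part).
gate_scores = torch.rand(B, T, 1, requires_grad=True)

# pred_bin_window_mask: hard 0/1 window built from the *predicted*
# proposal boundaries, not from the ground-truth segment.
pred_bin_window_mask = (torch.rand(B, T) > 0.5).float()

# The caption decoder attends through this soft mask.
window_mask = gate_scores * pred_bin_window_mask.view(B, T, 1)

# A caption loss computed through window_mask would reach gate_scores:
window_mask.sum().backward()
print(gate_scores.grad is not None)  # gradients flow to the proposal gate
```

So if this reading is right, the caption model indeed sees learned proposals rather than GT windows, and the gradient path through `gate_scores` is what couples the two modules.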

Thank you.

Naren00 avatar Oct 16 '20 08:10 Naren00