Why does DecoderWithAttention still need encoded captions in the forward pass during validation?
Thanks for your work first! I learned a lot.
In the forward function of DecoderWithAttention, I see that at each output step the LSTMCell is fed an embedded token from encoded_captions, which is a supervised (teacher-forcing) input. This is understandable in training, but during validation, shouldn't the ground-truth embedded caption token be replaced by the word predicted at the previous step?
I don't know where I'm going wrong.
h, c = self.decode_step(
    torch.cat([embeddings[:batch_size_t, t, :], attention_weighted_encoding], dim=1),
    (h[:batch_size_t], c[:batch_size_t]))  # (batch_size_t, decoder_dim)
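If I understand correctly, the loop around that call is roughly doing the following (a simplified sketch with hypothetical sizes; I've left attention_weighted_encoding fixed, even though the real model recomputes it with attention at every step). The point is that embeddings always come from the ground-truth encoded_captions, i.e. teacher forcing:

import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration
vocab_size, embed_dim, encoder_dim, decoder_dim = 1000, 512, 2048, 512

embedding = nn.Embedding(vocab_size, embed_dim)
decode_step = nn.LSTMCell(embed_dim + encoder_dim, decoder_dim)

def teacher_forced_unroll(encoded_captions, attention_weighted_encoding, h, c):
    # Every step reads the ground-truth token from encoded_captions,
    # regardless of what the model predicted at the previous step.
    embeddings = embedding(encoded_captions)  # (batch, max_len, embed_dim)
    hidden_states = []
    for t in range(encoded_captions.size(1)):
        h, c = decode_step(
            torch.cat([embeddings[:, t, :], attention_weighted_encoding], dim=1),
            (h, c))
        hidden_states.append(h)
    return torch.stack(hidden_states, dim=1)  # (batch, max_len, decoder_dim)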
I don't know if it is too late to comment...
I guess this teacher-forced forward pass is only used for evaluation during training. If you check their inference code, caption.py, the decoder is unrolled step by step using the embedding of the word the model itself generated at the previous step instead.
The validation score is only there to be compared with the training-time score and to select the best model checkpoint.
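Roughly, inference looks like the sketch below. This is a simplified greedy version rather than the beam search caption.py actually uses, and the attribute names (init_hidden_state, embedding, attention, f_beta, sigmoid, decode_step, fc) are assumed to match the tutorial's DecoderWithAttention. The key difference from the forward function: prev_word at step t+1 is the model's own prediction from step t, not a ground-truth token.

import torch

def greedy_decode(decoder, encoder_out, start_token, end_token, max_len=50):
    # encoder_out: (1, num_pixels, encoder_dim); batch size 1 assumed
    h, c = decoder.init_hidden_state(encoder_out)
    prev_word = torch.tensor([start_token])              # start with <start>
    seq = [start_token]
    for _ in range(max_len):
        emb = decoder.embedding(prev_word)                # (1, embed_dim)
        awe, _ = decoder.attention(encoder_out, h)        # (1, encoder_dim)
        awe = decoder.sigmoid(decoder.f_beta(h)) * awe    # gating, as in the tutorial
        h, c = decoder.decode_step(torch.cat([emb, awe], dim=1), (h, c))
        scores = decoder.fc(h)                            # (1, vocab_size)
        prev_word = scores.argmax(dim=1)                  # feed back the model's own prediction
        seq.append(prev_word.item())
        if prev_word.item() == end_token:
            break
    return seq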