ImageCaptioning.pytorch
Why don't you add <start> and <end> to the encoded ground truth?
First of all, thank you very much for this very useful repo. I have a question, though: why don't you add 'start' and 'end' tokens to the encoded ground truth, as other methods do? Without them, how does the model know when to end the decoding phase?
0 is.
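One way to read this reply is that index 0 is reserved (never assigned to a real word), so it can act as both the start and the end marker. The sketch below illustrates that reading only; `encode_caption`, `greedy_decode`, and `toy_next_token` are hypothetical names made up for this example and are not taken from the repo.

```python
# Assumption: index 0 is never used for a real word, so it can mark
# both the start and the end of a caption.
BOS_EOS = 0


def encode_caption(word_ids, max_len):
    """Wrap a caption in zeros: [0, w1, ..., wn, 0, 0, ...]."""
    clipped = word_ids[:max_len]
    # At least one trailing 0 is always present, so the model always
    # sees an end marker as a training target.
    padding = [BOS_EOS] * (max_len - len(clipped) + 1)
    return [BOS_EOS] + clipped + padding


def greedy_decode(next_token, max_len):
    """Feed 0 as the first input and stop once the model emits 0."""
    tokens = []
    prev = BOS_EOS
    for _ in range(max_len):
        prev = next_token(prev, tokens)
        if prev == BOS_EOS:  # 0 as output means "end of caption"
            break
        tokens.append(prev)
    return tokens


def toy_next_token(prev, history):
    """Hypothetical stand-in for a trained model: replays a fixed script."""
    script = [5, 9, 2, 0]
    return script[len(history)]


encoded = encode_caption([5, 9, 2], max_len=5)
decoded = greedy_decode(toy_next_token, max_len=10)
```

Under this scheme the ground truth needs no separate 'start' and 'end' entries in the vocabulary: the leading 0 is the first input, and the first trailing 0 is the target that teaches the model to stop.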
Thank you for your prompt reply! I have another question: you said the resulting files are about 200 GB, but the feats_att.h5 I extracted with your code is 396.0 GB (395,951,570,760 bytes). Why?
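As a rough sanity check on these numbers, the expected file size can be estimated from the feature shape. Everything except the reported byte count is an assumption here: 123,287 images (COCO train2014 plus val2014), a 14 x 14 attention grid, and 2048 channels per location. The helper `feature_bytes` is a hypothetical name for this example.

```python
def feature_bytes(num_images, grid, channels, bytes_per_value):
    """Raw size of a dense (num_images, grid, grid, channels) array."""
    return num_images * grid * grid * channels * bytes_per_value


NUM_IMAGES = 123_287   # assumption: COCO train2014 + val2014
GRID = 14              # assumption: 14 x 14 attention feature map
CHANNELS = 2048        # assumption: 2048 channels per grid cell
REPORTED = 395_951_570_760  # bytes, as quoted in the question

size_fp32 = feature_bytes(NUM_IMAGES, GRID, CHANNELS, 4)  # 4-byte floats
size_fp64 = feature_bytes(NUM_IMAGES, GRID, CHANNELS, 8)  # 8-byte floats

print(f"float32 estimate: {size_fp32 / 1e9:.1f} GB")
print(f"float64 estimate: {size_fp64 / 1e9:.1f} GB")
print(f"reported / float32 estimate: {REPORTED / size_fp32:.3f}")
```

Under these assumptions, the float32 estimate lands close to the quoted 200 GB, and the reported size is almost exactly twice that. This does not establish the cause, but it is consistent with the features having been stored with 8 bytes per value, so checking the dataset's dtype (for example with h5py) may be a useful first step.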