image-captioning-DLCT Questions about h5py.File features on customized images!

Hi, in your code, the h5py.File of features has keys like ['%d_features' % image_id], ['%d_grids' % image_id], ['%d_boxes' % image_id], ['%d_size' % image_id], ['%d_mask' % image_id]. Could you explain what each of the five keys means?
I know you extract the grid features with 'https://github.com/facebookresearch/grid-feats-vqa'. Could you share how the other keys (features, boxes, size, mask) are extracted or obtained? Thanks! Hope you reply soon!

Feb 03 '21 10:02 liman13552763129

Thanks for your interest. As you can see, there are five kinds of keys in our .hdf5 file. They are

['%d_features' % image_id]: region features (N_regions, feature_dim)
['%d_boxes' % image_id]: bounding box of region features (N_regions, 4)
['%d_size' % image_id]: size of original image (for normalizing bounding box), (2,)
['%d_grids' % image_id]: grid features (N_grids, feature_dim)
['%d_mask' % image_id]: geometric alignment graph, (N_regions, N_grids)
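
As a minimal sketch of this layout, the snippet below writes and reads one image entry with h5py using the key names listed above. The helper names, the image id, the placeholder shapes, the value order inside `size`, and the in-memory file are all assumptions made for illustration, not the repository's actual extraction script.

```python
import h5py
import numpy as np


def write_image_entry(f, image_id, features, boxes, size, grids, mask):
    """Store the five per-image arrays under the key names listed above."""
    f.create_dataset('%d_features' % image_id, data=features)
    f.create_dataset('%d_boxes' % image_id, data=boxes)
    f.create_dataset('%d_size' % image_id, data=size)
    f.create_dataset('%d_grids' % image_id, data=grids)
    f.create_dataset('%d_mask' % image_id, data=mask)


def read_image_entry(f, image_id):
    """Load the five arrays of one image back into NumPy arrays."""
    names = ('features', 'boxes', 'size', 'grids', 'mask')
    return {name: f['%d_%s' % (image_id, name)][()] for name in names}


# Placeholder sizes (assumptions): 5 regions, 7*7 grids, 2048-d features.
n_regions, n_grids, feature_dim = 5, 49, 2048
rng = np.random.default_rng(0)

# driver='core' with backing_store=False keeps the file in memory only.
with h5py.File('example_features.hdf5', 'w', driver='core',
               backing_store=False) as f:
    write_image_entry(
        f,
        image_id=42,  # hypothetical image id
        features=rng.standard_normal((n_regions, feature_dim), dtype=np.float32),
        boxes=np.zeros((n_regions, 4), dtype=np.float32),
        size=np.array([640, 480]),  # value order is an assumption
        grids=rng.standard_normal((n_grids, feature_dim), dtype=np.float32),
        mask=np.ones((n_regions, n_grids), dtype=np.uint8),
    )
    entry = read_image_entry(f, 42)

shapes = {name: arr.shape for name, arr in entry.items()}
print(shapes)
```

For a real dataset the file would be opened on disk (without the `driver` arguments) and the placeholder arrays replaced by the extracted features.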

The first three keys can be obtained when extracting region features. The last key can be derived from the geometric relationship between the grid features and the region features. I will upload the extraction method and update our README to explain it more clearly. Thanks for asking!
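
Until the extraction method is available, one way to build such a mask can be sketched as follows. It assumes a 7 x 7 grid that tiles the image uniformly and connects a region to a grid cell whenever their boxes overlap with positive area. The overlap rule, the function names, and the meaning of the value 1 (here "aligned"; the released file may use the opposite convention) are assumptions, not the authors' confirmed procedure.

```python
import numpy as np


def grid_cell_boxes(width, height, grid_size=7):
    """Return (grid_size*grid_size, 4) boxes [x1, y1, x2, y2], row by row."""
    xs = np.linspace(0.0, width, grid_size + 1)
    ys = np.linspace(0.0, height, grid_size + 1)
    cells = [[xs[j], ys[i], xs[j + 1], ys[i + 1]]
             for i in range(grid_size) for j in range(grid_size)]
    return np.asarray(cells, dtype=np.float32)


def alignment_mask(region_boxes, grid_boxes):
    """mask[r, g] = 1 if region box r and grid cell g overlap (assumed rule)."""
    r = region_boxes[:, None, :]  # (N_regions, 1, 4)
    g = grid_boxes[None, :, :]    # (1, N_grids, 4)
    inter_w = np.minimum(r[..., 2], g[..., 2]) - np.maximum(r[..., 0], g[..., 0])
    inter_h = np.minimum(r[..., 3], g[..., 3]) - np.maximum(r[..., 1], g[..., 1])
    return ((inter_w > 0) & (inter_h > 0)).astype(np.uint8)


# Hypothetical 70 x 70 image, so every grid cell is 10 x 10 pixels.
width, height = 70.0, 70.0
region_boxes = np.array([[0, 0, 10, 10],    # exactly one cell
                         [5, 5, 25, 15],    # spans 3 columns and 2 rows
                         [0, 0, 70, 70]],   # whole image
                        dtype=np.float32)
mask = alignment_mask(region_boxes, grid_cell_boxes(width, height))
print(mask.shape, mask.sum(axis=1))
```

The `size` entry of each image would supply `width` and `height` here, and the `boxes` entry would supply `region_boxes`.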

Feb 08 '21 02:02 luo3300612

Hi, for the fourth key ['%d_grids' % image_id]: grid features (N_grids, feature_dim), is N_grids 7*7 and feature_dim 2048? The grid features obtained from grid-feats-vqa have shape torch.Size([1, 2048, 26, 19]); after applying torch.nn.AdaptiveAvgPool2d((7, 7)) the shape is torch.Size([1, 2048, 7, 7]). To get grid features of shape torch.Size([7*7, 2048]) from torch.Size([1, 2048, 7, 7]) (denoted A), I think three steps are needed:

First: use torch.squeeze(A) to get torch.Size([2048, 7, 7]) (denoted B)
Second: B.reshape([2048, 7*7]) (denoted C)
Third: C.transpose(0, 1) to get torch.Size([7*7, 2048])
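
The three steps above can be sketched as follows. A random tensor stands in for the real grid-feats-vqa output (an assumption made for illustration), and the squeeze is restricted to dimension 0 so that it cannot drop any other axis.

```python
import torch

# Random stand-in for the grid-feats-vqa output of one image.
raw = torch.randn(1, 2048, 26, 19)

A = torch.nn.AdaptiveAvgPool2d((7, 7))(raw)  # (1, 2048, 7, 7)
B = torch.squeeze(A, 0)                      # (2048, 7, 7)
C = B.reshape(2048, 7 * 7)                   # (2048, 49)
grids = C.transpose(0, 1)                    # (49, 2048)

# Equivalent shorter form: flatten the two spatial axes, then swap axes.
grids_alt = A.flatten(2).squeeze(0).transpose(0, 1)

print(grids.shape)
```

With this ordering, row k of `grids` holds the 2048-dimensional feature of the grid cell at row k // 7 and column k % 7.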

Is the conversion above correct? If not, how should it be done?

Mar 01 '21 08:03 liman13552763129

@luo3300612 Hi, hope you can reply soon! Thank you very much!

Mar 10 '21 06:03 liman13552763129

yes, it is right

Mar 10 '21 06:03 luo3300612

thanks!

Mar 10 '21 06:03 liman13552763129

@luo3300612 Hi, when I start optimizing the model with the CIDEr reward at a learning rate of 5e-6, the loss is 0 at the beginning and becomes negative after several batches. In your code the loss is: loss = -torch.mean(log_probs, -1) * (reward - reward_baseline). I printed log_probs, reward and reward_baseline: log_probs is negative and the other two are positive. Is this right? What causes the loss to be negative? Hope you reply soon! Thank you very much!
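
For readers with the same question, here is a minimal sketch of how the sign of this loss behaves. It assumes that reward_baseline is the mean reward over the captions sampled for each image (not confirmed in this thread) and uses made-up numbers. Under that assumption the advantages (reward - reward_baseline) sum to zero per image, so the loss is a weighted sum of positive and negative terms and can be zero, positive, or negative.

```python
import torch


def scst_loss(log_probs, reward):
    """Loss from the line quoted above; the baseline choice is an assumption."""
    # Assumed baseline: mean reward over the sampled captions of each image.
    reward_baseline = torch.mean(reward, -1, keepdim=True)
    loss = -torch.mean(log_probs, -1) * (reward - reward_baseline)
    return loss.mean()


# Hypothetical values: 1 image, 3 sampled captions, 2 tokens per caption.
reward = torch.tensor([[1.2, 0.9, 0.6]])

# Every caption equally likely: the zero-sum advantages cancel out.
uniform_log_probs = torch.full((1, 3, 2), -0.5)
loss_uniform = scst_loss(uniform_log_probs, reward)

# The best-rewarded caption is also the most likely one: the loss is negative.
skewed_log_probs = torch.tensor([[[-0.1, -0.1],
                                  [-0.5, -0.5],
                                  [-1.0, -1.0]]])
loss_skewed = scst_loss(skewed_log_probs, reward)

print(loss_uniform.item(), loss_skewed.item())
```

In this sketch a negative value only means that higher-reward captions currently receive higher probability than lower-reward ones; unlike a cross-entropy loss, this quantity is not bounded below by zero.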

Mar 12 '21 03:03 liman13552763129

@luo3300612 Hi, sorry to bother you again. I hope you can reply to the question above. Thank you very much!

Mar 15 '21 06:03 liman13552763129

@luo3300612 Hello, sorry to bother you again. I hope you can reply to the question above. Thank you very much!

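Hello, I would like to ask: do you have code for generating an image caption like those shown in Figure 1 or Figure 5 of the paper? If so, could you share it with me? Thanks!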

Mar 13 '22 13:03 z972778371

Hi, I wonder how to build my own dataset. Could you provide the script for extracting the five keys? @liman13552763129

Mar 15 '22 06:03 cxy990729

Hi! Could you share a copy of the h5py file you downloaded earlier? The zip file at the link now appears to be damaged. In addition, if I want to work with my own dataset, how do I produce this h5py file? @luo3300612 @liman13552763129 @cxy990729

Feb 03 '23 06:02 YinghuaYa