self-critical.pytorch
Inference fails when there are no detected objects in the bottom-up features
For my current evaluation I am using my own computed bottom-up features. In these bottom-up features, some of the images have zero detected regions. When I run these bottom-up features through your repo, it fails for the batches where such an image is present, giving the error below:
Traceback (most recent call last):
  File "eval.py", line 176, in <module>
    vars(opt))
  File "/home/default/ephemeral_drive/work/image_captioning/object_relation_transformer_cloned/eval_utils.py", line 141, in eval_split
    seq = model(fc_feats, att_feats, att_masks, opt=eval_kwargs, mode='sample')[0].data
  File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/default/ephemeral_drive/work/image_captioning/object_relation_transformer_cloned/models/CaptionModel.py", line 31, in forward
    return getattr(self, '_'+mode)(*args, **kwargs)
  File "/home/default/ephemeral_drive/work/image_captioning/object_relation_transformer_cloned/models/TransformerModel.py", line 505, in _sample
    return self._sample_beam(fc_feats, att_feats, att_masks, opt)
  File "/home/default/ephemeral_drive/work/image_captioning/object_relation_transformer_cloned/models/TransformerModel.py", line 427, in _sample_beam
    att_feats, seq, att_masks, seq_mask = self._prepare_feature(att_feats, att_masks)
  File "/home/default/ephemeral_drive/work/image_captioning/object_relation_transformer_cloned/models/TransformerModel.py", line 347, in _prepare_feature
    att_feats = pack_wrapper(self.att_embed, att_feats, att_masks)
  File "/home/default/ephemeral_drive/work/image_captioning/object_relation_transformer_cloned/models/AttModel.py", line 43, in pack_wrapper
    packed, inv_ix = sort_pack_padded_sequence(att_feats, att_masks.data.long().sum(1))
  File "/home/default/ephemeral_drive/work/image_captioning/object_relation_transformer_cloned/models/AttModel.py", line 31, in sort_pack_padded_sequence
    tmp = pack_padded_sequence(input[indices], sorted_lengths, batch_first=True)
  File "/usr/local/lib64/python3.6/site-packages/torch/nn/utils/rnn.py", line 244, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0
Terminating BlobFetcher
In the batch that caused the error, the fifth image had zero detections.
I am not yet sure whether the error is caused by the images with no detected regions, but I wanted to ask whether there is a requirement that each image have at least one detected region. If there is such a requirement, it would be great if you could point me to the part of your code that should be amended so the model can either process such images or skip them.
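For what it's worth, the failure seems reproducible outside the repo. A minimal sketch (assuming, as the traceback suggests, that the lengths passed to pack_padded_sequence come from att_masks.sum(1), which is 0 for an image with no detections):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

feats = torch.randn(2, 5, 10)     # batch of 2 images, up to 5 regions each
lengths = torch.tensor([5, 0])    # second image has zero detected regions

# Raises the same RuntimeError as in the traceback above:
# "Length of all samples has to be greater than 0, ..."
packed = pack_padded_sequence(feats, lengths, batch_first=True)
```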
You can replace pack_wrapper with:
def pack_wrapper(module, att_feats, att_masks):
    return module(att_feats)
Under the default settings, there is no difference.
When you mention the default settings, do you mean the default settings for the Transformer, and could this be an issue for LSTMs etc.? My understanding is that the sort_pack_padded_sequence and pad_unsort_packed_sequence functions help improve the efficiency of the LSTMs and do not affect the logic. Is that correct?
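For context, this is my reading of what those helpers do (a rough sketch based on the calls visible in the traceback; the exact code in models/AttModel.py may differ):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

def sort_pack_padded_sequence(input, lengths):
    # Sort the batch by decreasing length, pack it, and remember the
    # permutation needed to restore the original order afterwards.
    sorted_lengths, indices = torch.sort(lengths, descending=True)
    tmp = pack_padded_sequence(input[indices], sorted_lengths, batch_first=True)
    inv_ix = indices.clone()
    inv_ix[indices] = torch.arange(0, len(indices)).type_as(inv_ix)
    return tmp, inv_ix

def pad_unsort_packed_sequence(input, inv_ix):
    # Unpack back to a padded tensor and undo the length sorting.
    tmp, _ = pad_packed_sequence(input, batch_first=True)
    tmp = tmp[inv_ix]
    return tmp
```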
I tried to understand what the pack_wrapper function is doing. To deal with the situation where a batch contains images with no detected objects, I have added zero tensors to the output tensor for those images. Below is my code:
def pack_wrapper(module, att_feats, att_masks):
    if att_masks is not None:
        # True when an image has no detected regions (all-zero features)
        boolmask = att_feats.sum((1, 2)) == 0
        if boolmask.sum() != 0:
            # Run the embedding only on the images that do have detections
            tmp_feats = att_feats[att_feats.sum((1, 2)) != 0]
            tmp_masks = att_masks[att_masks.sum(1) != 0]
            packed, inv_ix = sort_pack_padded_sequence(tmp_feats, tmp_masks.data.long().sum(1))
            processed_feats = pad_unsort_packed_sequence(PackedSequence(module(packed[0]), packed[1]), inv_ix)
            processed_feats_shape = processed_feats.shape
            # Allocate the full-batch output on the same device/dtype as the
            # processed features and scatter them back into place
            result_vector = processed_feats.new_empty([att_feats.shape[0]] + list(processed_feats_shape[1:]))
            ii, jj = 0, 0
            for bb in boolmask:
                if not bb:
                    result_vector[ii] = processed_feats[jj]
                    jj += 1
                else:
                    # Images with no detections get an all-zero embedding
                    result_vector[ii] = 0
                ii += 1
            return result_vector
        else:
            packed, inv_ix = sort_pack_padded_sequence(att_feats, att_masks.data.long().sum(1))
            return pad_unsort_packed_sequence(PackedSequence(module(packed[0]), packed[1]), inv_ix)
    else:
        return module(att_feats)
Could you say whether the edited code looks fine? If the edits don't seem fine, or if they break some other aspect of the code, is returning module(att_feats) advisable for all batches of images, or only for the batches that contain images with no detected objects?
The pack and unpack here have nothing to do with the LSTM. The reason I have them is that at some point I had a batchnorm layer in att_embed, and I didn't want the zero tensors to affect its statistics, so I unpad before running att_embed and pad back afterwards.
I would suggest just doing return module(att_feats).
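To illustrate why (a hypothetical sketch, not the exact att_embed from the repo): under the default setting, att_embed is a per-region stack along the lines of Linear -> ReLU -> Dropout, which transforms every region independently, so skipping the pack/unpack changes nothing. Only when a layer that computes batch statistics (e.g. BatchNorm) sits inside att_embed do the all-zero padding rows matter.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a default-style att_embed (no BatchNorm):
att_embed = nn.Sequential(
    nn.Linear(2048, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
)

att_feats = torch.randn(4, 36, 2048)   # batch of 4 images, 36 regions each
att_feats[1] = 0                       # pretend image 1 has zero detections

# Each region is embedded independently, so the all-zero rows do not
# influence any other sample; they can simply be ignored/masked downstream.
out = att_embed(att_feats)             # shape: (4, 36, 512)
```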