vilio
Error when running hm.py
Hi, I wanted to run vilio for my experiment. I made a copy of fts_tsv/hm_data_tsv.py and modified the HMTorchDataset class to read from a single file (instead of the splits). The pastebin is here. After passing the data through the model (I'm using U), I get this error:
Traceback (most recent call last):
  File "hm_uniter.py", line 392, in <module>
    main()
  File "hm_uniter.py", line 361, in main
    hm.train(hm.train_tuple, hm.valid_tuple)
  File "hm_uniter.py", line 187, in train
    logit = self.model(sent, (feats, boxes))
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/entryU.py", line 200, in forward
    seq_out, pooled_output = self.model(input_ids.cuda(), None, img_feats.cuda(), img_pos_feats.cuda(), attn_masks.cuda(), gather_index=gather_index.cuda())
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 418, in forward
    encoded_layers = self.encoder(
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 304, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 185, in forward
    intermediate_output = self.intermediate(attention_output)
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 158, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/functional.py", line 1369, in gelu
    return torch._C._nn.gelu(input)
RuntimeError: CUDA error: device-side assert triggered
I haven't changed any other files. Online, the suggested fix is to correct the label numbering, but I don't think that is the issue. The error is raised in entryU.py when running seq_out, pooled_output = self.model(input_ids.cuda(), None, img_feats.cuda(), img_pos_feats.cuda(), attn_masks.cuda(), gather_index=gather_index.cuda()).
I'm also using UNITER with bert-base.
Please help, @Muennighoff!
Error trace after adding CUDA_LAUNCH_BLOCKING=1 in front of the command (CUDA_LAUNCH_BLOCKING=1 python hm.py), since the error I was getting was deterministic.
@darthgera123 Sorry for the late reply! This is most likely a shape mismatch. What are the dimensions of your images, and are you using the HM dataset?
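To narrow down a shape mismatch, a quick sanity check on the region features and boxes before the forward pass could look like the following. It is a sketch, not vilio code: `check_region_feature_shapes` is a hypothetical helper, and the expected sizes (2048-d region features, 4 box coordinates per region) are assumptions based on common bottom-up Faster R-CNN features; replace them with whatever your feature extractor and model config actually use.

```python
from typing import Sequence, Tuple


def check_region_feature_shapes(
    feats_shape: Sequence[int],
    boxes_shape: Sequence[int],
    feat_dim: int = 2048,  # assumption: bottom-up Faster R-CNN feature size
    box_dim: int = 4,      # assumption: boxes stored as (x1, y1, x2, y2)
) -> Tuple[int, int]:
    """Validate (batch, num_boxes, dim) shapes; return (batch, num_boxes)."""
    if len(feats_shape) != 3 or len(boxes_shape) != 3:
        raise ValueError(
            f"expected 3-D inputs, got feats {tuple(feats_shape)} "
            f"and boxes {tuple(boxes_shape)}"
        )
    b_f, n_f, d_f = feats_shape
    b_b, n_b, d_b = boxes_shape
    # Features and boxes must describe the same regions of the same images
    if (b_f, n_f) != (b_b, n_b):
        raise ValueError(
            f"feats/boxes disagree on (batch, num_boxes): {(b_f, n_f)} vs {(b_b, n_b)}"
        )
    if d_f != feat_dim:
        raise ValueError(f"feature dim is {d_f}, but {feat_dim} is expected")
    if d_b != box_dim:
        raise ValueError(f"box dim is {d_b}, but {box_dim} is expected")
    return b_f, n_f


# Hypothetical usage with torch tensors (torch.Size is a tuple subclass):
# check_region_feature_shapes(tuple(feats.shape), tuple(boxes.shape))
```

Working on plain shape tuples keeps the check independent of torch, so it can also be run over the features read from the single TSV file before training starts.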