
Error when running hm.py

Open darthgera123 opened this issue 4 years ago • 2 comments

Hi, I wanted to run vilio for my experiment. I made a copy of fts_tsv/hm_data_tsv.py and updated the HMTorchDataset class to read from a single file (instead of splits). The pastebin is here. After passing the data through the model (I'm using U), I'm getting this error:

Traceback (most recent call last):
  File "hm_uniter.py", line 392, in <module>
    main()
  File "hm_uniter.py", line 361, in main
    hm.train(hm.train_tuple, hm.valid_tuple)
  File "hm_uniter.py", line 187, in train
    logit = self.model(sent, (feats, boxes))
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/entryU.py", line 200, in forward
    seq_out, pooled_output = self.model(input_ids.cuda(), None, img_feats.cuda(), img_pos_feats.cuda(), attn_masks.cuda(), gather_index=gather_index.cuda())
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 418, in forward
    encoded_layers = self.encoder(
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 304, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 185, in forward
    intermediate_output = self.intermediate(attention_output)
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/darthgera123/vilio/src/vilio/modeling_bertU.py", line 158, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "/home/darthgera123/anaconda3/envs/vilio/lib/python3.8/site-packages/torch/nn/functional.py", line 1369, in gelu
    return torch._C._nn.gelu(input)
RuntimeError: CUDA error: device-side assert triggered

I haven't changed any other file. Online, the suggested solution is to fix the numbering of the labels, which I don't think is the issue here. The error occurs in entryU.py when running seq_out, pooled_output = self.model(input_ids.cuda(), None, img_feats.cuda(), img_pos_feats.cuda(), attn_masks.cuda(), gather_index=gather_index.cuda()). I'm using UNITER and bert-base. Please help @Muennighoff

darthgera123 avatar May 10 '21 20:05 darthgera123

Error trace after I added CUDA_LAUNCH_BLOCKING=1 in front of python hm.py, as the error I was getting was deterministic.
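
For reference, the same setting can also be applied from inside the script instead of on the command line. This is a minimal sketch, assuming the lines are placed at the very top of the entry script before anything initializes CUDA; that placement is an assumption and not something taken from the vilio repository.

```python
import os

# CUDA_LAUNCH_BLOCKING is read when CUDA is initialized, so set it first.
# Setting it before importing torch is the safest placement.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable has been set
```

With kernel launches forced to run synchronously, the Python traceback should point at the operation that actually failed rather than at a later, unrelated call.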

darthgera123 avatar May 10 '21 21:05 darthgera123

@darthgera123 sorry for the late reply! This is most likely a shape mismatch. What are the dimensions of your images, and are you using the HM Dataset?
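
A device-side assert in a forward pass like this is often caused by an index that falls outside a lookup table, or by feature rows of the wrong width. The following is a rough sketch of a pre-flight check on one batch, not code from vilio. The helper names out_of_range and wrong_width are hypothetical, the vocabulary size 30522 assumes the bert-base-uncased tokenizer, and the feature width of 4 is a toy placeholder for whatever the checkpoint expects. It works on plain nested lists, so real tensors would be passed as tensor.tolist().

```python
def out_of_range(name, rows, upper):
    """List every index in a 2D nested list that falls outside [0, upper)."""
    return [
        f"{name}[{i}][{j}]={v} not in [0, {upper})"
        for i, row in enumerate(rows)
        for j, v in enumerate(row)
        if not 0 <= v < upper
    ]


def wrong_width(name, samples, width):
    """List every feature vector in a (batch, boxes, dim) nested list
    whose length differs from the expected width."""
    return [
        f"{name}[{i}][{j}] has {len(vec)} values, expected {width}"
        for i, sample in enumerate(samples)
        for j, vec in enumerate(sample)
        if len(vec) != width
    ]


# Toy batch for illustration only.
input_ids = [[101, 2023, 102], [101, 40000, 102]]  # 40000 exceeds the assumed vocab size
img_feats = [
    [[0.0] * 4, [0.0] * 4],
    [[0.0] * 4, [0.0] * 3],  # last box has too few values
]

problems = out_of_range("input_ids", input_ids, upper=30522)
problems += wrong_width("img_feats", img_feats, width=4)
for p in problems:
    print(p)
```

The same out_of_range check could be applied to gather_index against the combined text and image sequence length, if that is how the index is built in the modified dataset class.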

Muennighoff avatar May 22 '21 06:05 Muennighoff