PURE
RuntimeError: CUDA error: device-side assert triggered
Hi, I ran into a "RuntimeError: CUDA error: device-side assert triggered" error when I attempted to run your code on a Chinese dataset. The log is as follows:
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [165,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [165,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "run_entity.py", line 225, in <module>
    output_dict = model.run_batch(train_batches[i], training=True)
  File "/tf_group/lihongyu/PURE-main/entity/models.py", line 302, in run_batch
    attention_mask = attention_mask_tensor.to(self._model_device),
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/tf_group/lihongyu/PURE-main/entity/models.py", line 65, in forward
    spans_embedding = self._get_span_embeddings(input_ids, spans, token_type_ids=token_type_ids,
        attention_mask=attention_mask)
  File "/tf_group/lihongyu/PURE-main/entity/models.py", line 41, in _get_span_embeddings
    sequence_output, pooled_output = self.bert(input_ids=input_ids, token_type_ids=token_type_ids,
        attention_mask=attention_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 752, in forward
    input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_bert.py", line 181, in forward
    embeddings = inputs_embeds + position_embeddings + token_type_embeddings
RuntimeError: CUDA error: device-side assert triggered
I have searched a few cases of this error on Stack Overflow, but still cannot figure out what happened. I printed the dimensions of inputs_embeds, position_embeddings, and token_type_embeddings, and they seemed fine (all of them are of shape [1, seq_len(>350), 768]). Thanks for your time.
I don't think it has anything to do with the data I used, but I will post one piece here:
{"clusters": [[]], "sentences": [["攀", "谈", "中", "我", "了", "解", "到", "衣", "裙", "出", "她", "的", "手", ",", "一", "针", "一", "线", "、", "一", "花", "一", "朵", "都", "是", "田", "边", "地", "角", "劳", "动", "之", "余", "飞", "针", "走", "线", "绣", "成", "的", "。"]], "ner": [[[7, 8, "Thing"], [10, 10, "Person"], [14, 22, "Thing"], [25, 28, "Location"]]], "relations": [[[10, 10, 7, 8, "Create"], [14, 22, 7, 8, "Part-Whole"]]], "doc_key": "dev.json_9"}
Maybe the reason can be found in https://discuss.pytorch.org/t/solved-assertion-srcindex-srcselectdimsize-failed-on-gpu-for-torch-cat/1804/15, but I still have no idea~
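For what it's worth, the linked thread points at out-of-range embedding indices. Here is a minimal sketch of that failure mode, assuming the bert-base family's 512-row position-embedding table (the constant and helper below are illustrative, not from the PURE code): any token position >= max_position_embeddings indexes past the table, and on GPU that surfaces as the opaque `srcIndex < srcSelectDimSize` assert instead of a clean IndexError.

```python
# Assumption: bert-base(-chinese) has max_position_embeddings = 512,
# i.e. the learned position-embedding table has exactly 512 rows.
MAX_POSITION_EMBEDDINGS = 512

def out_of_range_positions(seq_len, table_size=MAX_POSITION_EMBEDDINGS):
    """Return the token positions that would index past the embedding table."""
    return [p for p in range(seq_len) if p >= table_size]

print(len(out_of_range_positions(350)))  # 0 -> fine, matches the shapes printed above
print(len(out_of_range_positions(600)))  # 88 positions past the table -> assert fires
```

Running the same batch on CPU usually replaces the device-side assert with a readable "index out of range" error, which makes this cause much easier to spot.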
Hi! Have you tried to run our pre-trained models? I have never run into this issue before. I am wondering whether this is due to version mismatching of some libraries.
Hi, thank you for your reply. I have just figured out what happened: some of my instances were too long. I discarded the sentences longer than 512 tokens (I don't know the exact number), and it worked.
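A sketch of that workaround, in case it helps others: drop any document whose sentences exceed the model limit before training. The 512 limit and the reserved [CLS]/[SEP] slots are assumptions based on bert-base defaults, and `count_subtokens` stands in for whatever tokenizer length function you use (e.g. `len(tokenizer.tokenize(...))` with the HF tokenizer); the field name `"sentences"` follows the JSON sample above.

```python
MAX_POSITIONS = 512  # assumed bert-base max_position_embeddings
RESERVED = 2         # [CLS] and [SEP]

def keep_document(doc, count_subtokens):
    """Keep a document only if every sentence fits the position table."""
    return all(count_subtokens(sent) + RESERVED <= MAX_POSITIONS
               for sent in doc["sentences"])

# Toy usage with a character-level count (Chinese BERT is roughly
# one wordpiece per character):
doc = {"sentences": [["攀", "谈", "中"]]}
print(keep_document(doc, len))  # True: 3 + 2 <= 512
```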
Hi, would you please modify run_relation_approx.py to handle max_seq_length more gracefully? Unlike the corresponding lines in run_relation.py, nothing is done for sequences with more than max_seq_length tokens.
In run_relation_approx.py
line 154:
assert(num_tokens + 4 <= max_seq_length)
In run_relation.py
line 114~119:
if len(tokens) > max_seq_length:
    tokens = tokens[:max_seq_length]
    if sub_idx >= max_seq_length:
        sub_idx = 0
    if obj_idx >= max_seq_length:
        obj_idx = 0
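A hedged sketch of how the run_relation.py guard quoted above could replace the bare assert in run_relation_approx.py. The function name is mine and this is a suggestion, not the authors' actual fix; the variable names mirror the quoted snippet.

```python
def truncate_example(tokens, sub_idx, obj_idx, max_seq_length):
    """Truncate over-long examples instead of asserting on them."""
    if len(tokens) > max_seq_length:
        tokens = tokens[:max_seq_length]
        # If an entity marker was cut off, fall back to position 0
        # (the [CLS] token), as run_relation.py does.
        if sub_idx >= max_seq_length:
            sub_idx = 0
        if obj_idx >= max_seq_length:
            obj_idx = 0
    return tokens, sub_idx, obj_idx

tokens, s, o = truncate_example(list(range(600)), 10, 550, 512)
print(len(tokens), s, o)  # 512 10 0
```

Note that falling back to [CLS] silently moves the entity marker, so discarding such examples (as in the workaround above) may be the safer choice when exact spans matter.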