
CUDA error: device-side assert triggered when "max_c_len" is set to 1000 (larger than the default value of 384)

Open zhanhl316 opened this issue 4 years ago • 1 comment

/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [90,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [91,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [92,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [93,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Epoch: 55% | 11/20 [00:22<00:18, 2.07s/it]
Traceback (most recent call last):
  File "main.py", line 136, in <module>
    main(args)
  File "main.py", line 51, in main
    metric_dict, bleu, rouge_1, rouge_2, _ = eval_vae(epoch, args, trainer, eval_data)
  File "/home/codes/Info-HCVAE/vae/eval.py", line 74, in eval_vae
    posterior_z_prob = trainer.generate_posterior(c_ids, q_ids, a_ids)
  File "/home/codes/Info-HCVAE/vae/trainer.py", line 49, in generate_posterior
    _, _, zq, _, za = self.vae.posterior_encoder(c_ids, q_ids, a_ids)
  File "/home/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/codes/Info-HCVAE/vae/models.py", line 199, in forward
    c_hs, c_state = self.encoder(c_embeddings, c_lengths)
  File "/home/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/codes/Info-HCVAE/vae/models.py", line 146, in forward
    batch_first=True, enforce_sorted=False)
  File "/home/.local/lib/python3.7/site-packages/torch/nn/utils/rnn.py", line 223, in pack_padded_sequence
    lengths = torch.as_tensor(lengths, dtype=torch.int64)
RuntimeError: CUDA error: device-side assert triggered

Do you have any idea what causes this error? The only value I changed is "max_c_len" (from the default 384 to 1000), so the error seems to be triggered by the larger "max_c_len". Thank you!

zhanhl316, Oct 19 '20 15:10

The upper bound of max_c_len is 512, because we use pretrained BERT for the encoders.

BERT's positional embeddings only cover sequences up to length 512, so the model does not work for any length strictly greater than 512.

seanie12, Oct 20 '20 01:10
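
The 512-position limit described in the reply can be illustrated with a small pure-Python sketch. It is not code from the Info-HCVAE repository: `MAX_POSITIONS`, `check_max_c_len`, `truncate_ids`, and `lookup_positions` are hypothetical names, and the list-based table is only a toy stand-in for a positional embedding. The point is that a lookup past the last row fails, which is roughly what the `srcIndex < srcSelectDimSize` assertion reports on the GPU, and that checking `max_c_len` up front gives a clearer error.

```python
# Assumption: 512 is the limit stated in the reply (BERT's positional embedding size).
MAX_POSITIONS = 512


def check_max_c_len(max_c_len, max_positions=MAX_POSITIONS):
    """Hypothetical guard: reject context lengths the position table cannot cover."""
    if max_c_len > max_positions:
        raise ValueError(
            f"max_c_len={max_c_len} exceeds the {max_positions} positions "
            "available in the pretrained BERT encoder"
        )
    return max_c_len


def truncate_ids(token_ids, max_positions=MAX_POSITIONS):
    """Hypothetical helper: keep only as many tokens as there are position rows."""
    return token_ids[:max_positions]


# Toy stand-in for a positional embedding table: one row per position.
position_table = [[0.0] * 4 for _ in range(MAX_POSITIONS)]


def lookup_positions(seq_len, table=position_table):
    """Fetch one row per position; raises IndexError once seq_len exceeds
    the number of rows, the CPU-side analogue of the device-side assert."""
    return [table[pos] for pos in range(seq_len)]
```

With these definitions, `check_max_c_len(384)` passes, `check_max_c_len(1000)` raises a `ValueError` before any model code runs, and `lookup_positions(1000)` fails with an `IndexError` at position 512. Truncating the context with `truncate_ids` is one possible workaround, at the cost of dropping tokens past position 512.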