Info-HCVAE
CUDA error: device-side assert triggered when "max_c_len" is set to 1000 (larger than the default value of 384)
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [57,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
Epoch: 55% | 11/20 [00:22<00:18, 2.07s/it]
Traceback (most recent call last):
  File "main.py", line 136, in <module>
    metric_dict, bleu, rouge_1, rouge_2, _ = eval_vae(epoch, args, trainer, eval_data)
  File "/home/codes/Info-HCVAE/vae/eval.py", line 74, in eval_vae
    posterior_z_prob = trainer.generate_posterior(c_ids, q_ids, a_ids)
  File "/home/codes/Info-HCVAE/vae/trainer.py", line 49, in generate_posterior
    _, _, zq, _, za = self.vae.posterior_encoder(c_ids, q_ids, a_ids)
  File "/home/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/codes/Info-HCVAE/vae/models.py", line 199, in forward
    c_hs, c_state = self.encoder(c_embeddings, c_lengths)
  File "/home/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/codes/Info-HCVAE/vae/models.py", line 146, in forward
    batch_first=True, enforce_sorted=False)
  File "/home/.local/lib/python3.7/site-packages/torch/nn/utils/rnn.py", line 223, in pack_padded_sequence
    lengths = torch.as_tensor(lengths, dtype=torch.int64)
RuntimeError: CUDA error: device-side assert triggered
Do you have any idea what causes this error? Thank you! The only value I changed is "max_c_len" (from the default 384 to 1000), so the error seems to be triggered by increasing "max_c_len".
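The upper bound of max_c_len is 512, because we use pretrained BERT for the encoders.
BERT's positional embeddings only cover positions up to 512, so it does not work for any context longer than 512 tokens. With max_c_len set to 1000, the position indices run past the end of the embedding table, which is what the "srcIndex < srcSelectDimSize" assertion is reporting.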
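To catch this before it reaches the GPU, a small guard on the argument may help. The following is only a sketch: `check_max_c_len`, `out_of_range_positions`, and the constant `BERT_MAX_POSITIONS` are hypothetical names that are not part of Info-HCVAE, and the value 512 is taken from the reply above rather than read from the model config.

```python
# Hypothetical helpers (not part of the Info-HCVAE code base).
# Assumption: the positional embedding table has 512 rows, as stated above.
BERT_MAX_POSITIONS = 512


def check_max_c_len(max_c_len: int, max_positions: int = BERT_MAX_POSITIONS) -> int:
    """Return max_c_len unchanged if it fits the positional table, else raise."""
    if max_c_len < 1:
        raise ValueError(f"max_c_len must be positive, got {max_c_len}")
    if max_c_len > max_positions:
        raise ValueError(
            f"max_c_len={max_c_len} exceeds the {max_positions} positions "
            "covered by BERT's positional embeddings"
        )
    return max_c_len


def out_of_range_positions(seq_len: int, max_positions: int = BERT_MAX_POSITIONS) -> list:
    """List the position ids (0-based) that would index past the table."""
    return [pos for pos in range(seq_len) if pos >= max_positions]


if __name__ == "__main__":
    print(check_max_c_len(384))               # default value passes through
    print(len(out_of_range_positions(1000)))  # positions 512..999 are invalid
    try:
        check_max_c_len(1000)
    except ValueError as err:
        print(err)                            # clear message instead of a CUDA assert
```

Calling a check like this right after argument parsing would turn the opaque device-side assert into an ordinary Python exception that names the offending value.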