warp-transducer
CUDA error: an illegal memory access was encountered
Hello, I'm facing the following error when using your package. It appears randomly after some epochs. Do you have any idea where it could come from?
File "main_rnnt.py", line 86, in <module>
model.train()
File "/gpfs1/dlocal/run/7027505/pytorch/rnnt/RNNT.py", line 174, in train
batch_metrics = self.train_batch(x, y)
File "/gpfs1/dlocal/run/7027505/pytorch/rnnt/RNNT.py", line 286, in train_batch
loss = loss_func(pred, y.permute(1, 0).contiguous(), x_len, y_len)
File "/gpfs1/home/2017018/dcoque01/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/gpfs1/home/2017018/dcoque01/pytorch/lib/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 100, in forward
return self.loss(acts, labels, act_lens, label_lens, self.blank, self.reduction)
File "/gpfs1/home/2017018/dcoque01/pytorch/lib/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 40, in forward
grads /= minibatch_size
RuntimeError: CUDA error: an illegal memory access was encountered
CentOS 7, CUDA 10.0, Python 3.6.9, torch 1.2, gcc 7.3.0, GPU: Tesla P100-PCIE-12GB
Getting the same. Any fix? @FactoDeepLearning @HawkAaron
EDIT: This was due to me not moving acts, labels, input_len, and label_len to .cuda() in PyTorch. Fixed now.
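For reference, a minimal sketch of what that change looks like (the shapes and variable names are dummies for illustration, assuming the (B, T, U+1, V) activation layout this binding expects):

import torch
from warprnnt_pytorch import RNNTLoss

# Dummy shapes, just for illustration: batch=2, T=50, U=10, vocab=29.
B, T, U, V = 2, 50, 10, 29
device = torch.device('cuda')

# acts: (B, T, U+1, V) joint-network outputs; the other three are int32 tensors.
acts = torch.randn(B, T, U + 1, V, device=device, requires_grad=True)
labels = torch.randint(1, V, (B, U), dtype=torch.int32, device=device)
act_lens = torch.full((B,), T, dtype=torch.int32, device=device)
label_lens = torch.full((B,), U, dtype=torch.int32, device=device)

rnnt_loss = RNNTLoss()  # blank label defaults to 0
loss = rnnt_loss(acts, labels, act_lens, label_lens)  # all four tensors on the same CUDA device
loss.backward()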
EDIT 2: I'm still getting it now. It trains at first, then hits this error after X iterations.
After some debugging, I think there might be a bug in this library @HawkAaron. I am printing the cost at this line https://github.com/HawkAaron/warp-transducer/blob/master/pytorch_binding/warprnnt_pytorch/init.py#L37, and the RuntimeError: CUDA error: an illegal memory access was encountered only happens when the cost prints out as 0. I am assuming that the loss_fn https://github.com/HawkAaron/warp-transducer/blob/master/pytorch_binding/warprnnt_pytorch/init.py#L27 is not updating the costs or gradients, causing it to error out. Any ideas?
Also, this issue goes away when running on CPU, and there are no 0 costs there.
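The check I used while printing the costs looks roughly like this (a hypothetical wrapper around the loss call, not part of the library):

import torch

def guarded_rnnt_step(rnnt_loss, acts, labels, act_lens, label_lens):
    # Compute the RNNT loss, but log and skip backward when the cost looks suspicious;
    # every crash I saw was preceded by a cost of exactly 0.
    loss = rnnt_loss(acts, labels, act_lens, label_lens)
    if loss.item() == 0.0 or not torch.isfinite(loss):
        print("suspicious RNNT cost:", loss.item(),
              "act_lens:", act_lens.tolist(), "label_lens:", label_lens.tolist())
        return None
    loss.backward()
    return loss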
Same issues.
I think #64 will fix this issue.
My version is the latest. When using warp-transducer in ESPnet, the error still occurs: "CUDA error: an illegal memory access was encountered". I discussed it in the ESPnet project, but they think it is a problem with the transducer.
https://github.com/espnet/espnet/issues/1860#issuecomment-651040485
My warp-transducer version is as follows:

Merge: c1a265f 5098002
Author: Mingkun Huang [email protected]
Date: Mon Apr 27 23:07:35 2020 +0800

    Merge pull request #66 from kamo-naoyuki/pt1.5

    Support pytorch1.5
@housebaby which kind of GPU did you use?
Tesla V100
It will not always fail. In some cases, using either 4 or 8 cards, it works. But when I just change the batch size (or learning rate) of the successful case, it fails. It is confusing.
Same issue. When batch_size=3 it passes; when the batch size is set higher, it fails.
Oh, right, there's an overflow issue at compute_grad_kernel:

// 0 <= col < batch * T * U
int col = blockIdx.x;
// col * alphabet_size can be > 2**31 - 1 = INT_MAX, but its type is int
Tp logpk = denom[col] + acts[col * alphabet_size + idx];

cuda-memcheck seems to catch such a problem with batch=1, src=53688, tgt=1+1, vocab=20000 (53688 * 2 * 20000 > INT_MAX). I also suspect that there are similar overflow issues at ReduceHelper, but I haven't checked them properly.
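For anyone who wants to check whether their shapes hit this limit, here is a quick sketch (it assumes the flat offset is col * alphabet_size + idx with 0 <= col < batch * T * U, as in the excerpt above; the helper name is mine):

# Quick overflow check, assuming offset = col * alphabet_size + idx as above.
INT_MAX = 2**31 - 1

def max_rnnt_offset(batch, max_t, max_u_plus_1, vocab):
    # Largest flat index into acts that compute_grad_kernel would form.
    max_col = batch * max_t * max_u_plus_1 - 1
    return max_col * vocab + (vocab - 1)

# The failing case reported above: batch=1, src=53688, tgt=1+1, vocab=20000.
offset = max_rnnt_offset(1, 53688, 2, 20000)
print(offset, offset > INT_MAX)  # 2147519999 True -> the 32-bit int index overflows

This would also explain why only the larger batch sizes above fail: the largest offset grows with batch * T * U * vocab.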
Cool. Then how should we solve this overflow problem? And will a fix for it be merged into warp-transducer soon? @HawkAaron @jaesong
I don't know if this is related, but after upgrading to TensorFlow 2.5.0 (and therefore to CUDA 11.1) I am seeing this when training RNN-based transducer models. The loss either becomes nan or I see the following error:
2021-06-17 17:23:44.905116: E tensorflow/stream_executor/dnn.cc:729] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1990): 'cudnnRNNForwardTraining( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.handles(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
2021-06-17 17:23:44.905169: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at cudnn_rnn_ops.cc:1560 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 768, 768, 1, 29, 41, 768]
2021-06-17 17:23:44.906664: I tensorflow/stream_executor/stream.cc:1404] [stream=0x55774c2eb680,impl=0x5577394acab0] did not wait for [stream=0x55774c2eb410,impl=0x5577266661f0]
2021-06-17 17:23:44.906810: E tensorflow/stream_executor/cuda/cuda_driver.cc:1085] could not wait stream on event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2021-06-17 17:23:44.906826: E tensorflow/stream_executor/cuda/cuda_driver.cc:1085] could not wait stream on event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2021-06-17 17:23:44.906841: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2021-06-17 17:23:44.906859: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:721] failed to record completion event; therefore, failed to create inter-stream dependency
2021-06-17 17:23:44.906872: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2021-06-17 17:23:44.906888: E tensorflow/stream_executor/stream.cc:334] Error recording event in stream: Error recording CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2021-06-17 17:23:44.906903: F tensorflow/core/common_runtime/device/device_event_mgr.cc:221] Unexpected Event status: 1
2021-06-17 17:23:44.906911: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2021-06-17 17:23:44.906920: E tensorflow/stream_executor/cuda/cuda_driver.cc:1202] failed to enqueue async memcpy from host to device: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; GPU dst: 0x7fec7589a700; host src: 0x7fec55458200; size: 4=0x4
2021-06-17 17:23:44.906934: F tensorflow/core/common_runtime/device/device_event_mgr.cc:221] Unexpected Event status: 1
2021-06-17 17:23:44.906946: E tensorflow/stream_executor/cuda/cuda_driver.cc:1202] failed to enqueue async memcpy from host to device: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; GPU dst: 0x7fed1b838100; host src: 0x7fe28e26b040; size: 24531156=0x17650d4
2021-06-17 17:23:44.906960: E tensorflow/stream_executor/cuda/cuda_driver.cc:1202] failed to enqueue async memcpy from host to device: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; GPU dst: 0x7fecaa6b1b00; host src: 0x7fec55457a00; size: 164=0xa4
2021-06-17 17:23:44.906974: E tensorflow/stream_executor/cuda/cuda_driver.cc:1182] failed to enqueue async memcpy from device to host: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered; host dst: 0x7fec5545af00; GPU src: 0x7fe75f100d00; size: 31980=0x7cec
2021-06-17 17:23:44.906987: F tensorflow/core/common_runtime/device/device_event_mgr.cc:221] Unexpected Event status: 1
Fatal Python error: Aborted
Thread 0x00007fec57a63700 (most recent call first):
  File "/home/sfalk/miniconda3/envs/asr2/lib/python3.8/multiprocessing/pool.py"
Aborted (core dumped)
It's possible that this has nothing to do with https://github.com/HawkAaron/warp-transducer but it's the only external library I am using in combination with Tensorflow.
See also https://github.com/tensorflow/tensorflow/issues/50326
Hi @stefan-falk, did you resolve the issue? I have a similar problem with TF 2.8.2 + CUDA 11.2 + warp-rnnt. The issue occurs only on multi-GPU.