3D-ResNets-PyTorch

RuntimeError: CUDA error: an illegal memory access was encountered

Open ArchieGu opened this issue 6 years ago • 8 comments

Hi, I trained on the dataset with ResNet-34 following the command given in the README, and it ran well. But after I changed the model depth to 50 and 101, it gave me this error.

[Screenshot: 2018-08-07, 2:09:37 pm]

ArchieGu avatar Aug 07 '18 06:08 ArchieGu

Hi, did you solve it?

junwenchen avatar Oct 09 '18 03:10 junwenchen

@junwenchen @ArchieGu I hit the same problem when using depth 101. Did you solve it?

zys20172212265 avatar Nov 01 '18 05:11 zys20172212265

~~Hi, I get the same error message, did you solve it?~~ My solution: it seems to be just running out of GPU memory.

TengdaHan avatar Dec 22 '18 10:12 TengdaHan

Today I encountered the same issue and solved it by reducing the batch size. I run the 3D ResNet architecture on 2 V100 GPUs with batch size 16.

jangho2001us avatar Dec 29 '18 08:12 jangho2001us
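The two replies above both point at GPU memory. A minimal sketch of the "reduce the batch size until it fits" idea follows. It assumes the failure surfaces as a catchable `RuntimeError` whose message contains "out of memory"; `train_step` and `fake_step` are hypothetical stand-ins for one training iteration and are not part of 3D-ResNets-PyTorch.

```python
def run_with_smaller_batches(train_step, batch_size, min_batch_size=1):
    """Call train_step(batch_size), halving batch_size after each memory error.

    train_step is a hypothetical callable standing in for one training
    iteration. Returns the first batch size that ran without a memory error.
    """
    while batch_size >= min_batch_size:
        try:
            train_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # Retry only on errors that look memory-related; re-raise the rest.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2
    raise RuntimeError("no batch size down to %d fit in memory" % min_batch_size)


def fake_step(batch_size):
    # Simulated training step: pretends anything above 16 exhausts the GPU.
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory (simulated)")


print(run_with_smaller_batches(fake_step, 128))  # tries 128, 64, 32, then 16
```

Note that an "illegal memory access" error, unlike a plain out-of-memory error, generally leaves the CUDA context of the process unusable, so in that case the practical route is to restart the run with a smaller batch size instead of retrying in the same process.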

You could try using --resnet_shortcut B for model depths other than 18 and 34.

Thank You

sumeetssaurav avatar Feb 13 '19 16:02 sumeetssaurav

Hi, I am having a similar issue using another code base. Any pointers on how to fix it? Thanks.

File "../libs/bn.py", line 109, in forward
    self.training, self.momentum, self.eps, self.activation, self.slope)
  File "../libs/functions.py", line 99, in forward
    running_mean.mul_((1 - ctx.momentum)).add_(ctx.momentum * mean)
RuntimeError: CUDA error: an illegal memory access was encountered

When trying to print the value of the tensor running_mean (during the second call), it raises the following error:


print(running_mean)
  File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/tensor.py", line 66, in __repr__
    return torch._tensor_str._str(self)
  File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/_tensor_str.py", line 277, in _str
    tensor_str = _tensor_str(self, indent)
  File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/_tensor_str.py", line 195, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/_tensor_str.py", line 84, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
  File "..../Venvs/pytorch.1.0.1/lib/python3.7/site-packages/torch/functional.py", line 271, in isfinite
    return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generated/../THCTensorMathCompareT.cuh:69

This issue seems machine-related.

Fix and possible explanation to the error.

sbelharbi avatar Mar 26 '19 11:03 sbelharbi
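A debugging note on tracebacks like the ones above: CUDA kernels are launched asynchronously, so the Python line that raises the error is not necessarily the line that caused it. Setting the `CUDA_LAUNCH_BLOCKING` environment variable to `1` makes launches synchronous, which usually makes the reported location more trustworthy at the cost of speed. A minimal sketch, assuming it runs at the very top of the training script before any CUDA work:

```python
import os

# Must be set before the first CUDA call; doing it before importing torch
# is the safe choice. Equivalent to running the script with
# CUDA_LAUNCH_BLOCKING=1 set in the shell.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch (and the rest of the training code) only after this point.
print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

This does not fix anything by itself; it only helps narrow down which operation triggers the illegal access.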

Over here they suggested:

============================================================

use system python, not conda:

/usr/bin/python3 -m venv venv

I encountered the same issue. The problem was fixed after creating the Python environment with the system Python, NOT Anaconda. (I think he means from a virtualenv, rather than conda.)

============================================================

Hope this solves your problem! Please reply if it does, to help future readers.

neonb88 avatar Aug 26 '19 08:08 neonb88
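To check which kind of interpreter a script is actually running under before trying the suggestion above, something like the following may help. It is a sketch using only the standard library; `describe_interpreter` is a name made up for this example, and the conda check relies on the `conda-meta` directory that conda environments contain.

```python
import os
import sys


def describe_interpreter():
    """Report whether this Python is in a venv and whether its base is conda."""
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        # Inside a venv, sys.prefix points at the venv and differs from the base.
        "in_venv": sys.prefix != base_prefix,
        # conda environments keep their package records in <prefix>/conda-meta.
        "base_is_conda": os.path.isdir(os.path.join(base_prefix, "conda-meta")),
    }


for key, value in describe_interpreter().items():
    print(key, "=", value)
```

A venv created with `/usr/bin/python3 -m venv venv` would be expected to report `in_venv = True` and `base_is_conda = False`.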

I also met this issue with pytorch=0.4.1, and I solved it by updating PyTorch from 0.4.1 to 1.2. Hope this helps.

IvanFei avatar Oct 11 '19 12:10 IvanFei
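For anyone wanting to check whether the last report applies to them, here is a small sketch that reads the installed `torch` version without importing it and compares it with 1.2. It assumes Python 3.8 or newer (for `importlib.metadata`); `version_tuple` and `installed_version` are helper names invented for this example. The 1.2 threshold comes from the single report above and is not a guaranteed fix.

```python
import re
from importlib import metadata


def version_tuple(text):
    """Turn a version string such as '1.2.0+cu92' into (1, 2, 0)."""
    # Drop any local suffix after '+', then keep the first three numbers,
    # padding with zeros so that '1.2' compares equal to '1.2.0'.
    numbers = re.findall(r"\d+", text.split("+")[0]) + ["0", "0", "0"]
    return tuple(int(n) for n in numbers[:3])


def installed_version(package="torch"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("torch is not installed in this environment")
elif version_tuple(found) < (1, 2, 0):
    print("torch", found, "is older than 1.2")
else:
    print("torch", found, "is 1.2 or newer")
```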