RuntimeError: CUDA error: an illegal memory access was encountered

Open ditingdapeng opened this issue 4 years ago • 12 comments

I ran into the following error during training. What could be causing it? "RuntimeError: CUDA error: an illegal memory access was encountered"

ditingdapeng avatar Sep 25 '20 10:09 ditingdapeng

Hi, this seems to be caused by something else in your environment. Could you provide more information?

Sleepychord avatar Sep 25 '20 11:09 Sleepychord

Thank you! Here are my conda and torch versions: python 3.7.0, conda 4.5.11, torch 1.0.1.post2, torchvision 0.2.2.post3.

ditingdapeng avatar Sep 25 '20 11:09 ditingdapeng

I have already set batch_size in train.py to 1. My GPU is a 2080 Ti.

ditingdapeng avatar Sep 25 '20 11:09 ditingdapeng

Hi, can you tell me which line of code raises the error? The environment seems okay.
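
CUDA kernels are launched asynchronously, so the Python line named in the traceback can be later than the operation that actually failed. A minimal sketch of one way to narrow this down, assuming it is placed at the very top of train.py; the helper name `cuda_sync_point` is hypothetical and not part of this repository:

```python
import os

# Must be set before CUDA is initialized, so do it before importing torch.
# With this variable set, kernel launches run synchronously and the error
# is reported at the call that triggered it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

try:
    import torch
except ImportError:  # torch is not installed in this environment
    torch = None


def cuda_sync_point(label):
    """Wait for queued GPU work so that a pending CUDA error surfaces here.

    Hypothetical helper: call it between suspicious steps and note the last
    label that was reached before the RuntimeError.
    """
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()
    return label
```

For example, calling `cuda_sync_point("after forward")` right after the model call shows whether the error belongs to that step or to an earlier one.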

Sleepychord avatar Sep 27 '20 07:09 Sleepychord

Yes, it is this line: "batch = tuple(t.to(device) for t in batch)". I have now reinstalled Ubuntu. May I ask whether the CUDA version has to be 8?

ditingdapeng avatar Sep 27 '20 07:09 ditingdapeng

I have been stuck on this problem for 3 days and have ruled out memory overflow and batch_size as causes. I gave in and reinstalled the system yesterday, and I noticed that the CUDA version didn't match.

Which version of CUDA are you using? Thank you!

ditingdapeng avatar Sep 27 '20 07:09 ditingdapeng

I suspect the problem is that CUDA 10.0 doesn't match the torch version the code uses.

ditingdapeng avatar Sep 27 '20 07:09 ditingdapeng

No, but you need to make sure your torch build matches your CUDA version.
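
A minimal sketch of how this can be checked from Python. The function name `describe_torch_cuda` is hypothetical; the torch attributes it reads are standard PyTorch:

```python
import platform

try:
    import torch
except ImportError:  # torch is not installed in this environment
    torch = None


def describe_torch_cuda():
    """Collect the versions that have to agree with each other."""
    info = {"python": platform.python_version()}
    if torch is None:
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    # CUDA version this torch binary was compiled against (None if CPU-only).
    info["built_for_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


report = describe_torch_cuda()
for key, value in report.items():
    print(f"{key}: {value}")
```

Compare `built_for_cuda` with what the installed driver and GPU support. A 2080 Ti is a Turing card with compute capability (7, 5), which CUDA toolkits older than 10.0 do not target.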

Sleepychord avatar Sep 27 '20 07:09 Sleepychord

I see. I think I know where the problem is. My CUDA version follows your requirements, but it may not match my torch build.

ditingdapeng avatar Sep 27 '20 07:09 ditingdapeng

My CUDA 8.0 installation is broken. I'm going to reinstall the system and install new CUDA and torch versions.

ditingdapeng avatar Sep 27 '20 07:09 ditingdapeng

Hello! I think I finally found the problem, but I don’t know how to solve it. Hope to get your help.

The problem appears in train.py, at this line: "hop_loss, ans_loss, pooled_output = model1(*batch)"

The error reported is: RuntimeError: CUDA error: an illegal memory access was encountered.

I suspect that the range of values model1 expects does not match the contents or size of the batch. Could you please explain the structure of model1? Thank you very much!
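
A suspicion like this can be tested before the batch ever reaches the GPU. The sketch below assumes the batch holds integer ids that index an embedding table; the function name and the sample values (a 30522-row table) are hypothetical and not taken from this repository:

```python
def find_out_of_range(ids, num_embeddings):
    """Return (position, value) pairs for ids outside [0, num_embeddings)."""
    return [(i, v) for i, v in enumerate(ids) if not 0 <= v < num_embeddings]


# Hypothetical values: five ids checked against a 30522-row embedding table.
token_ids = [101, 2054, 30522, -1, 102]
bad = find_out_of_range(token_ids, num_embeddings=30522)
print(bad)  # [(2, 30522), (3, -1)]
```

For a torch tensor, pass `tensor.flatten().tolist()` as `ids`. An empty result makes an indexing mismatch unlikely and points back toward the environment.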

ditingdapeng avatar Oct 05 '20 12:10 ditingdapeng

I have found the problem. It was because my CUDA environment was not installed correctly. Sorry for all the trouble!

ditingdapeng avatar Oct 10 '20 10:10 ditingdapeng