gail-pytorch

CUDA error: no kernel image is available for execution on the device

Open Ishihara-Masabumi opened this issue 1 year ago • 6 comments

When I ran the train.py script following your instructions, the following error occurred.

python3 train.py --env_name=BipedalWalker-v3
/home/dl/GAIL/GAIL/lib/python3.8/site-packages/torch/cuda/__init__.py:146: UserWarning: 
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "train.py", line 98, in <module>
    main(**vars(args))
  File "train.py", line 54, in main
    expert = Expert(
  File "/home/dl/GAIL/gail-pytorch/models/nets.py", line 120, in __init__
    self.pi = PolicyNetwork(self.state_dim, self.action_dim, self.discrete)
  File "/home/dl/GAIL/gail-pytorch/models/nets.py", line 18, in __init__
    Linear(state_dim, 50),
  File "/home/dl/GAIL/GAIL/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 101, in __init__
    self.reset_parameters()
  File "/home/dl/GAIL/GAIL/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 107, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
  File "/home/dl/GAIL/GAIL/lib/python3.8/site-packages/torch/nn/init.py", line 412, in kaiming_uniform_
    return tensor.uniform_(-bound, bound)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Please let me know how to fix it.

Mar 06 '23 05:03 Ishihara-Masabumi

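This error is caused by an incompatibility between your GPU and your PyTorch installation. As the warning in your log says, the RTX 3090 has CUDA capability sm_86, but the installed PyTorch build only supports sm_37, sm_50, sm_60, and sm_70.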
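The check behind that warning can be reproduced by hand. The following is a minimal sketch, not PyTorch's own implementation: `parse_sm` and `is_supported` are hypothetical helper names, and the rule used (same major version, build minor version not higher than the device's) is the usual CUDA binary-compatibility rule. It ignores PTX entries such as `compute_80`, which can sometimes be JIT-compiled for newer GPUs.

```python
import re


def parse_sm(arch):
    """Turn 'sm_86' into (8, 6); return None for entries that are not plain sm_XY."""
    m = re.fullmatch(r"sm_(\d+)(\d)", arch)
    return (int(m.group(1)), int(m.group(2))) if m else None


def is_supported(device_cap, arch_list):
    """Return True if any compiled arch can run on a device with capability device_cap.

    Assumption: a binary built for sm_XY runs on a device with the same major
    version X and a minor version >= Y. PTX (compute_XX) entries are ignored.
    """
    major, minor = device_cap
    for arch in arch_list:
        sm = parse_sm(arch)
        if sm is not None and sm[0] == major and sm[1] <= minor:
            return True
    return False


# Values taken from the warning in this issue: an sm_86 GPU and a build for sm_37..sm_70.
print(is_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"]))

if __name__ == "__main__":
    # Optional: query the real values when PyTorch and a CUDA device are present.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("build arch list:", archs)
        print("supported:", is_supported(cap, archs))
```

If the last line prints `supported: False` on your machine, the active Python environment contains a PyTorch build that does not target your GPU, even if another environment on the same machine works.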

Mar 06 '23 12:03 hcnoh

I don't think so, because other deep learning models run in my environment.

Mar 08 '23 05:03 Ishihara-Masabumi

Then I can't find the reason for that error. It seems to come from your machine's environment, so you will have to check it yourself.

Mar 08 '23 06:03 hcnoh

Have you solved it? I ran into the same issue.

Mar 09 '23 04:03 dudulry

Not yet.

Mar 09 '23 20:03 Ishihara-Masabumi

Have you used APEX? I encountered this error after using APEX. I fixed it by creating a new environment on the RTX 3090 with CUDA 11.6, Python 3.7, and Torch 1.12. Now it works.
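After rebuilding the environment, a quick smoke test can confirm that kernels actually launch on the GPU. This is only a sketch: `cuda_smoke_test` is a hypothetical helper name, and it repeats the same `uniform_` call that failed in the traceback above, on a small tensor.

```python
def cuda_smoke_test():
    """Try one tiny CUDA kernel launch and describe the outcome as a string."""
    try:
        import torch
    except ImportError:
        return "torch-not-installed"
    if not torch.cuda.is_available():
        return "cuda-not-available"
    try:
        # Same operation that raised "no kernel image is available" in the issue.
        torch.empty(8, device="cuda").uniform_(-1.0, 1.0)
        # CUDA errors are reported asynchronously, so force them to surface here.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return f"failed: {exc}"
    return (
        f"ok: torch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"{torch.cuda.get_device_name(0)}"
    )


result = cuda_smoke_test()
print(result)
```

Run it with the same interpreter that runs train.py, so that it tests the environment that is actually failing.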

Mar 10 '23 03:03 dudulry