fsdl-text-recognizer-2021-labs
RuntimeError: CUDA error: no kernel image is available for execution on the device
System:
- WSL2
- GPU: RTX 3080
python training/run_experiment.py --model_class=MLP --data_class=MNIST --max_epochs=5 --gpus=-1
I followed the setup steps but ended up with this error:
RuntimeError: CUDA error: no kernel image is available for execution on the device
Complete output
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/cuda/__init__.py:104: UserWarning:
GeForce RTX 3080 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the GeForce RTX 3080 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
| Name | Type | Params
-------------------------------------------
0 | model | MLP | 936 K
1 | model.dropout | Dropout | 0
2 | model.fc1 | Linear | 803 K
3 | model.fc2 | Linear | 131 K
4 | model.fc3 | Linear | 1.3 K
5 | train_acc | Accuracy | 0
6 | val_acc | Accuracy | 0
7 | test_acc | Accuracy | 0
-------------------------------------------
936 K Trainable params
0 Non-trainable params
936 K Total params
/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:49: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 20 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
Validation sanity check: 0it [00:00, ?it/s]Traceback (most recent call last):
File "training/run_experiment.py", line 90, in <module>
main()
File "training/run_experiment.py", line 85, in main
trainer.fit(lit_model, datamodule=data)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 473, in fit
results = self.accelerator_backend.train()
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 66, in train
results = self.train_or_test()
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 69, in train_or_test
results = self.trainer.train()
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 495, in train
self.run_sanity_check(self.get_model())
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 693, in run_sanity_check
_, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 609, in run_evaluation
output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 178, in evaluation_step
output = self.trainer.accelerator_backend.validation_step(args)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 84, in validation_step
return self._step(self.trainer.model.validation_step, args)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 76, in _step
output = model_step(*args)
File "/mnt/c/Users/user/GitHub/fsdl-text-recognizer-2021-labs/lab1/text_recognizer/lit_models/base.py", line 58, in validation_step
logits = self(x)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/c/Users/user/GitHub/fsdl-text-recognizer-2021-labs/lab1/text_recognizer/lit_models/base.py", line 45, in forward
return self.model(x)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/c/Users/user/GitHub/fsdl-text-recognizer-2021-labs/lab1/text_recognizer/models/mlp.py", line 37, in forward
x = self.fc1(x)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
return F.linear(input, self.weight, self.bias)
File "/home/myuser/miniconda3/envs/fsdl-text-recognizer-2021/lib/python3.6/site-packages/torch/nn/functional.py", line 1690, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: no kernel image is available for execution on the device
I am currently working around it as follows:
- Change the CUDA version in environment.yml
- Remove the cudnn line from environment.yml
- After setting up the labs, run this command:
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
With that, lab1 passes. I am not sure whether it completely solves the problem, though.
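One way to check whether a reinstall actually addressed the mismatch reported in the warning is to compare the GPU's compute capability with the architectures the installed PyTorch build was compiled for. The sketch below is only an illustration: the helper names `parse_sm` and `build_supports` are made up for this example, and the rule it applies (same major version, build minor not newer than the device minor) is an assumption based on CUDA's binary compatibility within a major architecture. It ignores PTX (`compute_XX`) entries, so it may be stricter than what the runtime actually accepts.

```python
import re


def parse_sm(arch):
    """Split an 'sm_XY' string into (major, minor); return None for other entries."""
    match = re.fullmatch(r"sm_(\d{2,})", arch)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version, everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def build_supports(capability, arch_list):
    """Return True if some sm_XY entry has the same major and a minor <= the device's.

    Assumption: kernels built for X.y also run on X.z when z >= y.
    """
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is None:
            continue  # skip entries such as 'compute_80'
        if parsed[0] == major and parsed[1] <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is None or not torch.cuda.is_available():
        print("PyTorch with a visible CUDA device is needed for the live check.")
    else:
        capability = torch.cuda.get_device_capability(0)
        arch_list = torch.cuda.get_arch_list()
        print("device capability:", capability)
        print("build architectures:", arch_list)
        print("kernels available for this GPU:", build_supports(capability, arch_list))
```

With the architecture list from the warning above (sm_37 sm_50 sm_60 sm_70 sm_75) and an sm_86 device, `build_supports((8, 6), [...])` returns False, which matches the error. If the reinstalled build lists an sm_8x entry, it should return True.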
I modified the steps above to make it work on my RTX 3090 + Ubuntu 20:
- Remove both the cuda and cudnn versions in environment.yml
- After setting the labs up via make conda-update, run
conda install -c anaconda cudatoolkit
- Finally, run
conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch
RTX3070 + Ubuntu 18.04
- (if activated) conda deactivate
- conda env remove -n fsdl-text-recognizer-2021
- Remove both the cuda and cudnn versions in environment.yml, as tranhoangkhuongvn mentioned
- environment.yml will look like this:
name: fsdl-text-recognizer-2021
channels:
  - defaults
dependencies:
  - python=3.6  # Google Colab is still on Python 3.6
  - pip
  - pip:
    - pip-tools
- make conda-update
- conda activate fsdl-text-recognizer-2021
- make pip-tools
- conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge (from https://pytorch.org/get-started/locally/)