DBNet.pytorch

Help needed: what causes the CUDA error and cuDNN error at runtime? I have already tried changing the CUDA version, but it did not help.

Open: Zuikke opened this issue 3 years ago • 1 comment

CUDA error: no kernel image is available for execution on the device
Error occurs, No graph saved
2022-06-25 16:12:32,431 DBNet.pytorch ERROR: Traceback (most recent call last):
  File "/sun4/DBNet.pytorch-master/base/base_trainer.py", line 78, in __init__
    self.writer.add_graph(self.model, dummy_input)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/utils/tensorboard/writer.py", line 707, in add_graph
    self._get_file_writer().add_graph(graph(model, input_to_model, verbose))
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 291, in graph
    raise e
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 285, in graph
    trace = torch.jit.trace(model, args)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/jit/__init__.py", line 882, in trace
    check_tolerance, _force_outplace, _module_class)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/jit/__init__.py", line 1034, in trace_module
    module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/sun4/DBNet.pytorch-master/models/model.py", line 31, in forward
    backbone_out = self.backbone(x)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/sun4/DBNet.pytorch-master/models/backbone/resnet.py", line 180, in forward
    x = self.relu(x)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 530, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 516, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 94, in forward
    return F.relu(input, inplace=self.inplace)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/functional.py", line 912, in relu
    result = torch.relu(input)
RuntimeError: CUDA error: no kernel image is available for execution on the device

2022-06-25 16:12:32,431 DBNet.pytorch WARNING: add graph to tensorboard failed
2022-06-25 16:12:32,434 DBNet.pytorch INFO: train dataset has 1000 samples,32 in dataloader, validate dataset has 500 samples,500 in dataloader
Traceback (most recent call last):
  File "tools/train.py", line 78, in <module>
    main(config)
  File "tools/train.py", line 59, in main
    trainer.train()
  File "/sun4/DBNet.pytorch-master/base/base_trainer.py", line 104, in train
    self.epoch_result = self._train_epoch(epoch)
  File "/sun4/DBNet.pytorch-master/trainer/trainer.py", line 59, in _train_epoch
    preds = self.model(batch['img'])
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/sun4/DBNet.pytorch-master/models/model.py", line 31, in forward
    backbone_out = self.backbone(x)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/sun4/DBNet.pytorch-master/models/backbone/resnet.py", line 179, in forward
    x = self.bn1(x)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 107, in forward
    exponential_average_factor, self.eps)
  File "/root/anaconda3/envs/dbnet/lib/python3.6/site-packages/torch/nn/functional.py", line 1670, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Zuikke, Jun 26 '22 00:06

Has this issue been resolved?

Echhoo, May 10 '23 03:05
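
On the first error in the log above: `no kernel image is available for execution on the device` usually means the installed PyTorch binary was not compiled for the GPU's compute capability, which would also explain why changing only the system CUDA version has no effect. The thread does not state the GPU model or the PyTorch version, so this is only a likely cause, not a confirmed diagnosis. A minimal sketch of the check, assuming the usual CUDA rule that an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, and that `compute_XY` PTX can be JIT-compiled for any equal or newer device. The helper names `parse_arch`, `is_supported`, and `check_local_gpu` are hypothetical, and `torch.cuda.get_arch_list()` exists only in newer PyTorch releases, so the sketch falls back gracefully when it is missing:

```python
import re


def parse_arch(tag):
    """Split a tag such as 'sm_75' or 'compute_80' into (kind, major, minor)."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", tag)
    if match is None:
        raise ValueError("unrecognised arch tag: %r" % (tag,))
    kind, major, minor = match.groups()
    return kind, int(major), int(minor)


def is_supported(capability, arch_list):
    """Return True if a build with `arch_list` should run on `capability`.

    capability: (major, minor) tuple, e.g. (8, 6).
    arch_list:  tags the binary was built for, e.g. ['sm_70', 'compute_70'].
    """
    dev_major, dev_minor = capability
    for tag in arch_list:
        kind, major, minor = parse_arch(tag)
        # Binary kernels: same major version, device minor >= build minor.
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        # PTX: forward compatible with any equal or newer device.
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


def check_local_gpu(device_index=0):
    """Print what the local PyTorch build reports (requires torch and a GPU)."""
    import torch  # imported here so the helpers above work without torch

    capability = torch.cuda.get_device_capability(device_index)
    print("torch", torch.__version__, "built against CUDA", torch.version.cuda)
    print("device capability:", capability)
    # Assumption: get_arch_list() is absent in older releases, hence getattr.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    if get_arch_list is None:
        print("this PyTorch build cannot list its compiled architectures")
        return None
    arch_list = get_arch_list()
    print("compiled architectures:", arch_list)
    return is_supported(capability, arch_list)
```

Calling `check_local_gpu()` inside the `dbnet` environment prints the relevant versions. If it returns `False`, installing a PyTorch build that targets the GPU's architecture is the usual remedy, rather than changing the system CUDA toolkit alone. The later `CUDNN_STATUS_EXECUTION_FAILED` may well be a follow-on symptom of the same mismatch, though the log alone does not prove that.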