ReDet

Training problem: RuntimeError: cuda runtime error (701) : unrecognized error code at src/riroi_align_kernel.cu:389

Open mmoghadam11 opened this issue 2 years ago • 2 comments

Hi @csuhan, and thanks very much for answering. I want to train ReDet_re50_refpn_3x_hrsc2016. My environment is Colab with a Tesla T4, and I use PyTorch 1.1.

In training I got this error:

  File "tools/train.py", line 95, in <module>
    main()
  File "tools/train.py", line 91, in main
    logger=logger)
  File "/content/ReDet/mmdet/apis/train.py", line 61, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate)
  File "/content/ReDet/mmdet/apis/train.py", line 197, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv-0.2.13-py3.7-linux-x86_64.egg/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv-0.2.13-py3.7-linux-x86_64.egg/mmcv/runner/runner.py", line 271, in train
    self.call_hook('after_train_iter')
  File "/usr/local/lib/python3.7/dist-packages/mmcv-0.2.13-py3.7-linux-x86_64.egg/mmcv/runner/runner.py", line 229, in call_hook
    getattr(hook, fn_name)(self)
  File "/usr/local/lib/python3.7/dist-packages/mmcv-0.2.13-py3.7-linux-x86_64.egg/mmcv/runner/hooks/optimizer.py", line 17, in after_train_iter
    runner.outputs['loss'].backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py", line 77, in apply
    return self._forward_cls.backward(self, *args)
  File "/content/ReDet/mmdet/ops/riroi_align/functions/riroi_align.py", line 58, in backward
    grad_input)
RuntimeError: cuda runtime error (701) : unrecognized error code at src/riroi_align_kernel.cu:389

I read issues/4, but I don't understand what I should do.

mmoghadam11 • Aug 29 '21 19:08

This may be caused by a mismatch between your PyTorch and cudatoolkit versions. Please check them.
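
A minimal sketch of one way to run that check, not an official ReDet script. It assumes the custom ops (such as riroi_align) were compiled with whichever `nvcc` is on the PATH; the helper name `collect_versions` is hypothetical. It prints the PyTorch version, the CUDA version PyTorch was built against, the GPU's compute capability, and the release line reported by `nvcc`, so the values can be compared side by side.

```python
import shutil
import subprocess


def collect_versions():
    """Gather the version facts relevant to a PyTorch/cudatoolkit mismatch.

    Hypothetical helper for illustration; every value stays None when the
    corresponding component is missing from the environment.
    """
    info = {
        "torch": None,
        "torch_cuda": None,
        "gpu": None,
        "compute_capability": None,
        "nvcc": None,
    }

    try:
        import torch
    except ImportError:
        torch = None  # PyTorch is not installed here

    if torch is not None:
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds)
        info["torch_cuda"] = torch.version.cuda
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
            info["compute_capability"] = torch.cuda.get_device_capability(0)

    # Assumption: the system nvcc is the compiler that built the custom ops.
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        result = subprocess.run(
            [nvcc, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        lines = result.stdout.splitlines()
        # Keep only the line that names the toolkit release, if present
        info["nvcc"] = next((ln.strip() for ln in lines if "release" in ln), None)

    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

If `torch_cuda` and the `nvcc` release differ, rebuilding the ops with a matching toolkit is one thing to try.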

csuhan • Sep 01 '21 09:09

> This may be caused by a mismatch between your PyTorch and cudatoolkit versions. Please check them.

The same environment can train RoI Transformer and frCNN_OBB on HRSC2016 and DOTA. The error happens only when I try to train the ReDet configs ):

mmoghadam11 • Sep 02 '21 10:09