Training with GPU -- RuntimeError: roi_align_forward_impl:

Open soumyadbanik opened this issue 1 year ago • 1 comments

Prerequisite

  • [X] I have searched Issues and Discussions but cannot get the expected help.
  • [X] The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmcv).

Environment

I'm training on the AVA dataset for spatio-temporal activity detection. Training does not use any GPU, even though my machine has 2 GPUs. It is supposed to use the GPU by default, but that is not happening with the latest mmcv version. If I enable the GPUs with the CUDA_VISIBLE_DEVICES=0,1 environment variable, I get this error:

File "/home/soumyadeep/mmaction_custom/mmaction2_v1.0/mmcv/mmcv/ops/roi_align.py", line 90, in forward
    ext_module.roi_align_forward(
RuntimeError: roi_align_forward_impl: implementation for device cuda:0 not found.

Reproduces the problem - code sample

/mmcv/mmcv/ops/roi_align.py

Reproduces the problem - command or script

CUDA_VISIBLE_DEVICES=0,1 bash tools/dist_train.sh /home/soumyadeep/mmaction_custom/mmaction2_v1.0/configs/detection/slowfast/slowfast_kinetics400-pretrained-r50_8xb16-4x16x1-20e_ava21-rgb.py 2

Reproduces the problem - error message

File "/home/soumyadeep/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/soumyadeep/mmaction_custom/mmaction2_v1.0/mmaction/models/roi_heads/roi_extractors/single_straight3d.py", line 122, in forward
    roi_feat = self.roi_layer(frame_feat, rois)
  File "/home/soumyadeep/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/soumyadeep/mmaction_custom/mmaction2_v1.0/mmcv/mmcv/ops/roi_align.py", line 210, in forward
    return roi_align(input, rois, self.output_size, self.spatial_scale,
  File "/home/soumyadeep/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/soumyadeep/mmaction_custom/mmaction2_v1.0/mmcv/mmcv/ops/roi_align.py", line 90, in forward
    ext_module.roi_align_forward(
RuntimeError: roi_align_forward_impl: implementation for device cuda:0 not found.

Additional information

No response

soumyadbanik avatar Jun 28 '23 08:06 soumyadbanik

Hi @soumyadbanik, it may be that mmcv was not installed with CUDA op support.

zhouzaida avatar Jul 03 '23 02:07 zhouzaida
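
One way to test the hypothesis above is to compare what PyTorch sees with what the installed mmcv extension reports. The following is a minimal sketch, not an official diagnostic: the helper name `describe_cuda_support` and the report keys are made up for illustration, and it assumes `mmcv.ops.get_compiling_cuda_version()` is available in the installed build (it returns the string "not available" when the ops were compiled without CUDA, as far as I know). Missing packages are reported rather than raised.

```python
import importlib


def describe_cuda_support():
    """Collect hints about whether mmcv's compiled ops include CUDA kernels.

    Hypothetical helper: every key in the returned dict is illustrative.
    """
    report = {
        "torch_cuda_available": None,   # does PyTorch see a usable GPU?
        "torch_cuda_version": None,     # CUDA version PyTorch was built with
        "mmcv_version": None,
        "mmcv_compiling_cuda": None,    # CUDA version the mmcv ops were built with
        "errors": [],
    }

    try:
        torch = importlib.import_module("torch")
        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_cuda_version"] = torch.version.cuda
    except Exception as exc:  # torch missing or broken
        report["errors"].append(f"torch: {exc}")

    try:
        mmcv = importlib.import_module("mmcv")
        report["mmcv_version"] = mmcv.__version__
        # Importing mmcv.ops fails if the compiled extension is absent
        # (for example, a lite or pure-Python install).
        ops = importlib.import_module("mmcv.ops")
        report["mmcv_compiling_cuda"] = ops.get_compiling_cuda_version()
    except Exception as exc:  # mmcv missing, or built without compiled ops
        report["errors"].append(f"mmcv: {exc}")

    return report


if __name__ == "__main__":
    for key, value in describe_cuda_support().items():
        print(f"{key}: {value}")
```

If `torch_cuda_available` is True but `mmcv_compiling_cuda` is "not available" (or importing `mmcv.ops` fails), the mmcv build in that environment likely lacks the CUDA kernels that `roi_align_forward` needs, and reinstalling or rebuilding mmcv against the same CUDA and PyTorch versions would be the next thing to try.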