
CUDA error: an illegal memory access was encountered in roi_align backward function

Open iamstupidd opened this issue 2 years ago • 3 comments

Thanks for your work, it's very helpful.

Environment (conda): mmcv 0.2.16, CUDA 11.1, torch 1.8.0, RTX 3080
Dataset: Fair1M, for oriented bounding box (obbox) detection

When config.py uses roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2), I can only train on two GPUs. With four GPUs, it prints this error:

THCudaCheck FAIL file=ReDet/mmdet/ops/roi_align/src/roi_ane=292 error=700 : an illegal memory access was encountered

With roi_layer=dict(type='RoIPool', out_size=7), all four GPUs work fine, which is strange. So I am sure there is a bug left in roi_align_kernel.cu, and I am debugging it now.

Any ideas? Thanks.
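
One way to narrow this down while debugging, as a rough sketch: CUDA reports kernel errors asynchronously, so the failing line in the log may not be the kernel that actually faulted. Setting the CUDA_LAUNCH_BLOCKING environment variable to 1 makes kernel launches synchronous, so the error surfaces at the offending call. The helper name enable_blocking_launches is hypothetical, and the sketch assumes training is launched from a Python entry point where the variable can be set before torch is imported.

```python
import os


def enable_blocking_launches(env=os.environ):
    """Request synchronous CUDA kernel launches (hypothetical helper).

    The variable has to be in place before CUDA is initialised, so call
    this before importing torch or running any CUDA code.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


enable_blocking_launches()
# import torch  # import torch and start training only after this point
```

Synchronous launches slow training down noticeably, so this is only meant for the debugging run.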

iamstupidd, Aug 30 '21 05:08

What's more, there is a problem with the validation mAP evaluation: it is always zero, isn't it? Do you have the same problem? If so, I fixed it by changing some files in mmdet/core/evaluation/.

iamstupidd, Aug 30 '21 06:08

Line 292: https://github.com/csuhan/ReDet/blob/0b9addf3c2734659fd6ffc7824f2e659fde4419c/mmdet/ops/riroi_align/src/riroi_align_kernel.cu#L292

Please check the annotations first and make sure all bboxes have valid values (especially the angle field).
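
A minimal sketch of that annotation check, for illustration only. It assumes each rotated box is stored as (cx, cy, w, h, angle); that layout and the function name find_invalid_rboxes are assumptions, not taken from the ReDet code, so adapt them to however your Fair1M annotations are actually loaded.

```python
import math


def find_invalid_rboxes(boxes):
    """Return (index, reason) pairs for boxes that look malformed.

    Assumption: each box is a sequence (cx, cy, w, h, angle).
    """
    problems = []
    for i, box in enumerate(boxes):
        if len(box) != 5:
            problems.append((i, "expected 5 values"))
            continue
        # NaN or inf anywhere in the box, including the angle, is flagged.
        if not all(math.isfinite(float(v)) for v in box):
            problems.append((i, "non-finite value"))
            continue
        _, _, w, h, _ = box
        if w <= 0 or h <= 0:
            problems.append((i, "non-positive width or height"))
    return problems


# Example with made-up boxes: the second has zero width, the third a NaN angle.
boxes = [
    (10.0, 20.0, 5.0, 3.0, 0.3),
    (1.0, 2.0, 0.0, 4.0, 0.1),
    (1.0, 2.0, 3.0, 4.0, float("nan")),
]
print(find_invalid_rboxes(boxes))
```

Running this over every image's boxes before training should show whether any annotation could feed bad coordinates into the RoI kernels.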

csuhan, Sep 01 '21 09:09

I have not encountered this bug yet. Can you share your modification?

csuhan, Sep 01 '21 09:09