
How to solve the error "cuda runtime error (98) : invalid device function" when running BorderDet?

Open fengqian-wei opened this issue 4 years ago • 4 comments

I first installed cvpods and successfully trained RetinaNet. However, when I train BorderDet I hit the error below in border_align, and I don't know how to fix it.

Exception occurred: RuntimeError
cuda runtime error (98) : invalid device function at /home//weizhiwei/work/cvpods/cvpods/layers/csrc/border_align/border_align_kernel.cu:202
  File "/home/weizhiwei/work/cvpods/cvpods/layers/border_align.py", line 15, in forward
    output = _C.border_align_forward(input, boxes, wh, pool_size)
  File "/home/weizhiwei/work/cvpods/cvpods/layers/border_align.py", line 42, in forward
    output = border_align(feature, boxes, wh, self.pool_size)
  File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/borderdet.py", line 809, in forward
    ltrb_conv = self.border_align(feature, boxes)
  File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/borderdet.py", line 734, in forward
    border_cls_conv = self.border_cls_subnet(cls_subnet, align_boxes, wh)
  File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/borderdet.py", line 147, in forward
    ) = self.head(features, shifts)
  File "/home/weizhiwei/work/cvpods/cvpods/engine/base_runner.py", line 185, in run_step
    loss_dict = self.model(data)
  File "/home/weizhiwei/work/cvpods/cvpods/engine/base_runner.py", line 84, in train
    self.run_step()
  File "/home/weizhiwei/work/cvpods/cvpods/engine/runner.py", line 271, in train
    super().train(self.start_iter, self.start_epoch, self.max_iter)
  File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/train_net.py", line 96, in main
    runner.train()
  File "/home/weizhiwei/work/cvpods/cvpods/engine/launch.py", line 56, in launch
    main_func(*args)
  File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/train_net.py", line 110, in
    args=(args,),

fengqian-wei, May 31 '21 09:05

Another error: when training on a single GPU, I get "Default process group is not initialized". Solution: https://blog.csdn.net/m0_37568067/article/details/109785209

fengqian-wei, Jun 01 '21 02:06

See https://xiulian.blog.csdn.net/article/details/111035882. BorderDet works well on 2080Ti and V100. You could also try the approach described here: https://blog.csdn.net/m0_38007695/article/details/107065617

Maycbj, Jun 02 '21 05:06
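A common cause of "invalid device function" in general is that a CUDA extension was compiled for GPU architectures that do not include the card it runs on; the thread does not confirm that this is the cause here, so treat the following as a diagnostic sketch only. It reads the GPU's compute capability through PyTorch and compares it with a `TORCH_CUDA_ARCH_LIST`-style string. The helper names `arch_list_entry` and `is_arch_covered` are hypothetical, and the sketch assumes cvpods builds its CUDA ops through `torch.utils.cpp_extension`, which honors that environment variable.

```python
import os


def arch_list_entry(major: int, minor: int) -> str:
    """Format a compute capability as an arch-list entry, e.g. (7, 5) -> '7.5'."""
    return f"{major}.{minor}"


def is_arch_covered(capability, arch_list: str) -> bool:
    """Return True if `capability` is listed explicitly in `arch_list`.

    `arch_list` uses the TORCH_CUDA_ARCH_LIST style: entries separated by
    ';' or spaces, optionally suffixed with '+PTX'. This check is
    conservative: it ignores PTX forward compatibility and does not
    understand named architectures such as 'Volta'.
    """
    entries = [
        entry.replace("+PTX", "").strip()
        for entry in arch_list.replace(";", " ").split()
    ]
    return arch_list_entry(*capability) in entries


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)  # e.g. (7, 5)
        arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", "")
        print("GPU compute capability:", arch_list_entry(*capability))
        print("TORCH_CUDA_ARCH_LIST:", arch_list or "(not set)")
        if arch_list and not is_arch_covered(capability, arch_list):
            print("The arch list does not name this GPU; a rebuild may be needed.")
```

If the capability is missing from the list, setting `TORCH_CUDA_ARCH_LIST` to include it and rebuilding the extension from a clean build directory is the usual next step.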

> Another error: when training on a single GPU, I get "Default process group is not initialized". Solution: https://blog.csdn.net/m0_37568067/article/details/109785209

Yeah, this is a well-known bug when training on a single GPU. We will fix the "Default process group is not initialized" error.

Maycbj, Jun 02 '21 05:06
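Until such a fix lands, one generic PyTorch workaround for "Default process group is not initialized" is to create a one-process group before the model is built, so that calls into `torch.distributed` find a group with `world_size=1`. This is a sketch of that general technique, not the maintainers' planned fix and not necessarily what the linked post describes. The helper names, the default port 29500, and the default `nccl` backend are assumptions.

```python
def single_process_group_kwargs(port: int = 29500, backend: str = "nccl") -> dict:
    """Build keyword arguments for torch.distributed.init_process_group
    describing a group that contains only the current process."""
    return {
        "backend": backend,  # "gloo" is an alternative on CPU-only machines
        "init_method": f"tcp://127.0.0.1:{port}",  # local rendezvous address
        "rank": 0,
        "world_size": 1,
    }


def ensure_process_group(port: int = 29500, backend: str = "nccl") -> bool:
    """Initialize a one-process group if none exists yet.

    Returns True if a group was created, False if one already existed or
    torch.distributed is unavailable.
    """
    import torch.distributed as dist

    if not dist.is_available() or dist.is_initialized():
        return False
    dist.init_process_group(**single_process_group_kwargs(port, backend))
    return True
```

Calling `ensure_process_group()` once at the start of the training script, before the runner is constructed, would be the intended usage; pick a free port if 29500 is already taken.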

Have you solved this problem? I can't solve it either.

wwwyyk, Jun 25 '21 00:06