How to solve the error "cuda runtime error (98) : invalid device function" when running BorderDet?
I installed cvpods first and trained RetinaNet successfully. However, I hit this error in border_align when training BorderDet, and I don't know how to fix it.
Exception has occurred: RuntimeError
cuda runtime error (98) : invalid device function at /home//weizhiwei/work/cvpods/cvpods/layers/csrc/border_align/border_align_kernel.cu:202
File "/home/weizhiwei/work/cvpods/cvpods/layers/border_align.py", line 15, in forward
output = _C.border_align_forward(input, boxes, wh, pool_size)
File "/home/weizhiwei/work/cvpods/cvpods/layers/border_align.py", line 42, in forward
output = border_align(feature, boxes, wh, self.pool_size)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/borderdet.py", line 809, in forward
ltrb_conv = self.border_align(feature, boxes)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/borderdet.py", line 734, in forward
border_cls_conv = self.border_cls_subnet(cls_subnet, align_boxes, wh)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/borderdet.py", line 147, in forward
) = self.head(features, shifts)
File "/home/weizhiwei/work/cvpods/cvpods/engine/base_runner.py", line 185, in run_step
loss_dict = self.model(data)
File "/home/weizhiwei/work/cvpods/cvpods/engine/base_runner.py", line 84, in train
self.run_step()
File "/home/weizhiwei/work/cvpods/cvpods/engine/runner.py", line 271, in train
super().train(self.start_iter, self.start_epoch, self.max_iter)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/train_net.py", line 96, in main
runner.train()
File "/home/weizhiwei/work/cvpods/cvpods/engine/launch.py", line 56, in launch
main_func(*args)
File "/home/weizhiwei/work/cvpods/BorderDet/playground/detection/coco/borderdet/borderdet.res50.fpn.coco.800size.1x/train_net.py", line 110, in
Another error: when training on a single GPU, "Default process group is not initialized". Solution: https://blog.csdn.net/m0_37568067/article/details/109785209
https://xiulian.blog.csdn.net/article/details/111035882 It works well on 2080Ti and V100. You could try the method described here: https://blog.csdn.net/m0_38007695/article/details/107065617
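For context, "invalid device function" usually means the compiled extension contains no kernel image for the compute capability of the GPU it runs on (for example, it was built for one architecture and run on another). Here is a minimal sketch for checking this, assuming PyTorch is installed and that the cvpods extensions are built through PyTorch's extension tooling, which reads the TORCH_CUDA_ARCH_LIST environment variable. The helper names `arch_entry` and `report_gpu_arch` are hypothetical, not part of cvpods.

```python
def arch_entry(major: int, minor: int) -> str:
    """Format a compute capability as a TORCH_CUDA_ARCH_LIST entry, e.g. (7, 5) -> "7.5"."""
    return f"{major}.{minor}"


def report_gpu_arch() -> None:
    """Print the compute capability of GPU 0 and a matching arch-list entry."""
    try:
        # Imported lazily so arch_entry() stays usable without PyTorch.
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return

    if not torch.cuda.is_available():
        print("CUDA is not available in this environment")
        return

    major, minor = torch.cuda.get_device_capability(0)
    entry = arch_entry(major, minor)
    print("GPU 0:", torch.cuda.get_device_name(0), "compute capability", entry)
    print("PyTorch built with CUDA", torch.version.cuda)
    # Assumption: the extension build honors TORCH_CUDA_ARCH_LIST.
    print("Try rebuilding with TORCH_CUDA_ARCH_LIST=" + entry)


if __name__ == "__main__":
    report_gpu_arch()
```

If the reported capability was not covered when cvpods was compiled, removing the old build artifacts and rebuilding with that value exported may resolve the error. It is also worth checking that the CUDA toolkit used for the build matches the one PyTorch was built with.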
Yes, this is a well-known bug when training on a single GPU. We will fix the default process group initialization.
Have you solved this problem? I can't solve it either.
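Until that is fixed, one possible workaround is to create a one-process group yourself before the model is built, so that code expecting a default process group finds one. This is only a sketch and not the cvpods fix: `single_gpu_dist_kwargs` and `init_single_gpu_group` are hypothetical helpers, the port number is arbitrary, and it assumes a PyTorch build with NCCL support.

```python
def single_gpu_dist_kwargs(port: int = 29500) -> dict:
    """Arguments for a process group that contains only this process."""
    return {
        "backend": "nccl",
        "init_method": f"tcp://127.0.0.1:{port}",  # any free local port
        "world_size": 1,
        "rank": 0,
    }


def init_single_gpu_group(port: int = 29500) -> None:
    """Initialize the default process group once, if it is not set up yet."""
    # Imported lazily so the kwargs helper works without PyTorch.
    import torch.distributed as dist

    if dist.is_available() and not dist.is_initialized():
        dist.init_process_group(**single_gpu_dist_kwargs(port))
```

Calling `init_single_gpu_group()` near the top of `main` in train_net.py, before the runner is created, would be the natural place for it, assuming nothing else has initialized the group already.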