ScaledYOLOv4
Training fails with multiple GPUs in DP mode
Here are the error details.
Traceback (most recent call last):
File "/home/xxx/hard_disk/xxx/ScaledYOLOv4/train.py", line 438, in
Process finished with exit code 1
Have you solved it?
Sorry, I haven't. I ran it in DDP mode instead, and it works well.
This problem arises because the tensors in the computation sit on different devices, one on the CPU and one on the GPU. At line 531 of general.py, t (the targets) is on the GPU while anchor is on the CPU, so the division between anchor and t raises an error. Moving the anchor to the GPU before the calculation fixes it: anchor = anchor.to(device='cuda'). Please bear with me, I am not familiar with using GitHub.
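A minimal sketch of the mismatch and a device-agnostic variant of the fix, assuming PyTorch. The helper name wh_ratio and the tensor values are made up for illustration and are not from the repository. Moving the anchors to t.device instead of hardcoding 'cuda' should also keep CPU-only runs working.

```python
import torch


def wh_ratio(t, anchors):
    """Hypothetical helper: target/anchor width-height ratio, computed on t's device."""
    # Move the anchors to wherever the targets live (a no-op if already there).
    anchors = anchors.to(t.device)
    return t[None, :, 4:6] / anchors[:, None]  # shape (na, nt, 2)


# Illustrative values only; columns are assumed to be image, class, x, y, w, h.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
t = torch.tensor([[0., 1., 0.5, 0.5, 4., 6.]], device=device)  # (nt, 6) targets
anchors = torch.tensor([[2., 3.], [4., 6.]])  # (na, 2), deliberately left on the CPU

r = wh_ratio(t, anchors)
print(r.shape, r.device)
```

Without the .to(t.device) call, the division would fail on a GPU machine because the two operands live on different devices.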
I added anchors = anchors.to(device='cuda') at line 141 of loss.py and that worked (06.09.2021). Lines 135-149 of my loss.py now look like this:

    for i, jj in enumerate(model.module.yolo_layers if multi_gpu else model.yolo_layers):
        # get number of grid points and anchor vec for this yolo layer
        anchors = model.module.module_list[jj].anchor_vec if multi_gpu else model.module_list[jj].anchor_vec
        gain[2:] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain

        # Match targets to anchors
        anchors = anchors.to(device='cuda')
        a, t, offsets = [], targets * gain, 0
        if nt:
            na = anchors.shape[0]  # number of anchors
            at = torch.arange(na).view(na, 1).repeat(1, nt)  # anchor tensor, same as .repeat_interleave(nt)
            r = t[None, :, 4:6] / anchors[:, None]  # wh ratio
            j = torch.max(r, 1. / r).max(2)[0] < model.hyp['anchor_t']  # compare
            # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n) = wh_iou(anchors(3,2), gwh(n,2))
            a, t = at[j], t.repeat(na, 1, 1)[j]  # filter
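For anyone who wants to try the matching step in isolation, here is a rough standalone sketch of the same ratio test with every tensor kept on one device. The function name match_targets, the sample values, and the threshold 4.0 (standing in for model.hyp['anchor_t']) are assumptions for illustration. Creating the anchor index tensor with device=t.device is my own addition, not something from the repository.

```python
import torch


def match_targets(t, anchors, anchor_t=4.0):
    """Hypothetical helper mirroring the wh-ratio test from the snippet above."""
    anchors = anchors.to(t.device)  # keep everything on the targets' device
    na, nt = anchors.shape[0], t.shape[0]
    # Anchor index for every (anchor, target) pair, created on the same device.
    at = torch.arange(na, device=t.device).view(na, 1).repeat(1, nt)
    r = t[None, :, 4:6] / anchors[:, None]  # (na, nt, 2) wh ratio
    j = torch.max(r, 1. / r).max(2)[0] < anchor_t  # (na, nt) boolean mask
    return at[j], t.repeat(na, 1, 1)[j]  # matched anchor indices and targets


# Illustrative targets; columns are assumed to be image, class, x, y, w, h.
t = torch.tensor([[0., 1., 0.5, 0.5, 4., 6.],
                  [0., 2., 0.2, 0.3, 40., 6.]])
anchors = torch.tensor([[2., 3.], [8., 6.]])

a, matched = match_targets(t, anchors)
print(a.tolist(), tuple(matched.shape))
```

With these values the first target is within the ratio threshold for both anchors, while the second is too wide for either, so only the first target is kept (once per anchor).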