hand_object_detector
RuntimeError: CUDA error: device-side assert triggered (can't train the model if batch is not 1)
It seems that batch_size can only be 1. When I set batch_size = 4 or 8 during training, the following error occurs:
Traceback (most recent call last):
File "trainval_net.py", line 321, in
Hey @ddshan, have you ever trained the network with batch_size = 4 or any other value?
I turned off CUDA, and the actual error is then:
Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
Traceback (most recent call last):
File "trainval_net.py", line 321, in
I am pretty sure the problem is in proposal_target_layer_cascade.py, around line 170:
labels = gt_boxes[:, :, 4].contiguous().view(-1)[(offset.view(-1),)].view(batch_size, -1)
list_box = []
for i in range(batch_size):
    # error when batch > 1: IndexError: index 20 is out of bounds for dimension 0 with size 20
    list_box.append(box_info[i][(offset[i, :].view(-1),)])
boxes_info = torch.stack(list_box)
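
Here is a minimal sketch of why the loop above might fail for batch_size > 1, and one possible workaround. It is not the repository's code. It assumes that `offset` holds indices into the flattened (batch_size * num_gt) view, which is what the `labels` line implies by indexing `gt_boxes[:, :, 4].contiguous().view(-1)` with it. The tensor shapes, the `gt_assignment` values, and the helper names `gather_per_image` and `gather_flat` are hypothetical.

```python
import torch

batch_size, num_gt, num_rois = 2, 20, 5

# Hypothetical stand-ins for the tensors in proposal_target_layer_cascade.py.
torch.manual_seed(0)
box_info = torch.rand(batch_size, num_gt, 5)

# Per-image index of the ground-truth box assigned to each ROI (made-up values).
gt_assignment = torch.tensor([[0, 3, 19, 7, 2],
                              [1, 0, 5, 19, 4]])

# Assumption: offset = per-image index + i * num_gt, so that it can index
# a tensor flattened over the batch dimension.
offset = torch.arange(batch_size).view(-1, 1) * num_gt + gt_assignment


def gather_per_image(box_info, offset):
    """Mirrors the loop from the issue: indexes box_info[i] (size num_gt)
    with flattened indices, which exceed num_gt - 1 for every image i >= 1."""
    list_box = []
    for i in range(box_info.size(0)):
        list_box.append(box_info[i][(offset[i, :].reshape(-1),)])
    return torch.stack(list_box)


def gather_flat(box_info, offset):
    """Possible workaround: flatten box_info over the batch dimension first,
    the same way the labels line flattens gt_boxes."""
    b, n, d = box_info.shape
    return box_info.reshape(b * n, d)[offset.reshape(-1)].view(b, -1, d)


# With a single image the per-image loop works, because offset == gt_assignment.
single = gather_per_image(box_info[:1], offset[:1])

# With two images the second image is indexed with values >= num_gt.
try:
    gather_per_image(box_info, offset)
    failed = False
except IndexError as err:
    failed = True
    print("per-image indexing failed:", err)

fixed = gather_flat(box_info, offset)
print(fixed.shape)
```

If the assumption about `offset` holds, indexing `box_info[i]` with the per-image assignment (here `gt_assignment[i]`) instead of `offset[i]` would be an equivalent alternative. Either change would only address this one indexing step; other parts of the modified codebase may also assume batch_size = 1.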
Hi,
We only trained with batch size = 1 due to constraints from our modifications to the codebase we followed. Sorry for the inconvenience. We will let you know if we have an improved version.