hand_object_detector: RuntimeError: CUDA error: device-side assert triggered (can't train the model if batch size is not 1)

It seems that batch_size can only be 1. When I set batch_size = 4 or 8 during training, the following error occurs:

Traceback (most recent call last):
  File "trainval_net.py", line 321, in <module>
    rois_label, loss_list = fasterRCNN(im_data, im_info, gt_boxes, num_boxes, box_info)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/Hand-Object-Interaction-detection/lib/model/faster_rcnn/faster_rcnn.py", line 62, in forward
    roi_data = self.RCNN_proposal_target(rois, gt_boxes, num_boxes, box_info)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/Hand-Object-Interaction-detection/lib/model/rpn/proposal_target_layer_cascade.py", line 52, in forward
    rois_per_image, self._num_classes, box_info)
  File "/content/Hand-Object-Interaction-detection/lib/model/rpn/proposal_target_layer_cascade.py", line 146, in _sample_rois_pytorch
    fg_inds = torch.nonzero(max_overlaps[i] >= cfg.TRAIN.FG_THRESH).view(-1)
RuntimeError: CUDA error: device-side assert triggered
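CUDA kernels run asynchronously, so a device-side assert is usually reported at some later CUDA call rather than at the operation that actually failed; the `torch.nonzero` call at line 146 above is therefore not necessarily the culprit. A minimal sketch of one common way to get an accurate traceback while staying on the GPU, assuming you can edit the very top of `trainval_net.py` (exporting the same environment variable in the shell before launching works too):

```python
import os

# CUDA_LAUNCH_BLOCKING is picked up when CUDA initializes, so set it before
# `import torch` (or at least before the first CUDA call). With "1", kernel
# launches become synchronous and a device-side assert is raised at the op
# that triggered it instead of at a later, unrelated line.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ... then `import torch` and start training as usual.
```

Synchronous launches slow training down, so this is only meant for debugging. Running on the CPU, as in the next comment, is another way to surface the underlying exception.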

May 13 '21 16:05 forever208

Hey @ddshan, have you ever trained the network with batch_size = 4 or any other value greater than 1?

May 13 '21 17:05 forever208

I turned off CUDA (ran on the CPU), and the actual error is:

Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
Traceback (most recent call last):
  File "trainval_net.py", line 321, in <module>
    rois_label, loss_list = fasterRCNN(im_data, im_info, gt_boxes, num_boxes, box_info)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/Hand-Object-Interaction-detection/lib/model/faster_rcnn/faster_rcnn.py", line 62, in forward
    roi_data = self.RCNN_proposal_target(rois, gt_boxes, num_boxes, box_info)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/Hand-Object-Interaction-detection/lib/model/rpn/proposal_target_layer_cascade.py", line 52, in forward
    rois_per_image, self._num_classes, box_info)
  File "/content/Hand-Object-Interaction-detection/lib/model/rpn/proposal_target_layer_cascade.py", line 136, in _sample_rois_pytorch
    list_box.append(box_info[i][(offset[i,:].view(-1),)])
IndexError: index 20 is out of bounds for dimension 0 with size 20
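The numbers in the message are a hint. Assuming `offset` is built as in the upstream faster-rcnn.pytorch `_sample_rois_pytorch` (per-image gt indices, here called `gt_assignment`, shifted by `i * gt_boxes.size(1)`) and that gt boxes are padded to 20 slots per image (the "size 20" in the error), every offset for the second image starts at 20. The toy values below are hypothetical; the sketch only reproduces that arithmetic in plain Python:

```python
# Hypothetical sizes: 20 padded gt-box slots per image, batch of 2 images.
num_gt = 20
batch_size = 2

# gt_assignment[i][k]: per-image index (0..num_gt-1) of the gt box matched to proposal k.
gt_assignment = [[0, 3, 19], [0, 5, 19]]

# offset shifts each image's indices into the flattened (batch_size * num_gt) range.
offset = [[i * num_gt + a for a in row] for i, row in enumerate(gt_assignment)]
print(offset)  # [[0, 3, 19], [20, 25, 39]]

# box_info[i] has only num_gt rows, so any offset >= num_gt (every offset for i >= 1)
# is out of bounds when used to index a single image's rows.
out_of_bounds = [i for i, row in enumerate(offset) if max(row) >= num_gt]
print(out_of_bounds)  # [1]
```

With batch_size = 1 only image 0 exists and its offsets equal the local indices, which is consistent with training working only in that case.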

May 13 '21 19:05 forever208

I am pretty sure the problem is in proposal_target_layer_cascade.py, around line 170:

labels = gt_boxes[:, :, 4].contiguous().view(-1)[(offset.view(-1),)].view(batch_size, -1)
list_box = []
for i in range(batch_size):
    # error when batch > 1: IndexError: index 20 is out of bounds for dimension 0 with size 20
    list_box.append(box_info[i][(offset[i, :].view(-1),)])
boxes_info = torch.stack(list_box)
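Since `labels` is gathered from the flattened `gt_boxes` using the same `offset`, one possible fix is to gather `box_info` the same way instead of indexing a single image with global offsets. A minimal sketch, assuming `box_info` has shape `(batch_size, num_gt, info_dim)` and that `gt_assignment` holds the per-image gt indices as in upstream faster-rcnn.pytorch; `gather_box_info` is a hypothetical helper, not part of the repository, and it has not been tested against the full training pipeline:

```python
import torch


def gather_box_info(box_info: torch.Tensor, gt_assignment: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: pick the box_info row of the gt box assigned to each RoI.

    box_info:      (batch_size, num_gt, info_dim), assumed layout
    gt_assignment: (batch_size, num_rois), per-image indices in [0, num_gt)
    returns:       (batch_size, num_rois, info_dim)
    """
    batch_size, num_gt, info_dim = box_info.shape
    # Same construction as `offset`: shift each image's indices into the flattened range.
    offset = torch.arange(batch_size, device=gt_assignment.device).view(-1, 1) * num_gt + gt_assignment
    # Index the *flattened* tensor (as the `labels` line does), not box_info[i].
    flat = box_info.reshape(-1, info_dim)
    return flat[offset.view(-1)].view(batch_size, -1, info_dim)


# Equivalent per-image form: index each image with its local indices instead of offsets,
# i.e. list_box.append(box_info[i][gt_assignment[i]])

box_info = torch.arange(2 * 20 * 5, dtype=torch.float32).view(2, 20, 5)
gt_assignment = torch.tensor([[0, 3, 19], [0, 5, 19]])
out = gather_box_info(box_info, gt_assignment)
print(out.shape)  # torch.Size([2, 3, 5])
```

This only addresses the indexing error in the traceback; other parts of the modified codebase may still assume a batch size of 1.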

May 16 '21 21:05 forever208

Hi,

We only trained with batch size = 1 because of constraints introduced by our modifications to the codebase we built on. Sorry for the inconvenience. We will let you know if we have an improved version.

Nov 12 '21 18:11 ddshan
