OpenPCDet Training/Testing pv-rcnn using custom data


Open binshaea opened this issue 6 months ago • 0 comments

I am trying to train the pv-rcnn model on a custom dataset. I prepared my data as recommended on the custom_data_tutorial page, then generated the data info.

For training I used the following command:

    python train.py --cfg_file cfgs/custom_models/pv_rcnn.yaml --batch_size 1 --workers 1 --epochs 10

and I am getting a memory-related error:

    2024-08-06 00:59:44,318 INFO **********************Start training custom_models/pv_rcnn(default)**********************
    epochs: 0%| | 0/10 [00:00<?, ?it/sBEV point_features shape: torch.Size([2048, 256]) | 0/35 [00:00<?, ?it/s]
    Raw points pooled_features shape: torch.Size([2048, 32])
    x_conv1 pooled_features shape: torch.Size([2048, 32])
    x_conv2 pooled_features shape: torch.Size([2048, 64])
    x_conv3 pooled_features shape: torch.Size([2048, 128])
    x_conv4 pooled_features shape: torch.Size([2048, 128])
    point_features shape before reshaping: torch.Size([2048, 640])
    Expected input features for vsa_point_feature_fusion: 640
    epochs: 0%| | 0/10 [00:01<?, ?it/s]
    Traceback (most recent call last):
      File "/home/elham/OpenPCDet/tools/train.py", line 233, in <module>
        main()
      File "/home/elham/OpenPCDet/tools/train.py", line 178, in main
        train_model(
      File "/home/elham/OpenPCDet/tools/train_utils/train_utils.py", line 180, in train_model
        accumulated_iter = train_one_epoch(
      File "/home/elham/OpenPCDet/tools/train_utils/train_utils.py", line 56, in train_one_epoch
        loss, tb_dict, disp_dict = model_func(model, batch)
      File "/home/elham/OpenPCDet/tools/../pcdet/models/__init__.py", line 44, in model_func
        ret_dict, tb_dict, disp_dict = model(batch_dict)
      File "/home/elham/anaconda3/envs/openpcdet_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/elham/OpenPCDet/tools/../pcdet/models/detectors/pv_rcnn.py", line 11, in forward
        batch_dict = cur_module(batch_dict)
      File "/home/elham/anaconda3/envs/openpcdet_env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/elham/OpenPCDet/tools/../pcdet/models/dense_heads/anchor_head_single.py", line 61, in forward
        targets_dict = self.assign_targets(
      File "/home/elham/OpenPCDet/tools/../pcdet/models/dense_heads/anchor_head_template.py", line 96, in assign_targets
        targets_dict = self.target_assigner.assign_targets(
      File "/home/elham/OpenPCDet/tools/../pcdet/models/dense_heads/target_assigner/axis_aligned_target_assigner.py", line 83, in assign_targets
        single_target = self.assign_targets_single(
      File "/home/elham/OpenPCDet/tools/../pcdet/models/dense_heads/target_assigner/axis_aligned_target_assigner.py", line 142, in assign_targets_single
        if self.match_height else box_utils.boxes3d_nearest_bev_iou(anchors[:, 0:7], gt_boxes[:, 0:7])
      File "/home/elham/OpenPCDet/tools/../pcdet/utils/box_utils.py", line 340, in boxes3d_nearest_bev_iou
        return boxes_iou_normal(boxes_bev_a, boxes_bev_b)
      File "/home/elham/OpenPCDet/tools/../pcdet/utils/box_utils.py", line 302, in boxes_iou_normal
        x_max = torch.min(boxes_a[:, 2, None], boxes_b[None, :, 2])
    RuntimeError: CUDA out of memory. Tried to allocate 3.16 GiB (GPU 0; 7.66 GiB total capacity; 4.36 GiB already allocated; 1.36 GiB free; 4.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
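
The allocation that fails in the traceback comes from `torch.min(boxes_a[:, 2, None], boxes_b[None, :, 2])`, which broadcasts into a tensor with one entry per (anchor, ground-truth box) pair. The sketch below is only back-of-the-envelope arithmetic in plain Python, assuming float32 values. It also assumes, without confirmation from the source, that the denominator in the evaluation progress bar (`recall_0.3=(0, 0) / 445618` over 25 samples) is a running count of ground-truth boxes, and that the training frames look similar. The helper names are hypothetical.

```python
# Rough size of one pairwise (num_anchors x num_gt_boxes) float32 tensor,
# the shape produced by broadcasting boxes_a[:, 2, None] with boxes_b[None, :, 2].

BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def pairwise_tensor_gib(num_anchors: int, num_gt_boxes: int) -> float:
    """Memory in GiB of one float32 tensor of shape (num_anchors, num_gt_boxes)."""
    return num_anchors * num_gt_boxes * BYTES_PER_FLOAT32 / GIB


def elements_for_gib(size_gib: float) -> int:
    """Number of float32 elements that fit in size_gib GiB."""
    return int(size_gib * GIB / BYTES_PER_FLOAT32)


# The failed allocation reported in the log was 3.16 GiB.
failed_elements = elements_for_gib(3.16)
print(f"float32 elements in 3.16 GiB: {failed_elements:,}")

# ASSUMPTION: 445618 is the cumulative ground-truth count over 25 eval samples.
avg_gt_per_frame = 445618 / 25
print(f"average ground-truth boxes per frame (assumed): {avg_gt_per_frame:.0f}")

# Anchor count that would pair with that many boxes to reach 3.16 GiB.
implied_anchors = failed_elements / avg_gt_per_frame
print(f"implied anchors for one such tensor: {implied_anchors:.0f}")
```

If that reading of the progress bar is right, each frame would carry many thousands of ground-truth boxes, which is unusually high for a LiDAR frame and may be worth checking in the label files and the generated info files before tuning memory settings.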

For testing I used the following command:

    python test.py --cfg_file cfgs/custom_models/pv_rcnn.yaml --ckpt pv_rcnn_8369.pth --batch_size 1 --workers 1

and I am getting this error after evaluation and labeling finish:

    eval: 96%|████████████████████████████████████████████████████████████████████████████████████████████████▉ | 24/25 [00:09<00:00, 2.96it/s, recall_0.3=(0, 0) / 433825]
    BEV point_features shape: torch.Size([2048, 256])
    Raw points pooled_features shape: torch.Size([2048, 32])
    x_conv1 pooled_features shape: torch.Size([2048, 32])
    x_conv2 pooled_features shape: torch.Size([2048, 64])
    x_conv3 pooled_features shape: torch.Size([2048, 128])
    x_conv4 pooled_features shape: torch.Size([2048, 128])
    point_features shape before reshaping: torch.Size([2048, 640])
    Expected input features for vsa_point_feature_fusion: 640
    eval: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:09<00:00, 2.65it/s, recall_0.3=(0, 0) / 445618]
    2024-08-06 00:47:35,210 INFO *************** Performance of EPOCH 8369 *****************
    2024-08-06 00:47:35,211 INFO Generate label finished(sec_per_example: 0.3773 second).
    2024-08-06 00:47:35,211 INFO recall_roi_0.3: 0.000000
    2024-08-06 00:47:35,211 INFO recall_rcnn_0.3: 0.000000
    2024-08-06 00:47:35,211 INFO recall_roi_0.5: 0.000000
    2024-08-06 00:47:35,211 INFO recall_rcnn_0.5: 0.000000
    2024-08-06 00:47:35,211 INFO recall_roi_0.7: 0.000000
    2024-08-06 00:47:35,211 INFO recall_rcnn_0.7: 0.000000
    2024-08-06 00:47:35,211 INFO Average predicted number of objects(25 samples): 17.960
    Traceback (most recent call last):
      File "/home/elham/OpenPCDet/tools/test.py", line 210, in <module>
        main()
      File "/home/elham/OpenPCDet/tools/test.py", line 206, in main
        eval_single_ckpt(model, test_loader, args, eval_output_dir, logger, epoch_id, dist_test=dist_test)
      File "/home/elham/OpenPCDet/tools/test.py", line 65, in eval_single_ckpt
        eval_utils.eval_one_epoch(
      File "/home/elham/OpenPCDet/tools/eval_utils/eval_utils.py", line 125, in eval_one_epoch
        result_str, result_dict = dataset.evaluation(
      File "/home/elham/OpenPCDet/tools/../pcdet/datasets/custom/custom_dataset.py", line 137, in evaluation
        ap_result_str, ap_dict = kitti_eval(eval_det_annos, eval_gt_annos, self.map_class_to_kitti)
      File "/home/elham/OpenPCDet/tools/../pcdet/datasets/custom/custom_dataset.py", line 122, in kitti_eval
        kitti_utils.transform_annotations_to_kitti_format(eval_det_annos, map_name_to_kitti=map_name_to_kitti)
      File "/home/elham/OpenPCDet/tools/../pcdet/datasets/kitti/kitti_utils.py", line 21, in transform_annotations_to_kitti_format
        anno['name'][k] = map_name_to_kitti[anno['name'][k]]
    KeyError: 'Car'

Has anyone faced a similar issue, or does anyone have an idea what could be causing it?
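
The line that raises, `anno['name'][k] = map_name_to_kitti[anno['name'][k]]`, is a plain dictionary lookup, so `KeyError: 'Car'` means the name `'Car'` occurs in the detections but is not a key of the mapping passed in as `self.map_class_to_kitti`. The sketch below reproduces that pattern in isolation and adds a small pre-check. The class list, the mapping contents, and the helper `check_class_mapping` are hypothetical placeholders, not values taken from the poster's config. Where the mapping is defined in the dataset config is likewise not shown in the post.

```python
def check_class_mapping(class_names, map_class_to_kitti):
    """Return the class names that have no key in the mapping."""
    return [name for name in class_names if name not in map_class_to_kitti]


# HYPOTHETICAL values for illustration only.
class_names = ["Car", "Pedestrian", "Cyclist"]
map_class_to_kitti = {
    "Vehicle": "Car",          # key is the custom name, value is the KITTI name
    "Pedestrian": "Pedestrian",
    "Cyclist": "Cyclist",
}

missing = check_class_mapping(class_names, map_class_to_kitti)
print("class names without a mapping entry:", missing)

# Same lookup pattern as the failing line in the traceback.
predicted_names = ["Car", "Pedestrian"]
try:
    converted = [map_class_to_kitti[name] for name in predicted_names]
except KeyError as err:
    converted = None
    print("KeyError:", err)

# One possible remedy: make every detector class name a key of the mapping.
fixed_map = dict(map_class_to_kitti, Car="Car")
converted_fixed = [fixed_map[name] for name in predicted_names]
print("converted with fixed mapping:", converted_fixed)
```

Running a check like this against the class names in the model config and the mapping in the dataset config, before launching `test.py`, would show whether the two lists disagree.
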
I also tried running demo.py with the custom data and the pretrained pv_rcnn_8369.pth. Only after adjusting some parameters in the model config (cfgs/custom_models/pv_rcnn.yaml), such as SCORE_THRESH under the POST_PROCESSING configuration (I lowered it from 0.1 to 0.005), was I able to see some bounding boxes in the visualized samples, but they are incorrect.
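
To illustrate what lowering a score threshold from 0.1 to 0.005 does, here is a generic sketch of confidence filtering. It is not OpenPCDet's post-processing code. The prediction scores and the helper `filter_by_score` are hypothetical, chosen so that every score is below 0.1, as the behaviour described above suggests.

```python
def filter_by_score(predictions, score_thresh):
    """Keep predictions whose confidence is at least score_thresh."""
    return [p for p in predictions if p["score"] >= score_thresh]


# HYPOTHETICAL detections: all low-confidence, as when a pretrained
# model is applied to data unlike what it was trained on.
predictions = [
    {"name": "Car", "score": 0.08},
    {"name": "Car", "score": 0.03},
    {"name": "Pedestrian", "score": 0.02},
    {"name": "Cyclist", "score": 0.006},
    {"name": "Car", "score": 0.001},
]

for thresh in (0.1, 0.005):
    kept = filter_by_score(predictions, thresh)
    print(f"SCORE_THRESH={thresh}: {len(kept)} of {len(predictions)} boxes kept")
```

Boxes that only appear at a threshold this low are ones the model itself rates as very unlikely, so seeing incorrect boxes at 0.005 is consistent with the zero recall values in the evaluation log rather than a separate problem.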

I am trying to understand whether there is a way to improve the prediction results using the pretrained weights, and hopefully to get training and validation working as well.

Aug 05 '24 22:08 binshaea
