Voxel-R-CNN

cuda error

Open mc171819 opened this issue 4 years ago • 3 comments

Hi, when I run your train.py, I get the following error:

2021-08-13 15:43:07,291 INFO Start training voxel_rcnn/voxel_rcnn_car(default)
epochs: 0%| | 0/80 [00:00<?, ?it/sError!: 0%| | 0/3741 [00:00<?, ?it/s]
Error!
epochs: 0%| | 0/80 [01:15<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 198, in <module>
    main()
  File "train.py", line 170, in main
    merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch
  File "/data/mc_data/Voxel-R-CNN-main/tools/train_utils/train_utils.py", line 93, in train_model
    dataloader_iter=dataloader_iter
  File "/data/mc_data/Voxel-R-CNN-main/tools/train_utils/train_utils.py", line 38, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/home/mc/Project/OpenPCDet/pcdet/models/__init__.py", line 42, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict)
  File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mc/Project/OpenPCDet/pcdet/models/detectors/voxel_rcnn.py", line 11, in forward
    batch_dict = cur_module(batch_dict)
  File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/voxelrcnn_head.py", line 227, in forward
    targets_dict = self.assign_targets(batch_dict)
  File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/roi_head_template.py", line 104, in assign_targets
    targets_dict = self.proposal_target_layer.forward(batch_dict)
  File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/target_assigner/proposal_target_layer.py", line 33, in forward
    batch_dict=batch_dict
  File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/target_assigner/proposal_target_layer.py", line 101, in sample_rois_for_rcnn
    gt_boxes=cur_gt[:, 0:7], gt_labels=cur_gt[:, -1].long()
  File "/home/mc/Project/OpenPCDet/pcdet/models/roi_heads/target_assigner/proposal_target_layer.py", line 223, in get_max_iou_with_same_class
    iou3d = iou3d_nms_utils.boxes_iou3d_gpu(cur_roi, cur_gt) # (M, N)
  File "/home/mc/Project/OpenPCDet/pcdet/ops/iou3d_nms/iou3d_nms_utils.py", line 71, in boxes_iou3d_gpu
    overlaps_h = torch.clamp(min_of_max - max_of_min, min=0)
RuntimeError: CUDA error: invalid device function
Traceback (most recent call last):
  File "/opt/anaconda3/envs/objfuse/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda3/envs/objfuse/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/opt/anaconda3/envs/objfuse/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/anaconda3/envs/objfuse/bin/python', '-u', 'train.py', '--local_rank=0', '--launcher', 'pytorch', '--cfg_file', 'cfgs/voxel_rcnn/voxel_rcnn_car.yaml', '--epochs', '80', '--workers', '8']' died with <Signals.SIGSEGV: 11>.

I use PyTorch 1.4 with cudatoolkit=10.1, and the GPU is a 2080 Ti. Can you give me some advice?

mc171819 avatar Aug 13 '21 07:08 mc171819
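
Note: "CUDA error: invalid device function" is often (though not always) reported when a compiled CUDA extension contains no kernels for the GPU in use, for example when the ops were built for a different architecture or against a different toolkit. The following is a minimal sketch for collecting the relevant facts in the environment from the post above; the helper name describe_cuda_environment is hypothetical and not part of Voxel-R-CNN or OpenPCDet.

```python
import importlib.util


def describe_cuda_environment():
    """Collect version and GPU facts that matter when a compiled CUDA op fails.

    Hypothetical helper for illustration; it only reads information that
    PyTorch already exposes and does not touch the pcdet extensions.
    """
    # Avoid a hard failure on machines where PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA toolkit version PyTorch was built with (None for CPU-only builds).
        "torch_cuda_version": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        major, minor = torch.cuda.get_device_capability(0)
        info["gpu_name"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = f"{major}.{minor}"
    return info


if __name__ == "__main__":
    for key, value in describe_cuda_environment().items():
        print(f"{key}: {value}")
```

A 2080 Ti has compute capability 7.5. If the pcdet ops were compiled in a different environment or for other architectures, rebuilding the extensions inside the active environment is a common remedy, although the thread does not confirm that this is the cause here.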

Hi @mc171819 ,

I haven't run into this error. I suggest running the code with the Docker image I provide.

djiajunustc avatar Aug 13 '21 11:08 djiajunustc

> Hi @mc171819 ,
>
> I haven't run into this error. I suggest running the code with the Docker image I provide.

Hi, I tried using the Docker image you provided, but it still doesn't work. I wonder if I made a mistake somewhere. Can you show me the concrete steps for using the Docker image?

mc171819 avatar Aug 14 '21 07:08 mc171819

Hi, I wonder how train_dataset is obtained. Whatever change I make to build_dataloader in pcdet/dataset/init or to kitti/processor/data_processor, it has no effect, even if I deliberately introduce a bug. Can you tell me why?

mc171819 avatar Sep 17 '21 07:09 mc171819
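
Note: in the traceback from the first post, train.py runs from /data/mc_data/Voxel-R-CNN-main/tools, while every pcdet frame points to /home/mc/Project/OpenPCDet/pcdet. This suggests, without proving it, that Python is importing pcdet from a different checkout than the one being edited, which would explain why changes to build_dataloader have no effect. A minimal sketch for checking where a package resolves from follows; module_origin is a hypothetical helper name, not part of either repository.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    Uses importlib.util.find_spec, so the module itself is not executed.
    Intended for top-level names such as "pcdet".
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Run this with the same interpreter and working directory as train.py.
    print("pcdet resolves to:", module_origin("pcdet"))
```

If the printed path lies outside the Voxel-R-CNN checkout, edits made there are never loaded. Reinstalling the package from the intended checkout (the repository's setup step) would be one way to make the two agree, assuming that mismatch is indeed the problem.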