AerialDetection

CUDA out of memory

Open NicholasIrving opened this issue 5 years ago • 2 comments

While I was training, the GPU ran out of memory.

My env:

sys.platform: linux
Python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-9.0
NVCC: Cuda compilation tools, release 9.0, V9.0.176
GPU 0,1: TITAN V
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.3.1

It seems there are too many gt boxes in a single image. How can I solve this?

Traceback (most recent call last):
  File "tools/train.py", line 130, in <module>
    main()
  File "tools/train.py", line 126, in main
    timestamp=timestamp)
  File "/disk1/NiCholas/mmdetection/mmdet/apis/train.py", line 111, in train_detector
    timestamp=timestamp)
  File "/disk1/NiCholas/mmdetection/mmdet/apis/train.py", line 297, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/disk1/NiCholas/anaconda3/envs/nick_rs/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/disk1/NiCholas/anaconda3/envs/nick_rs/lib/python3.7/site-packages/mmcv/runner/runner.py", line 268, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/disk1/NiCholas/mmdetection/mmdet/apis/train.py", line 78, in batch_processor
    losses = model(**data)
  File "/disk1/NiCholas/anaconda3/envs/nick_rs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk1/NiCholas/anaconda3/envs/nick_rs/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/disk1/NiCholas/anaconda3/envs/nick_rs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk1/NiCholas/mmdetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/disk1/NiCholas/mmdetection/mmdet/models/detectors/base.py", line 137, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "/disk1/NiCholas/mmdetection/mmdet/models/detectors/two_stage.py", line 176, in forward_train
    *rpn_loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/disk1/NiCholas/mmdetection/mmdet/models/anchor_heads/rpn_head.py", line 51, in loss
    gt_bboxes_ignore=gt_bboxes_ignore)
  File "/disk1/NiCholas/mmdetection/mmdet/core/fp16/decorators.py", line 127, in new_func
    return old_func(*args, **kwargs)
  File "/disk1/NiCholas/mmdetection/mmdet/models/anchor_heads/anchor_head.py", line 189, in loss
    sampling=self.sampling)
  File "/disk1/NiCholas/mmdetection/mmdet/core/anchor/anchor_target.py", line 63, in anchor_target
    unmap_outputs=unmap_outputs)
  File "/disk1/NiCholas/mmdetection/mmdet/core/utils/misc.py", line 24, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/disk1/NiCholas/mmdetection/mmdet/core/anchor/anchor_target.py", line 116, in anchor_target_single
    anchors, gt_bboxes, gt_bboxes_ignore, None, cfg)
  File "/disk1/NiCholas/mmdetection/mmdet/core/bbox/assign_sampling.py", line 30, in assign_and_sample
    gt_labels)
  File "/disk1/NiCholas/mmdetection/mmdet/core/bbox/assigners/max_iou_assigner.py", line 99, in assign
    overlaps = bbox_overlaps(gt_bboxes, bboxes)
  File "/disk1/NiCholas/mmdetection/mmdet/core/bbox/geometry.py", line 74, in bbox_overlaps
    rb = torch.min(bboxes1[:, None, 2:], bboxes2[:, 2:])  # [rows, cols, 2]
RuntimeError: CUDA out of memory. Tried to allocate 5.78 GiB (GPU 0; 11.75 GiB total capacity; 7.11 GiB already allocated; 3.41 GiB free; 167.14 MiB cached)
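
For a rough sense of why this allocation is so large: the failing line in bbox_overlaps builds a [rows, cols, 2] tensor, with one row per gt box and one column per anchor. The sketch below estimates how many gt boxes a 5.78 GiB request would imply. The 1024x1024 input size, the FPN strides, the 3 anchors per location, and float32 storage are all assumptions for illustration, not values taken from this issue; the helper names are hypothetical.

```python
GIB = 1024 ** 3


def pairwise_tensor_bytes(num_gts, num_anchors, last_dim=2, bytes_per_elem=4):
    """Bytes needed for one [num_gts, num_anchors, last_dim] tensor (float32 by default)."""
    return num_gts * num_anchors * last_dim * bytes_per_elem


def fpn_anchor_count(img_size=1024, strides=(4, 8, 16, 32, 64), anchors_per_loc=3):
    """Total anchors over all FPN levels for a square input (assumed layout)."""
    return sum((img_size // s) ** 2 for s in strides) * anchors_per_loc


# Size of the allocation reported in the RuntimeError above.
requested_bytes = 5.78 * GIB

# Assumed anchor count; the real number depends on the config and image size.
num_anchors = fpn_anchor_count()

# Each (gt, anchor) pair costs 2 float32 values, i.e. 8 bytes, in this tensor.
implied_gts = requested_bytes / (num_anchors * 2 * 4)

print(f"anchors (assumed): {num_anchors}")
print(f"gt boxes implied by a 5.78 GiB request: about {implied_gts:.0f}")
```

Under these assumptions the request corresponds to a few thousand gt boxes in one image, which fits the guess that the image is unusually dense. Note that bbox_overlaps creates several tensors of this shape, so the true peak is a multiple of a single allocation.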

NicholasIrving, Mar 16 '20 16:03

Which model config did you use? You could set the assigner type to "MaxIoUAssignerCy". For example:

https://github.com/dingjiansw101/AerialDetection/blob/a717a85953eab240435cbcbb39396481cd831068/configs/DOTA/faster_rcnn_obb_r50_fpn_1x_dota.py#L53

https://github.com/dingjiansw101/AerialDetection/blob/a717a85953eab240435cbcbb39396481cd831068/configs/DOTA/faster_rcnn_obb_r50_fpn_1x_dota.py#L76
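
As a sketch of what this suggestion amounts to in an mmdetection-style config: the two links presumably point at the assigner entries for the RPN and R-CNN stages, and the change is to the `type` field. The threshold values below are the usual Faster R-CNN defaults and are assumptions here; check them against the linked config before copying.

```python
# Fragment of a train_cfg; only the assigner entries are shown.
# Threshold values are assumed defaults, not copied from the linked file.
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssignerCy',  # instead of 'MaxIoUAssigner'
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1)),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssignerCy',  # instead of 'MaxIoUAssigner'
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1)))
```

The sampler and the remaining keys of each stage stay as they are in the config you are training with.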

dingjiansw101, Mar 18 '20 03:03

But computing IoU with the Cython implementation is very slow.
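
One possible middle ground, sketched here as an assumption rather than anything this repository provides, is to keep the IoU computation in PyTorch but process the gt boxes in chunks, so the [rows, cols, 2] intermediates never cover all gt boxes at once. The function name `bbox_overlaps_chunked` and the `chunk_size` parameter are hypothetical, and the sketch uses plain (x2 - x1) widths, which may differ from the pixel convention in the installed bbox_overlaps.

```python
import torch


def bbox_overlaps_chunked(bboxes1, bboxes2, chunk_size=256):
    """IoU between bboxes1 (M, 4) and bboxes2 (N, 4), boxes as (x1, y1, x2, y2).

    Rows of bboxes1 are processed chunk_size at a time to bound peak memory.
    """
    rows, cols = bboxes1.size(0), bboxes2.size(0)
    ious = bboxes1.new_zeros((rows, cols))
    if rows == 0 or cols == 0:
        return ious

    area2 = (bboxes2[:, 2] - bboxes2[:, 0]) * (bboxes2[:, 3] - bboxes2[:, 1])
    for start in range(0, rows, chunk_size):
        chunk = bboxes1[start:start + chunk_size]
        # Intermediates are [chunk, cols, 2] instead of [rows, cols, 2].
        lt = torch.max(chunk[:, None, :2], bboxes2[:, :2])
        rb = torch.min(chunk[:, None, 2:], bboxes2[:, 2:])
        wh = (rb - lt).clamp(min=0)
        overlap = wh[:, :, 0] * wh[:, :, 1]
        area1 = (chunk[:, 2] - chunk[:, 0]) * (chunk[:, 3] - chunk[:, 1])
        union = area1[:, None] + area2 - overlap
        ious[start:start + chunk_size] = overlap / union.clamp(min=1e-6)
    return ious


gts = torch.tensor([[0., 0., 2., 2.], [5., 5., 6., 6.]])
anchors = torch.tensor([[0., 0., 2., 2.], [1., 1., 3., 3.], [5., 5., 6., 6.]])
ious = bbox_overlaps_chunked(gts, anchors, chunk_size=1)
```

The full [rows, cols] result is still materialized, so this lowers peak memory but does not remove the dependence on the number of gt boxes. Reducing the number of gt boxes per image, for example by cropping smaller patches, attacks the problem from the other side.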

wangjue-wzq, Apr 22 '20 01:04