
Setting batch-size to 64 raises a GPU out-of-memory error; lowering it raises the type error below

Open busyfree opened this issue 2 years ago • 1 comments

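The GPU is an NVIDIA GTX 1080 with 8 GB of memory.

With batch-size set to 64, training fails with a GPU out-of-memory error. With a smaller batch-size, it fails with the type error shown below.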

  File "C:\ProgramData\Anaconda3\envs\yolov5lite\lib\site-packages\torch\nn\functional.py", line 2438, in batch_norm
    return torch.batch_norm(
RuntimeError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 8.00 GiB total capacity; 6.65 GiB already allocated; 0 bytes free; 6.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(yolov5lite) PS C:\Users\Administrator\pyhome\YOLOv5-Lite> python train.py --data data/coco128.yaml --cfg models/v5Lite-g.yaml --weights weights/v5lite-g.pt --batch-size 2
github: skipping check (offline)
YOLOv5  v1.4-35-g5b94a9e torch 1.12.0 CUDA:0 (NVIDIA GeForce GTX 1080, 8191.6875MB)

Namespace(adam=False, artifact_alias='latest', batch_size=2, bbox_interval=-1, bucket='', cache_images=False, cfg='models/v5Lite-g.yaml', data='data/coco128.yaml', device='0', entity=None, epochs=300, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, save_dir='runs\\train\\exp10', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=2, upload_dataset=False, weights='weights/v5lite-g.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.001, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=0.2, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)

                 from  n    params  module                                  arguments
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]
  1                -1  1     20736  models.common.RepVGGBlock               [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     82432  models.common.RepVGGBlock               [64, 128, 3, 2]
  4                -1  1    156928  models.common.C3                        [128, 128, 3]
  5                -1  1    328704  models.common.RepVGGBlock               [128, 256, 3, 2]
  6                -1  1    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1312768  models.common.RepVGGBlock               [256, 512, 3, 2]
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]
  9                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 10                -1  1     65792  models.common.Conv                      [512, 128, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    189696  models.common.C3                        [384, 128, 3, False]
 14                -1  1     16640  models.common.Conv                      [128, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1    173312  models.common.C3                        [256, 128, 3, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    173312  models.common.C3                        [256, 128, 3, False]
 21                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1    173312  models.common.C3                        [256, 128, 3, False]
 24      [17, 20, 23]  1     98685  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 128, 128]]
C:\ProgramData\Anaconda3\envs\yolov5lite\lib\site-packages\torch\functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 375 layers, 5574845 parameters, 5574845 gradients, 16.3 GFLOPS

Transferred 480/482 items from weights/v5lite-g.pt
Scaled weight_decay = 0.0005
Optimizer groups: 82 .bias, 82 conv.weight, 79 other
train: Scanning '..\datasets\coco128\labels\train2017.cache' images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:00<?, ?it/s]
val: Scanning '..\datasets\coco128\labels\train2017.cache' images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:00<?, ?it/s]
Plotting labels...

autoanchor: Analyzing anchors... anchors/target = 4.26, Best Possible Recall (BPR) = 0.9946
Image sizes 640 train, 640 test
Using 2 dataloader workers
Logging results to runs\train\exp10
Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
  0%|                                                                                                                                                                                                                                                                                                | 0/64 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 550, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 306, in train
    loss, loss_items = compute_loss(pred, targets.to(device))  # loss scaled by batch_size
  File "C:\Users\Administrator\pyhome\YOLOv5-Lite\utils\loss.py", line 117, in __call__
    tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets
  File "C:\Users\Administrator\pyhome\YOLOv5-Lite\utils\loss.py", line 211, in build_targets
    indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices
RuntimeError: result type Float can't be cast to the desired output type __int64

busyfree avatar Jul 11 '22 07:07 busyfree

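Sorry, I've been a bit busy lately and am only replying now. Could you check the largest batch size you can set with v5s? I tested it once before, and the g model uses roughly 10% more GPU memory than v5s.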

ppogg avatar Jul 15 '22 05:07 ppogg
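
A note on the second error in the traceback above (utils\loss.py, line 211): this is not confirmed anywhere in the thread, but a likely cause is that `gain` is a float tensor while `gi` and `gj` are int64 tensors, so the in-place `clamp_` cannot write a float result back into an integer tensor on torch 1.12. A minimal sketch of one possible workaround, assuming that diagnosis, is to convert the bounds to Python ints before clamping. The helper name `clamp_grid_indices` and the 80x80 grid are hypothetical and used only for illustration; they are not part of YOLOv5-Lite.

```python
import torch


def clamp_grid_indices(gi, gj, gain):
    """Clamp int64 grid indices in place to the feature-map bounds.

    Hypothetical helper: gain[2] and gain[3] are assumed to hold the grid
    width and height as floats, so they are converted to Python ints first.
    """
    max_x = int(gain[2]) - 1  # largest valid column index
    max_y = int(gain[3]) - 1  # largest valid row index
    return gj.clamp_(0, max_y), gi.clamp_(0, max_x)


# Assumed example values: a float gain vector describing an 80x80 grid.
gain = torch.ones(7)
gain[2:6] = torch.tensor([80.0, 80.0, 80.0, 80.0])

gi = torch.tensor([-1, 5, 80])  # column indices, some out of range
gj = torch.tensor([3, 99, 0])   # row indices, some out of range

gj_c, gi_c = clamp_grid_indices(gi, gj, gain)
print(gi_c.tolist(), gj_c.tolist())
```

If the diagnosis holds, the same idea would apply inside `build_targets` by clamping with integer bounds instead of `gain[3] - 1` and `gain[2] - 1`. Downgrading torch is another option people sometimes try, but neither approach is verified in this thread.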