YOLOv3v4-ModelCompression-MultidatasetTraining-Multibackbone
Multi-GPU training
Hi, every time I train with multiple GPUs I run into this problem:
Namespace(BN_Fold=False, FPGA=False, KDstr=-1, a_bit=8, adam=False, batch_size=16, bucket='', cache_images=False, cfg='./cfg/yolov4/yolov4.cfg', data='data/coco2017.data', device='0,1,2,4', ema=False, epochs=300, evolve=False, img_size=[320, 640], multi_scale=False, name='', nosave=False, notest=False, prune=0, pt=False, quantized=0, rect=False, resume=False, s=0.0001, single_cls=False, sr=True, t_cfg='', t_weights='', w_bit=8, weights='weights/yolo4_coco/qianyi_weight/best.pt')
Using CUDA Apex device0 _CudaDeviceProperties(name='Tesla T4', total_memory=15109MB)
device1 _CudaDeviceProperties(name='Tesla T4', total_memory=15109MB)
device2 _CudaDeviceProperties(name='Tesla T4', total_memory=15109MB)
device3 _CudaDeviceProperties(name='Tesla T4', total_memory=15109MB)
Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/
Model Summary: 327 layers, 6.43631e+07 parameters, 6.43631e+07 gradients
Optimizer groups: 110 .bias, 110 Conv2d.weight, 107 other
muti-gpus sparse
normal sparse training
Image sizes 320 - 640 train, 640 test
Using 8 dataloader workers
Starting training for 300 epochs...
Epoch gpu_mem GIoU obj cls total targets img_size
0%| | 0/4381 [00:00<?, ?it/s]
0%| | 0/4381 [00:01<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 987, in <module>
train(hyp) # train normally
File "train.py", line 330, in train
pred, feature_s = model(imgs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/distributed.py", line 580, in forward
output = self.gather(outputs, self.output_device)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/distributed.py", line 607, in gather
return gather(outputs, output_device, dim=self.dim)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
res = gather_map(outputs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/_functions.py", line 71, in forward
return comm.gather(inputs, ctx.dim, ctx.target_device)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/comm.py", line 230, in gather
return torch._C._gather(tensors, dim, destination)
RuntimeError: CUDA out of memory. Tried to allocate 400.00 MiB (GPU 0; 14.76 GiB total capacity; 13.07 GiB already allocated; 5.75 MiB free; 13.43 GiB reserved in total by PyTorch)
The training command I used is:
python train.py --data data/coco2017.data --batch-size 16 --cfg cfg/yolov4/yolov4.cfg --weights weights/yolo4_coco/qianyi_weight/best.pt --cfg cfg/yolov4/yolov4.cfg --device 0,1,2,4 -sr --s 0.0001 --prune 0
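I am using four Tesla T4 GPUs, and all four cards were idle. Why does it run out of GPU memory?
You could try reducing the batch size or the image size; YOLOv4 is fairly memory-hungry.
When training on a single T4, a batch size of 10 works fine, but with four GPUs a batch size of 16 fails. It doesn't look like a batch-size problem to me.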
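To see why reducing the batch size or image size helps, the sketch below estimates the fp32 size of just the three detection-head outputs. It assumes the standard COCO YOLOv4 head layout (80 classes, 3 anchors per head, strides 8/16/32, one `(batch, anchors, H, W, 5 + classes)` tensor per head). `head_output_bytes` is a hypothetical helper, and it ignores backbone and neck activations, which dominate real usage; the point is only the scaling.

```python
# Hypothetical helper (not part of the repo): rough fp32 size of the three
# YOLO detection-head outputs for one batch. Assumes 3 anchors per head,
# strides 8/16/32 and (5 + num_classes) values per anchor. Backbone and
# neck activations are ignored, so real memory usage is much higher.
def head_output_bytes(batch_size: int, img_size: int, num_classes: int = 80,
                      anchors_per_head: int = 3, strides=(8, 16, 32),
                      bytes_per_elem: int = 4) -> int:
    values_per_cell = anchors_per_head * (5 + num_classes)  # x, y, w, h, obj + classes
    cells = sum((img_size // s) ** 2 for s in strides)      # grid cells over all heads
    return batch_size * cells * values_per_cell * bytes_per_elem


if __name__ == "__main__":
    mib = 1024 ** 2
    for bs, size in [(16, 640), (8, 640), (16, 320)]:
        print(f"batch={bs:2d} img={size}: {head_output_bytes(bs, size) / mib:.1f} MiB")
```

Memory grows linearly with the batch size and quadratically with the image size: halving the batch halves this term, while halving the image side divides it by four.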
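The traceback offers a clue: the OOM is raised inside `DistributedDataParallel.gather` on GPU 0, which indicates several GPUs are being driven from one process. In that mode the input batch is split across the devices, but the outputs of every replica (here `pred` and `feature_s`) are then concatenated on the output device, GPU 0. The pure-Python sketch below mimics the `torch.chunk`-style split used when scattering a tensor along the batch dimension; `scatter_sizes` and `gpu_image_counts` are hypothetical names, and the sketch only counts images, not activation memory.

```python
import math
from typing import Dict, List


def scatter_sizes(batch_size: int, num_devices: int) -> List[int]:
    """Chunk sizes from a torch.chunk-style split along the batch dimension.

    Every chunk holds ceil(batch_size / num_devices) images except possibly
    the last, so fewer than num_devices chunks can come back.
    """
    chunk = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= sizes[-1]
    return sizes


def gpu_image_counts(batch_size: int, num_devices: int) -> Dict[str, List[int]]:
    """Images each GPU runs forward on, and images whose outputs gather copies onto it."""
    forward = scatter_sizes(batch_size, num_devices)
    # Only the output device (GPU 0) receives the concatenated outputs of all replicas.
    gathered = [batch_size] + [0] * (len(forward) - 1)
    return {"forward": forward, "gathered": gathered}


if __name__ == "__main__":
    print(gpu_image_counts(16, 4))
```

With `--batch-size 16` on four cards, each replica runs forward on only 4 images, yet GPU 0 must also allocate the gathered outputs for all 16 on top of its own replica's activations, which is consistent with the OOM being reported on GPU 0 inside `gather`. Running one process per GPU, as the PyTorch DDP documentation recommends, avoids this central gather.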
@chenxyyy Did you solve that issue? I am pruning a model; I tried different pruning thresholds and even reduced my batch size to 1, but the same error keeps coming up:
RuntimeError: CUDA out of memory. Tried to allocate 170.00 MiB (GPU 0; 7.79 GiB total capacity; 5.54 GiB already allocated; 43.25 MiB free; 6.11 GiB reserved in total by PyTorch)
@SpursLipu Can you suggest something? Also, I tried pruning thresholds between 0.5 and 0.01, but every time the pruned model's mAP is reported as 0.0. What could be causing this?
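One possible cause of an mAP of 0.0 worth ruling out, assuming the pruning here follows the network-slimming idea behind `-sr`/`--s` (sparsity on BatchNorm scale factors γ, then removing channels whose |γ| falls below the threshold): a threshold larger than the biggest γ in some layer empties that layer entirely. The sketch below works on plain lists of γ values (in PyTorch these are the `weight` tensors of the `BatchNorm2d` modules). The function names and the keep rule `|γ| >= threshold` are assumptions, not this repo's prune script.

```python
from typing import Dict, List, Tuple


def max_safe_threshold(bn_gammas: Dict[str, List[float]]) -> float:
    """Smallest per-layer max |gamma|; any threshold above it empties some layer."""
    return min(max(abs(g) for g in gammas) for gammas in bn_gammas.values())


def prune_report(bn_gammas: Dict[str, List[float]],
                 threshold: float) -> Dict[str, Tuple[int, int]]:
    """(kept, total) channels per layer, keeping channels with |gamma| >= threshold (assumed rule)."""
    return {name: (sum(1 for g in gammas if abs(g) >= threshold), len(gammas))
            for name, gammas in bn_gammas.items()}


def emptied_layers(bn_gammas: Dict[str, List[float]], threshold: float) -> List[str]:
    """Layers that would lose every channel at this threshold."""
    return [name for name, (kept, _) in prune_report(bn_gammas, threshold).items()
            if kept == 0]


if __name__ == "__main__":
    # Toy gamma values for illustration only, not taken from a real checkpoint.
    gammas = {
        "conv_1": [0.9, 0.02, 0.4],
        "conv_2": [0.05, 0.03, 0.008],  # heavily sparsified layer
    }
    print(max_safe_threshold(gammas))   # 0.05
    print(prune_report(gammas, 0.1))    # {'conv_1': (2, 3), 'conv_2': (0, 3)}
    print(emptied_layers(gammas, 0.1))  # ['conv_2']
```

If the safe threshold computed for the real model is below 0.01, every threshold tried in the 0.5 to 0.01 range would empty at least one layer, which would be consistent with the reported mAP of 0.0.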