Training command --batch-size changes the number of workers instead of the batch size

Idefix0496 opened this issue 2 years ago • 3 comments

YOLOv5 Component



root@94ceb79c1cd3:/usr/src/app# python3 -m torch.distributed.launch --nproc_per_node 2 train.py --batch-size 4 --epochs 3 --img 640 --data coco128.yaml --weights yolov5s.pt
/opt/conda/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

  warnings.warn(
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: (30 second timeout) 3
wandb: You chose 'Don't visualize my results'
train: weights=yolov5s.pt, cfg=, data=coco128.yaml, hyp=data/hyps/hyp.scratch-low.yaml, epochs=3, batch_size=4, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=0, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: skipping check (Docker image), for updates see https://github.com/ultralytics/yolov5
YOLOv5 🚀 c768919 Python-3.8.13 torch-1.12.0+cu113 CUDA:0 (NVIDIA GeForce GTX 1080 Ti, 11264MiB)

Added key: store_based_barrier_key:1 to store for rank: 0
Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/

                 from  n    params  module                                  arguments
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  2    115712  models.common.C3                        [128, 128, 2]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  3    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1    229245  models.yolo.Detect                      [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model summary: 270 layers, 7235389 parameters, 7235389 gradients, 16.6 GFLOPs

Transferred 349/349 items from yolov5s.pt
AMP: checks passed ✅
Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 57 weight (no decay), 60 weight, 60 bias
albumentations: Blur(always_apply=False, p=0.01, blur_limit=(3, 7)), MedianBlur(always_apply=False, p=0.01, blur_limit=(3, 7)), ToGray(always_apply=False, p=0.01), CLAHE(always_apply=False, p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
train: Scanning '/usr/src/datasets/coco128/labels/train2017.cache' images and labels... 128 found, 0 missing, 2 empty, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
val: Scanning '/usr/src/datasets/coco128/labels/train2017.cache' images and labels... 128 found, 0 missing, 2 empty, 0 corrupt: 100%|██████████| 128/128 [00:00<?, ?it/s]
Plotting labels to runs/train/exp16/labels.jpg...

AutoAnchor: 4.26 anchors/target, 0.995 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Image sizes 640 train, 640 val
Using 4 dataloader workers
Logging results to runs/train/exp16
Starting training for 3 epochs...

 Epoch   gpu_mem       box       obj       cls    labels  img_size
   0/2     4.74G   0.04671   0.09375   0.04592        27       640:   6%|▋         | 2/32 [00:05<01:15,  2.51s/it]
Reducer buckets have been rebuilt in this iteration.
   0/2     4.75G   0.05006   0.07357   0.03664        16       640: 100%|██████████| 32/32 [00:14<00:00,  2.23it/s]
           Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 32/32 [00:13<00:00,  2.39it/s]
             all        128        929      0.749      0.617      0.717      0.475

 Epoch   gpu_mem       box       obj       cls    labels  img_size
   1/2     4.75G   0.05107   0.08029   0.03365        22       640: 100%|██████████| 32/32 [00:08<00:00,  3.92it/s]
           Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 32/32 [00:04<00:00,  6.46it/s]
             all        128        929      0.729      0.581      0.674       0.42

 Epoch   gpu_mem       box       obj       cls    labels  img_size
   2/2     4.75G   0.04708    0.0727   0.03571        20       640: 100%|██████████| 32/32 [00:08<00:00,  3.97it/s]
           Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 32/32 [00:04<00:00,  6.63it/s]
             all        128        929      0.738      0.637      0.712      0.455

3 epochs completed in 0.016 hours.
Optimizer stripped from runs/train/exp16/weights/last.pt, 14.8MB
Optimizer stripped from runs/train/exp16/weights/best.pt, 14.8MB

Validating runs/train/exp16/weights/best.pt...
Fusing layers...
Model summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
           Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 32/32 [00:05<00:00, 5.65it/s]
             all        128        929      0.749      0.617      0.717      0.474
          person        128        254      0.892      0.686      0.804      0.509
         bicycle        128          6      0.565      0.232      0.725       0.34
             car        128         46      0.857      0.326      0.537      0.246
      motorcycle        128          5       0.59        0.8      0.803      0.641
        airplane        128          6      0.996          1      0.995      0.791
             bus        128          7      0.564      0.714      0.825      0.713
           train        128          3          1      0.549      0.698      0.474
           truck        128         12      0.659      0.333      0.411      0.176
            boat        128          6          1      0.319      0.449      0.143
   traffic light        128         14      0.737      0.203      0.362      0.214
       stop sign        128          2      0.864          1      0.995      0.822
           bench        128          9      0.702      0.444      0.581      0.237
            bird        128         16      0.903          1      0.995      0.643
             cat        128          4      0.826          1      0.995      0.747
             dog        128          9       0.77      0.745      0.907      0.644
           horse        128          2      0.803          1      0.995      0.747
        elephant        128         17      0.971      0.882      0.926      0.698
            bear        128          1      0.492          1      0.995      0.995
           zebra        128          4      0.883          1      0.995      0.906
         giraffe        128          9      0.806      0.778      0.904      0.728
        backpack        128          6      0.992        0.5      0.708      0.342
        umbrella        128         18       0.88      0.816      0.916      0.478
         handbag        128         19      0.724      0.158      0.257      0.134
             tie        128          7      0.818      0.647      0.702      0.491
        suitcase        128          4      0.867          1      0.995       0.51
         frisbee        128          5      0.705        0.8      0.798      0.719
            skis        128          1      0.748          1      0.995      0.497
       snowboard        128          7      0.811      0.571      0.823      0.541
     sports ball        128          6      0.649      0.667      0.667      0.314
            kite        128         10      0.563        0.7      0.631      0.279
    baseball bat        128          4      0.648        0.5      0.538      0.223
  baseball glove        128          7      0.761      0.429      0.478      0.311
      skateboard        128          5      0.706        0.6      0.659      0.444
   tennis racket        128          7       0.79      0.543      0.587      0.319
          bottle        128         18      0.672      0.389      0.536      0.294
      wine glass        128         16       0.67      0.761      0.801      0.468
             cup        128         36      0.859      0.509      0.777      0.506
            fork        128          6          1      0.323      0.412      0.296
           knife        128         16       0.78      0.688      0.754      0.424
           spoon        128         22      0.649      0.409      0.562      0.299
            bowl        128         28      0.855      0.631      0.705      0.505
          banana        128          1      0.893          1      0.995      0.111
        sandwich        128          2          0          0       0.19      0.172
          orange        128          4          1       0.44      0.995      0.578
        broccoli        128         11      0.352      0.455      0.402      0.306
          carrot        128         24      0.685      0.542      0.703      0.492
         hot dog        128          2      0.404          1      0.995      0.895
           pizza        128          5          1       0.79      0.878       0.66
           donut        128         14      0.663          1      0.957      0.814
            cake        128          4      0.878          1      0.995      0.785
           chair        128         35      0.536        0.6      0.558      0.276
           couch        128          6          1       0.63      0.822       0.53
    potted plant        128         14      0.752      0.649      0.775      0.521
             bed        128          3          1          0      0.665      0.387
    dining table        128         13       0.84      0.405      0.599       0.37
          toilet        128          2      0.832          1      0.995      0.846
              tv        128          2      0.743          1      0.995      0.796
          laptop        128          3          1          0      0.747      0.328
           mouse        128          2          1          0     0.0439     0.0219
          remote        128          8      0.833      0.625      0.607      0.466
      cell phone        128          8      0.576       0.25      0.363      0.188
       microwave        128          3      0.714          1      0.995      0.699
            oven        128          5      0.232        0.4      0.461      0.276
            sink        128          6      0.186      0.167      0.328       0.24
    refrigerator        128          5      0.674        0.8      0.808      0.513
            book        128         29      0.559      0.241      0.323      0.145
           clock        128          9      0.772      0.889      0.923      0.655
            vase        128          2      0.324          1      0.828      0.745
        scissors        128          1          1          0      0.124     0.0124
      teddy bear        128         21      0.778        0.5      0.696      0.466
      toothbrush        128          5      0.902        0.8      0.938      0.534
Results saved to runs/train/exp16
Destroying process group...
root@94ceb79c1cd3:/usr/src/app#


  • YOLOv5 🚀 c768919 Python-3.8.13 torch-1.12.0+cu113 CUDA:0 (NVIDIA GeForce GTX 1080 Ti, 11264MiB)
  • OS: Windows 10 Pro 21H2
  • NVIDIA driver on the host system (Windows 10): 516.59 GeForce Game Ready Driver
  • Docker Desktop 4.10.0 (82025)
  • Docker container creation: docker run --ipc=host -it -v E:\Pytorch:/usr/src/datasets -v E:\Pytorch\Results:/usr/src/app/runs --gpus all ultralytics/yolov5:latest
  • GPUs: 2x GTX 1080 Ti

Minimal Reproducible Example

python3 -m torch.distributed.launch --nproc_per_node 2 train.py --batch-size [n] --epochs 3 --img 640 --data coco128.yaml --weights yolov5s.pt


Hi, as stated in the title, I've got a problem with the --batch-size argument. Instead of changing the batch size, it changes the number of workers allocated. Every change to --batch-size [n] is mirrored in the log output: Using [n] dataloader workers. During training the CPU load also changes accordingly. I've tried reinstalling everything, without success; same issue. So, in short, I have no way to change the batch size and therefore the amount of VRAM used by the GPUs. I hope that's enough info so you can help me :-)

Idefix0496, Jul 06 '22 18:07

github-actions[bot], Jul 06 '22 18:07

@Idefix0496 --batch-size works correctly.
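Some background on why the batch size does take effect: YOLOv5's train.py describes --batch-size as the total batch size for all GPUs, so under DDP each process receives only its share of the images (here 4 / 2 = 2 per GPU per step). The sketch below illustrates that split. It assumes the distributed launcher exports WORLD_SIZE (torch.distributed.launch and torchrun both set it) and that the total must divide evenly across ranks; the helper name per_rank_batch_size is hypothetical and not part of YOLOv5.

```python
import os
from typing import Optional


def per_rank_batch_size(total_batch_size: int, world_size: Optional[int] = None) -> int:
    """Split a total --batch-size evenly across DDP ranks (hypothetical helper).

    Assumption: when world_size is not given, it is read from the WORLD_SIZE
    environment variable exported by the distributed launcher (1 if unset).
    """
    if world_size is None:
        world_size = int(os.getenv("WORLD_SIZE", "1"))
    # Assumed constraint: the total batch must be a multiple of the number of ranks.
    if total_batch_size % world_size != 0:
        raise ValueError(
            f"--batch-size {total_batch_size} must be a multiple of WORLD_SIZE {world_size}"
        )
    return total_batch_size // world_size


if __name__ == "__main__":
    # The launch from the issue: --nproc_per_node 2 --batch-size 4
    print(per_rank_batch_size(4, world_size=2))   # → 2 images per GPU per step
    print(per_rank_batch_size(16, world_size=2))  # → 8
```

Passing world_size explicitly keeps the helper testable; in a real DDP run it would come from the environment.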

Up to 8 dataloader workers are allowed per RANK. If your batch size is less than 8 per RANK, the worker count is reduced to match it; otherwise there would be excess workers.
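The capping rule described above can be sketched as follows. This is modeled on the rule as stated in the reply, not copied from YOLOv5: it assumes the per-rank worker count is the smallest of the CPU cores available per GPU, the per-rank batch size, and --workers (default 8), and that the "Using N dataloader workers" log line reports the per-rank count multiplied by the number of ranks. The function names and the cpu_count=16 used in the demo are hypothetical.

```python
import os
from typing import Optional


def dataloader_workers(per_rank_batch: int, workers: int = 8, num_gpus: int = 1,
                       cpu_count: Optional[int] = None) -> int:
    """Workers for one rank: capped by CPU cores per GPU, per-rank batch, and --workers."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Assumption: a per-rank batch of 1 loads data in the main process (0 workers).
    batch_cap = per_rank_batch if per_rank_batch > 1 else 0
    return min(cpu_count // max(num_gpus, 1), batch_cap, workers)


def logged_total_workers(total_batch: int, world_size: int, workers: int = 8,
                         cpu_count: Optional[int] = None) -> int:
    """Value shown as 'Using N dataloader workers' (assumed: per-rank count x ranks)."""
    per_rank = dataloader_workers(total_batch // world_size, workers, world_size, cpu_count)
    return per_rank * world_size


if __name__ == "__main__":
    # 2 GPUs, an assumed 16 CPU cores, default --workers 8
    for total in (4, 8, 16, 32):
        print(total, logged_total_workers(total, world_size=2, cpu_count=16))
    # 4 4
    # 8 8
    # 16 16
    # 32 16
```

Under these assumptions, --batch-size 4 on two GPUs gives 2 workers per rank and 4 in total, matching the "Using 4 dataloader workers" line in the log. For even --batch-size values from 4 to 16 the logged worker count equals the batch size, which is exactly the behavior reported in the issue; only above 16 (8 per rank) does the cap stop the worker count from following --batch-size.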

glenn-jocher, Jul 07 '22 11:07

