edgeai-yolov5
labels require 56 columns each
❔Question
Hi!
I am trying to train a model to detect keypoints for a single class with 9 keypoints.
I get errors like:
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/102_jpg.rf.0f0bf4b6ec94f8a7be6527458b7922f3.jpg: labels require 56 columns each
It looks like the model still expects labels for the 17 human-pose keypoints (17*3 + 5 = 56 columns), while my labels have only 9 keypoints (9*3 + 5 = 32 columns).
Please help me solve this problem!
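The column arithmetic above can be checked directly against a label file. This is a minimal sketch, assuming each label row is a class id, four box values, and an (x, y, visibility) triple per keypoint; the function names are hypothetical and are not part of edgeai-yolov5.

```python
def expected_columns(num_keypoints, values_per_keypoint=3):
    """Class id + 4 box values + (x, y, visibility) per keypoint (assumed layout)."""
    return 5 + values_per_keypoint * num_keypoints


def bad_rows(label_text, num_keypoints):
    """Return (line_number, column_count) for every row with an unexpected width."""
    want = expected_columns(num_keypoints)
    bad = []
    for number, line in enumerate(label_text.splitlines(), start=1):
        fields = line.split()
        if fields and len(fields) != want:
            bad.append((number, len(fields)))
    return bad


# A synthetic 9-keypoint row: 1 class id + 31 more values = 32 fields.
sample_row = " ".join(["0"] + ["0.5"] * 31)

print(expected_columns(17))       # width the 17-keypoint check asks for
print(expected_columns(9))        # width of a 9-keypoint row
print(bad_rows(sample_row, 17))   # the row is rejected under the 17-keypoint rule
print(bad_rows(sample_row, 9))    # and accepted under the 9-keypoint rule
```

Running bad_rows over the text of each file in labels/train shows whether the dataset is consistent before any loader code is changed.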
Additional context
I am using: https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose
I created a Colab notebook (with GPU) with this code:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
!# Download TexasInstruments | edgeai-yolov5 | YOLO-Pose Multi-person Pose estimation model code
!git clone https://github.com/TexasInstruments/edgeai-yolov5.git -b yolo-pose
%cd edgeai-yolov5
%pip install -r requirements.txt # install
import sys
import torch
print(f"Python version: {sys.version}, {sys.version_info} ")
print(f"Pytorch version: {torch.__version__} ")
import os
key_value = 'OMP_NUM_THREADS'
try:
    if os.environ[key_value]:
        print(f'The value of {key_value} is {os.environ[key_value]}')
except KeyError:
    print(f'{key_value} environment variable is not set.')
    os.environ.setdefault(key_value, '8')
Start training:
# Remove train.cache from previous training
# !rm -rf <folder_name>
!rm -rf /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/train.cache
!rm -rf /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train.cache
data_location = '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/data.yaml'
cfg_location = '/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/models/hub/yolov5s6_kpts.yaml'
weights = '/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt'
my_project = '/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg'
hyper_parameters = '/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/data/hyp.scratch.yaml'
!python train.py --data {data_location} --cfg {cfg_location} --weights {weights} --epochs 100 --batch-size 64 --img 640 --kpt-label --project {my_project} --name edgeai-yolov5 --hyp {hyper_parameters}
And I get an error:
github: ⚠️ WARNING: code is out of date by 465 commits. Use 'git pull' to update or 'git clone https://github.com/TexasInstruments/edgeai-yolov5' to download latest.
YOLOv5 🚀 v4.0-76-gae4e0e8 torch 1.13.1+cu116 CUDA:0 (Tesla T4, 15109.875MB)
Namespace(adam=False, artifact_alias='latest', batch_size=64, bbox_interval=-1, bucket='', cache_images=False, cfg='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/models/hub/yolov5s6_kpts.yaml', data='/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/data.yaml', device='', entity=None, epochs=100, evolve=False, exist_ok=False, global_rank=-1, hyp='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], kpt_label=True, label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='edgeai-yolov5', noautoanchor=False, nosave=False, notest=False, project='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', quad=False, rect=False, resume=False, save_dir='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov56', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=64, upload_dataset=False, weights='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, kpt=0.1, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)
from n params module arguments
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 156928 models.common.C3 [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 625152 models.common.C3 [256, 256, 3]
7 -1 1 885504 models.common.Conv [256, 384, 3, 2]
8 -1 1 665856 models.common.C3 [384, 384, 1]
9 -1 1 1770496 models.common.Conv [384, 512, 3, 2]
[3, 5, 7]
10 -1 1 656896 models.common.SPP [512, 512, [3, 5, 7]]
11 -1 1 1182720 models.common.C3 [512, 512, 1, False]
12 -1 1 197376 models.common.Conv [512, 384, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 8] 1 0 models.common.Concat [1]
15 -1 1 813312 models.common.C3 [768, 384, 1, False]
16 -1 1 98816 models.common.Conv [384, 256, 1, 1]
17 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
18 [-1, 6] 1 0 models.common.Concat [1]
19 -1 1 361984 models.common.C3 [512, 256, 1, False]
20 -1 1 33024 models.common.Conv [256, 128, 1, 1]
21 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
22 [-1, 4] 1 0 models.common.Concat [1]
23 -1 1 90880 models.common.C3 [256, 128, 1, False]
24 -1 1 147712 models.common.Conv [128, 128, 3, 2]
25 [-1, 20] 1 0 models.common.Concat [1]
26 -1 1 296448 models.common.C3 [256, 256, 1, False]
27 -1 1 590336 models.common.Conv [256, 256, 3, 2]
28 [-1, 16] 1 0 models.common.Concat [1]
29 -1 1 715008 models.common.C3 [512, 384, 1, False]
30 -1 1 1327872 models.common.Conv [384, 384, 3, 2]
31 [-1, 12] 1 0 models.common.Concat [1]
32 -1 1 1313792 models.common.C3 [768, 512, 1, False]
33 [23, 26, 29, 32] 1 2681996 models.yolo.Detect [1, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], 9, [128, 256, 384, 512]]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Model Summary: 557 layers, 15022412 parameters, 15022412 gradients
Transferred 470/744 items from /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt
Scaled weight_decay = 0.0005
Optimizer groups: 129 .bias, 129 conv.weight, 121 other
Scanning images: 0% 0/102 [00:00<?, ?it/s]
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/100_jpg.rf.4f0ac837f2ad41c10f5c40bd2aceb2d1.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/101_jpg.rf.342c555c0c142ee704a47a7eef5b3e24.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/102_jpg.rf.0f0bf4b6ec94f8a7be6527458b7922f3.jpg: labels require 56 columns each
<...>
train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 29 found, 0 missing, 0 empty, 29 corrupted: 28% 29/102 [00:00<00:00, 287.07it/s]
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/128_jpg.rf.ee990fa083f2e1fd001a05e52d24a651.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/12_jpg.rf.90b0d5548d1c6dc5ead449a15eb19b8f.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/130_jpg.rf.fb5e61ff1c164b031e8993ce832e94f7.jpg: labels require 56 columns each
<...>
train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 77 found, 0 missing, 0 empty, 77 corrupted: 75% 77/102 [00:00<00:00, 399.34it/s]
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/34_jpg.rf.0b643f03f0ebe6be6fc8bafa7bade034.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/35_jpg.rf.e0b7a971afca6a03921a5c694b9babae.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/36_jpg.rf.f3b0c8f3932a26483534c53f5c1bc5af.jpg: labels require 56 columns each
<...>
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/55_jpg.rf.e3a9328b563b4f7408dabc21c6b31e9d.jpg: labels require 56 columns each
train: WARNING: Ignoring corrupted image and/or label /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/images/train/56_jpg.rf.daeff470200b3da62d2b36c5d4b2bbc3.jpg: labels require 56 columns each
train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 102 found, 0 missing, 0 empty, 102 corrupted: 100% 102/102 [00:00<00:00, 393.97it/s]
train: New cache created: /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train.cache
Traceback (most recent call last):
File "train.py", line 550, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 189, in train
dataloader, dataset = create_dataloader(train_path, imgsz, batch_size, gs, opt,
File "/content/edgeai-yolov5/utils/datasets.py", line 63, in create_dataloader
dataset = LoadImagesAndLabels(path, imgsz, batch_size,
File "/content/edgeai-yolov5/utils/datasets.py", line 414, in __init__
labels, shapes, self.segments = zip(*cache.values())
ValueError: not enough values to unpack (expected 3, got 0)
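The ValueError above is consistent with every image having been rejected as corrupted (102 of 102 in the scan). A minimal sketch of that failure, assuming the cache is a dict with one entry per accepted image, so that it is empty when nothing was accepted:

```python
# Assumption: one (labels, shape, segments) entry per accepted image.
# With all 102 images rejected, nothing is left in the dict.
cache = {}

try:
    labels, shapes, segments = zip(*cache.values())
except ValueError as error:
    message = str(error)

print(message)
```

zip() called with no arguments yields nothing, so unpacking into three names fails. On this reading the ValueError is a symptom of the column check, not a separate bug.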
I think I solved the "labels require 56 columns each" error by modifying utils/datasets.py: I changed everything related to the 17 keypoints.
But now I get another error: torch.cuda.OutOfMemoryError: CUDA out of memory.
This is strange: if there was enough memory for 17 keypoints, why is there not enough for 9?
github: ⚠️ WARNING: code is out of date by 465 commits. Use 'git pull' to update or 'git clone https://github.com/TexasInstruments/edgeai-yolov5' to download latest.
YOLOv5 🚀 v4.0-76-gae4e0e8 torch 1.13.1+cu116 CUDA:0 (Tesla T4, 15109.875MB)
Namespace(adam=False, artifact_alias='latest', batch_size=64, bbox_interval=-1, bucket='', cache_images=False, cfg='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/models/hub/yolov5s6_kpts.yaml', data='/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/data.yaml', device='', entity=None, epochs=3, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], kpt_label=True, label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='edgeai-yolov5', noautoanchor=False, nosave=False, notest=False, project='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', quad=False, rect=False, resume=False, save_dir='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov523', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=64, upload_dataset=False, weights='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, kpt=0.1, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)
from n params module arguments
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 156928 models.common.C3 [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 625152 models.common.C3 [256, 256, 3]
7 -1 1 885504 models.common.Conv [256, 384, 3, 2]
8 -1 1 665856 models.common.C3 [384, 384, 1]
9 -1 1 1770496 models.common.Conv [384, 512, 3, 2]
[3, 5, 7]
10 -1 1 656896 models.common.SPP [512, 512, [3, 5, 7]]
11 -1 1 1182720 models.common.C3 [512, 512, 1, False]
12 -1 1 197376 models.common.Conv [512, 384, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 8] 1 0 models.common.Concat [1]
15 -1 1 813312 models.common.C3 [768, 384, 1, False]
16 -1 1 98816 models.common.Conv [384, 256, 1, 1]
17 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
18 [-1, 6] 1 0 models.common.Concat [1]
19 -1 1 361984 models.common.C3 [512, 256, 1, False]
20 -1 1 33024 models.common.Conv [256, 128, 1, 1]
21 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
22 [-1, 4] 1 0 models.common.Concat [1]
23 -1 1 90880 models.common.C3 [256, 128, 1, False]
24 -1 1 147712 models.common.Conv [128, 128, 3, 2]
25 [-1, 20] 1 0 models.common.Concat [1]
26 -1 1 296448 models.common.C3 [256, 256, 1, False]
27 -1 1 590336 models.common.Conv [256, 256, 3, 2]
28 [-1, 16] 1 0 models.common.Concat [1]
29 -1 1 715008 models.common.C3 [512, 384, 1, False]
30 -1 1 1327872 models.common.Conv [384, 384, 3, 2]
31 [-1, 12] 1 0 models.common.Concat [1]
32 -1 1 1313792 models.common.C3 [768, 512, 1, False]
33 [23, 26, 29, 32] 1 2681996 models.yolo.Detect [1, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], 9, [128, 256, 384, 512]]
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Model Summary: 557 layers, 15022412 parameters, 15022412 gradients
Transferred 470/744 items from /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt
Scaled weight_decay = 0.0005
Optimizer groups: 129 .bias, 129 conv.weight, 121 other
train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 102 found, 0 missing, 0 empty, 0 corrupted: 100% 102/102 [00:00<00:00, 292.12it/s]
train: New cache created: /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train.cache
val: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/valid.cache' images and labels... 29 found, 0 missing, 0 empty, 0 corrupted: 100% 1/1 [00:00<?, ?it/s]
Plotting labels...
autoanchor: Analyzing anchors... anchors/target = 6.41, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 test
Using 2 dataloader workers
Logging results to /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov523
Starting training for 3 epochs...
Epoch gpu_mem box obj cls kpt kptv total labels img_size
0% 0/2 [00:06<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 550, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 305, in train
pred = model(imgs) # forward
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/edgeai-yolov5/models/yolo.py", line 157, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "/content/edgeai-yolov5/models/yolo.py", line 188, in forward_once
x = m(x) # run
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/edgeai-yolov5/models/yolo.py", line 67, in forward
x[i] = torch.cat((self.m[i](x[i]), self.m_kpt[i](x[i])), axis=1)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/content/edgeai-yolov5/models/common.py", line 45, in forward
return self.act(self.bn(self.conv(x)))
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/batchnorm.py", line 171, in forward
return F.batch_norm(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 2450, in batch_norm
return torch.batch_norm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 14.76 GiB total capacity; 13.41 GiB already allocated; 3.88 MiB free; 13.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
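The error text suggests max_split_size_mb when reserved memory is much larger than allocated memory. A small sketch that reads the two figures out of the message above to see whether that condition holds; the helper name is hypothetical.

```python
import re

OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 14.76 GiB total capacity; "
    "13.41 GiB already allocated; 3.88 MiB free; 13.49 GiB reserved in total by PyTorch)"
)


def parse_gib(message, label):
    """Return the GiB figure that precedes `label` in the message."""
    match = re.search(r"([\d.]+) GiB " + re.escape(label), message)
    if match is None:
        raise ValueError(f"no GiB figure found for {label!r}")
    return float(match.group(1))


allocated = parse_gib(OOM_MESSAGE, "already allocated")
reserved = parse_gib(OOM_MESSAGE, "reserved in total")
slack_mib = (reserved - allocated) * 1024

print(f"reserved but unallocated: {slack_mib:.0f} MiB")
```

The gap is small compared with the 13.41 GiB already allocated, so fragmentation is probably not the main factor here. The GPU appears to be simply full at batch size 64, although that is an inference from the message and not a confirmed cause.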
If I reduce --batch-size from 64 to 32:
!python train.py --data {data_location} --cfg {cfg_location} --weights {weights} --batch-size 32 --img 640 --kpt-label --project {my_project} --name edgeai-yolov5 --epochs 3 --hyp {hyper_parameters}
I get another error (the same for --batch-size 32, --batch-size 15, and --batch-size 1): RuntimeError: The size of tensor a (25) must match the size of tensor b (41) at non-singleton dimension 2
github: ⚠️ WARNING: code is out of date by 465 commits. Use 'git pull' to update or 'git clone https://github.com/TexasInstruments/edgeai-yolov5' to download latest.
YOLOv5 🚀 v4.0-76-gae4e0e8 torch 1.9.0+cu102 CUDA:0 (Tesla T4, 15109.875MB)
Namespace(adam=False, artifact_alias='latest', batch_size=32, bbox_interval=-1, bucket='', cache_images=False, cfg='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/models/hub/yolov5s6_kpts.yaml', data='/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/data.yaml', device='', entity=None, epochs=3, evolve=False, exist_ok=False, global_rank=-1, hyp='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], kpt_label=True, label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='edgeai-yolov5', noautoanchor=False, nosave=False, notest=False, project='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', quad=False, rect=False, resume=False, save_dir='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov528', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=32, upload_dataset=False, weights='/content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, kpt=0.1, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)
from n params module arguments
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 18816 models.common.C3 [64, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 156928 models.common.C3 [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 625152 models.common.C3 [256, 256, 3]
7 -1 1 885504 models.common.Conv [256, 384, 3, 2]
8 -1 1 665856 models.common.C3 [384, 384, 1]
9 -1 1 1770496 models.common.Conv [384, 512, 3, 2]
[3, 5, 7]
10 -1 1 656896 models.common.SPP [512, 512, [3, 5, 7]]
11 -1 1 1182720 models.common.C3 [512, 512, 1, False]
12 -1 1 197376 models.common.Conv [512, 384, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 8] 1 0 models.common.Concat [1]
15 -1 1 813312 models.common.C3 [768, 384, 1, False]
16 -1 1 98816 models.common.Conv [384, 256, 1, 1]
17 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
18 [-1, 6] 1 0 models.common.Concat [1]
19 -1 1 361984 models.common.C3 [512, 256, 1, False]
20 -1 1 33024 models.common.Conv [256, 128, 1, 1]
21 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
22 [-1, 4] 1 0 models.common.Concat [1]
23 -1 1 90880 models.common.C3 [256, 128, 1, False]
24 -1 1 147712 models.common.Conv [128, 128, 3, 2]
25 [-1, 20] 1 0 models.common.Concat [1]
26 -1 1 296448 models.common.C3 [256, 256, 1, False]
27 -1 1 590336 models.common.Conv [256, 256, 3, 2]
28 [-1, 16] 1 0 models.common.Concat [1]
29 -1 1 715008 models.common.C3 [512, 384, 1, False]
30 -1 1 1327872 models.common.Conv [384, 384, 3, 2]
31 [-1, 12] 1 0 models.common.Concat [1]
32 -1 1 1313792 models.common.C3 [768, 512, 1, False]
33 [23, 26, 29, 32] 1 2681996 models.yolo.Detect [1, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], 9, [128, 256, 384, 512]]
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Model Summary: 557 layers, 15022412 parameters, 15022412 gradients
Transferred 470/744 items from /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/weights/person_detector_yolov5s6_960_71p6_93p1/last.pt
Scaled weight_decay = 0.0005
Optimizer groups: 129 .bias, 129 conv.weight, 121 other
train: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train' images and labels... 102 found, 0 missing, 0 empty, 0 corrupted: 100% 102/102 [00:00<00:00, 342.49it/s]
train: New cache created: /content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/train.cache
val: Scanning '/content/drive/MyDrive/cv_tn/tn_keypoints_dataset/labels/valid.cache' images and labels... 29 found, 0 missing, 0 empty, 0 corrupted: 100% 1/1 [00:00<?, ?it/s]
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)
Plotting labels...
autoanchor: Analyzing anchors... anchors/target = 6.41, Best Possible Recall (BPR) = 1.0000
Image sizes 640 train, 640 test
Using 2 dataloader workers
Logging results to /content/drive/MyDrive/cv_tn/TexasInstruments_edgeai-yolov5_tree_yolo-pose/train-seg/edgeai-yolov528
Starting training for 3 epochs...
Epoch gpu_mem box obj cls kpt kptv total labels img_size
0% 0/4 [00:02<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 550, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 306, in train
loss, loss_items = compute_loss(pred, targets.to(device)) # loss scaled by batch_size
File "/content/edgeai-yolov5/utils/loss.py", line 120, in __call__
tcls, tbox, tkpt, indices, anchors = self.build_targets(p, targets) # targets
File "/content/edgeai-yolov5/utils/loss.py", line 207, in build_targets
t = targets * gain
RuntimeError: The size of tensor a (25) must match the size of tensor b (41) at non-singleton dimension 2
- If I do not set OMP_NUM_THREADS:
I get the same error: torch.cuda.OutOfMemoryError: CUDA out of memory.
- If I try to reduce the memory load by lowering --batch-size from 64 to 32, 15, or 1:
I get this error:
RuntimeError: The size of tensor a (25) must match the size of tensor b (41) at non-singleton dimension 2
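One reading of the two sizes in this error, offered as an assumption and not a confirmed diagnosis: if each target row holds an image index, a class id, four box values, an (x, y) pair per keypoint, and an appended anchor index, its width is 7 + 2 * num_keypoints. That gives 25 for 9 keypoints and 41 for 17, which matches the message. It would suggest that a 17-keypoint width is still fixed somewhere in utils/loss.py, where the traceback points at build_targets. The helper names below are hypothetical.

```python
def target_row_width(num_keypoints):
    # Assumed layout: image index, class id, 4 box values,
    # (x, y) per keypoint, plus one appended anchor index.
    return 1 + 1 + 4 + 2 * num_keypoints + 1


def keypoints_for_width(width):
    """Invert the formula; return None if the width does not fit the assumed layout."""
    rest = width - 7
    if rest < 0 or rest % 2:
        return None
    return rest // 2


print(target_row_width(9), target_row_width(17))
print(keypoints_for_width(25), keypoints_for_width(41))
```

If this reading holds, the remaining 17-keypoint constants would be in the loss code and not in utils/datasets.py, which is why the error does not change with batch size.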
Hi! Which function in utils/datasets.py did you change to solve the error (labels require 56 columns each)? Thanks!
I changed everything related to the 17 keypoints in utils/datasets.py.
Hello, I would like to know how you resolved this bug: labels, shapes, self.segments = zip(*cache.values()) ValueError: not enough values to unpack (expected 3, got 0). Why are all the training set images reported as corrupted? Thank you very much.