one-yolov5
add build_targets_optim
link https://github.com/Oneflow-Inc/oneflow/pull/9536
Accuracy validation looks normal. After 20 epochs the accuracy gap is -0.446 (mAP@0.5:.95); details below:
Launch command: python -m oneflow.distributed.launch --nproc_per_node 4 train.py --data data/coco.yaml --weights ' ' --cfg models/yolov5n.yaml --batch 128 --bbox_iou_optim --multi_tensor_optimizer --build_targets_optim --device 4,5,6,7 --epochs 20
run | epoch | batch | gpu | bbox_iou_optim | multi_tensor_optimizer | mAP@0.5 | mAP@0.5:.95
---|---|---|---|---|---|---|---
baseline | 19 | 128 | 4 | False | False | 31.199 | 17.407
this experiment | 19 | 128 | 4 | True | True | 30.705 | 16.961
this experiment - baseline | 19 | | | | | -0.494 | -0.446
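
The difference row is a plain subtraction of the baseline row from the experiment row. A minimal sketch of that arithmetic, using the table values rounded to three decimals; the helper name `metric_gap` is hypothetical, and the rounding is an assumption made here to avoid floating-point tails such as -0.44599999999999795:

```python
# Values taken from the table above (percent, epoch 19), rounded to 3 decimals.
baseline = {"mAP@0.5": 31.199, "mAP@0.5:.95": 17.407}
current = {"mAP@0.5": 30.705, "mAP@0.5:.95": 16.961}


def metric_gap(current, baseline, ndigits=3):
    """Return current - baseline per metric, rounded to ndigits decimals.

    Hypothetical helper for illustration; not part of one-yolov5.
    """
    return {name: round(current[name] - baseline[name], ndigits) for name in baseline}


gap = metric_gap(current, baseline)
for name, delta in gap.items():
    # Explicit sign makes regressions (negative gaps) easy to spot.
    print(f"{name}: {delta:+.3f}")
```

The same helper can be reused for the later 300-epoch comparisons by swapping in the values from those tables.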
Full result.csv:
epoch, train/box_loss, train/obj_loss, train/cls_loss, metrics/precision, metrics/recall, metrics/mAP_0.5,metrics/mAP_0.5:0.95, val/box_loss, val/obj_loss, val/cls_loss, x/lr0, x/lr1, x/lr2
0, 0.093182, 0.076606, 0.083707, 0.0034488, 0.068622, 0.0041244, 0.00135, 0.082852, 0.057275, 0.079476, 0.070032, 0.0033297, 0.0033297
1, 0.076017, 0.077694, 0.076859, 0.61515, 0.02394, 0.016233, 0.0062205, 0.074792, 0.05561, 0.069954, 0.039703, 0.0063332, 0.0063332
2, 0.070362, 0.075674, 0.067088, 0.45244, 0.075191, 0.04722, 0.019736, 0.068904, 0.057591, 0.058687, 0.0090428, 0.0090068, 0.0090068
3, 0.066933, 0.076041, 0.058288, 0.37677, 0.12383, 0.092613, 0.040804, 0.064703, 0.054779, 0.050293, 0.008515, 0.008515, 0.008515
4, 0.064511, 0.074846, 0.052813, 0.39496, 0.17134, 0.13238, 0.063082, 0.062339, 0.054191, 0.044835, 0.00802, 0.00802, 0.00802
5, 0.063179, 0.073648, 0.049642, 0.36816, 0.19597, 0.16268, 0.079857, 0.060779, 0.053623, 0.041664, 0.007525, 0.007525, 0.007525
6, 0.062163, 0.072918, 0.04724, 0.33008, 0.21392, 0.18523, 0.093926, 0.059704, 0.053372, 0.039742, 0.00703, 0.00703, 0.00703
7, 0.061349, 0.073734, 0.045705, 0.38091, 0.23731, 0.20934, 0.10881, 0.058611, 0.053002, 0.037711, 0.006535, 0.006535, 0.006535
8, 0.060651, 0.073988, 0.044348, 0.38831, 0.2466, 0.22479, 0.11731, 0.05803, 0.052564, 0.036483, 0.00604, 0.00604, 0.00604
9, 0.059962, 0.073595, 0.043254, 0.41124, 0.26339, 0.24331, 0.12977, 0.05743, 0.052763, 0.035348, 0.005545, 0.005545, 0.005545
10, 0.059417, 0.072536, 0.042297, 0.41421, 0.27602, 0.25506, 0.13644, 0.057048, 0.052061, 0.034245, 0.00505, 0.00505, 0.00505
11, 0.058873, 0.072241, 0.041424, 0.39929, 0.28124, 0.26624, 0.14363, 0.0565, 0.052592, 0.033551, 0.004555, 0.004555, 0.004555
12, 0.058494, 0.071352, 0.040809, 0.44046, 0.28721, 0.27507, 0.14928, 0.056286, 0.051716, 0.03283, 0.00406, 0.00406, 0.00406
13, 0.057861, 0.072136, 0.039713, 0.42786, 0.29419, 0.28141, 0.1534, 0.056102, 0.051525, 0.03227, 0.003565, 0.003565, 0.003565
14, 0.057516, 0.072301, 0.038914, 0.45416, 0.29648, 0.28761, 0.15725, 0.055784, 0.051387, 0.031714, 0.00307, 0.00307, 0.00307
15, 0.056954, 0.071751, 0.038346, 0.45428, 0.30127, 0.29314, 0.16077, 0.055604, 0.051567, 0.031298, 0.002575, 0.002575, 0.002575
16, 0.05655, 0.071426, 0.03797, 0.4524, 0.30718, 0.29784, 0.16381, 0.055435, 0.051677, 0.030979, 0.00208, 0.00208, 0.00208
17, 0.056053, 0.071739, 0.037025, 0.44086, 0.31075, 0.30175, 0.1663, 0.055263, 0.051571, 0.030707, 0.001585, 0.001585, 0.001585
18, 0.055577, 0.068904, 0.036529, 0.44778, 0.31155, 0.30443, 0.16825, 0.055255, 0.051014, 0.030517, 0.00109, 0.00109, 0.00109
19, 0.054898, 0.07117, 0.035509, 0.45353, 0.31224, 0.30705, 0.16961, 0.055114, 0.051115, 0.030329, 0.000595, 0.000595, 0.000595
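
The mAP figures reported in the comparison table appear to be the last row of this log multiplied by 100. A minimal sketch of that extraction, assuming the header shown above; `final_map_percent` is a hypothetical helper, and the two embedded rows are copied from the log only to keep the example self-contained:

```python
import csv
import io

# Header plus the first and last rows of the log above.
SAMPLE = """\
epoch, train/box_loss, train/obj_loss, train/cls_loss, metrics/precision, metrics/recall, metrics/mAP_0.5,metrics/mAP_0.5:0.95, val/box_loss, val/obj_loss, val/cls_loss, x/lr0, x/lr1, x/lr2
0, 0.093182, 0.076606, 0.083707, 0.0034488, 0.068622, 0.0041244, 0.00135, 0.082852, 0.057275, 0.079476, 0.070032, 0.0033297, 0.0033297
19, 0.054898, 0.07117, 0.035509, 0.45353, 0.31224, 0.30705, 0.16961, 0.055114, 0.051115, 0.030329, 0.000595, 0.000595, 0.000595
"""


def final_map_percent(csv_text):
    """Return (epoch, mAP@0.5, mAP@0.5:.95) for the last row, mAP in percent.

    Hypothetical helper for illustration; not part of one-yolov5.
    """
    # skipinitialspace drops the blank that follows each comma in the log.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = [{key.strip(): value.strip() for key, value in row.items()} for row in reader]
    last = rows[-1]
    return (
        int(last["epoch"]),
        round(float(last["metrics/mAP_0.5"]) * 100, 3),
        round(float(last["metrics/mAP_0.5:0.95"]) * 100, 3),
    )


result = final_map_percent(SAMPLE)
print(result)
```

To run it on a real log, pass the file contents instead of `SAMPLE`, for example `final_map_percent(open(path).read())` with `path` pointing at the run's CSV.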
Could you run it once more? The mAP@0.5 gap looks a bit large to me, and I am not sure whether it is a bug. @ccssu
Accuracy validation looks roughly normal. After 300 epochs the accuracy gap is -0.425 (mAP@0.5:.95); details below:
Launch command: python -m oneflow.distributed.launch --nproc_per_node 4 train.py --data data/coco.yaml --weights ' ' --cfg models/yolov5n.yaml --batch 128 --bbox_iou_optim --multi_tensor_optimizer --build_targets_optim --device 4,5,6,7 --epochs 300
run | epoch | batch | gpu | bbox_iou_optim | multi_tensor_optimizer | mAP@0.5 | mAP@0.5:.95
---|---|---|---|---|---|---|---
baseline | 299 | 128 | 4 | False | False | 45.115 | 27.431
this experiment | 299 | 128 | 4 | True | True | 44.373 | 27.006
this experiment - baseline | 299 | | | | | -0.742 | -0.425
python -m oneflow.distributed.launch --nproc_per_node 4 train.py --data data/coco.yaml --weights ' ' --cfg models/yolov5n.yaml --batch 128 --bbox_iou_optim --multi_tensor_optimizer --build_targets_optim --device 4,5,6,7
oneflow version: f59f6dacbe (HEAD -> fused_get_target_offsets), machine: A100, wandb data: https://wandb.ai/wearmheart/YOLOv5/runs/3l3ku6me?workspace=user-wearmheart. After 300 epochs the accuracy gap is -0.606 (mAP@0.5:.95).
Launch command: python -m oneflow.distributed.launch --nproc_per_node 4 train.py --data data/coco.yaml --weights ' ' --cfg models/yolov5n.yaml --batch 128 --bbox_iou_optim --multi_tensor_optimizer --build_targets_optim --device 4,5,6,7 --epochs 300
run | epoch | batch | gpu | bbox_iou_optim | multi_tensor_optimizer | mAP@0.5 | mAP@0.5:.95
---|---|---|---|---|---|---|---
baseline | 299 | 128 | 4 | False | False | 45.115 | 27.431
this experiment | 299 | 128 | 4 | True | True | 44.468 | 26.825
this experiment - baseline | 299 | | | | | -0.647 | -0.606
python -m oneflow.distributed.launch --nproc_per_node 4 train.py --data data/coco.yaml --weights ' ' --cfg models/yolov5n.yaml --batch 128 --bbox_iou_optim --multi_tensor_optimizer --build_targets_optim --device 4,5,6,7
Trend comparison chart: baseline vs. current experiment