
coco2017 training results: mAP only 35.7

Open supeng0924 opened this issue 3 years ago • 23 comments

I used the default parameters and trained with the following command: python train.py --device 4 --batch-size 16 --img 512 512 --data coco.yaml --cfg cfg/yolov4.cfg --weights '' --name yolov4-pacsp

Using CUDA device0 _CudaDeviceProperties(name='Tesla V100-SXM2-32GB', total_memory=32480MB)
Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='cfg/yolov4.cfg', data='./data/coco.yaml', device='4', epochs=300, evolve=False, global_rank=-1, hyp='data/hyp.scratch.yaml', img_size=[512, 512], local_rank=-1, logdir='runs/', multi_scale=False, name='yolov4-pacsp', noautoanchor=False, nosave=False, notest=False, rect=False, resume=False, single_cls=False, sync_bn=False, total_batch_size=16, weights='', world_size=1) Start Tensorboard with "tensorboard --logdir runs/", view at http://localhost:6006/
Hyperparameters {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.5, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.015, 'hsv_s': 0.7, 'hsv_v': 0.4, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0, 'perspective': 0.0, 'flipud': 0.0, 'fliplr': 0.5, 'mixup': 0.0} Model Summary: 327 layers, 6.43631e+07 parameters, 6.43631e+07 gradients, 142.8 GFLOPS Optimizer groups: 110 .bias, 110 conv.weight, 107 other

log:
     0/299     16.2G   0.07564    0.1414   0.07705    0.2941       176       512   0.08018   0.04417   0.03022   0.01233   0.07018    0.1006   0.06409
     1/299     24.5G   0.06105    0.1347   0.05651    0.2523       238       512     0.181    0.1989    0.1242   0.05767   0.06088   0.09591   0.04571
     2/299     24.5G   0.05685    0.1298    0.0461    0.2327       258       512    0.2109    0.3024    0.2009    0.1022   0.05672    0.0931   0.03762
     3/299     24.5G   0.05361    0.1261   0.04016    0.2199       217       512    0.2296    0.3767    0.2647    0.1398   0.05396    0.0909   0.03271
     4/299     24.5G   0.05138    0.1233   0.03632     0.211       225       512    0.2365    0.4161    0.3034    0.1645   0.05229   0.08974    0.0299
     5/299     24.5G   0.04991    0.1215   0.03396    0.2054       206       512     0.249    0.4353     0.324    0.1777   0.05133   0.08885   0.02838
     6/299     24.5G   0.04893    0.1202   0.03219    0.2013       233       512    0.2605    0.4441    0.3377    0.1867   0.05071   0.08814   0.02744

supeng0924 avatar Jan 04 '21 15:01 supeng0924

Could you provide results.txt? By the way, the default setting trains with --img 640 640.
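For example, keeping the rest of your command unchanged and only raising the resolution: python train.py --device 4 --batch-size 16 --img 640 640 --data coco.yaml --cfg cfg/yolov4.cfg --weights '' --name yolov4-pacsp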

WongKinYiu avatar Jan 04 '21 22:01 WongKinYiu

Thank you for your reply. This is my training log: results.txt. I found that the anchor sizes in yolov4.cfg are [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401]. These are the same as the Darknet 512 anchors, so I set the image size to 512.
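For reference, those values are read as (w, h) pairs, and each [yolo] layer's mask selects three of them; below is a minimal sketch of the conventional small/medium/large grouping (the P3/P4/P5 split is the usual convention, not taken from this cfg):

    # (w, h) anchor pairs from the flat "anchors=" list in the Darknet cfg;
    # each [yolo] layer's "mask" picks three of them per detection scale.
    anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401]
    pairs = list(zip(anchors[0::2], anchors[1::2]))
    for name, mask in [("small (P3)", [0, 1, 2]), ("medium (P4)", [3, 4, 5]), ("large (P5)", [6, 7, 8])]:
        print(name, [pairs[i] for i in mask])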

supeng0924 avatar Jan 05 '21 01:01 supeng0924

The original YOLOv4 was trained in Darknet with multi-scale training. The new code uses resolution jitter instead, so the default resolution setting was changed to 640.
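Roughly, the jitter picks a new training resolution around the base size from batch to batch; here is a minimal illustrative sketch of that idea (the function name and the exact 0.5-1.5 range are assumptions about the general technique, not this repository's code):

    import random

    def pick_train_size(base=640, stride=32, low=0.5, high=1.5):
        # Pick a random resolution around the base size and round it to a
        # multiple of the network stride, as multi-scale training typically does.
        size = random.uniform(low, high) * base
        return max(stride, int(round(size / stride)) * stride)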

If you can, try training again in the nvcr.io/nvidia/pytorch:20.08-py3 Docker environment.

WongKinYiu avatar Jan 05 '21 13:01 WongKinYiu

Thanks. I will set the image size to 640, turn multi-scale on, and then train again both in my own Docker environment and in nvcr.io/nvidia/pytorch:20.08-py3.

supeng0924 avatar Jan 05 '21 14:01 supeng0924

You could use your previous training command: python train.py --device 4 --batch-size 16 --img 512 512 --data coco.yaml --cfg cfg/yolov4.cfg --weights '' --name yolov4. It seems some versions of PyTorch/CUDA/cuDNN produce weird results.
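If you want to compare environments, a quick check is to print the versions PyTorch actually sees (standard PyTorch calls):

    import torch

    # Report the library and GPU versions in use, to compare a working setup
    # against one that produces weird results.
    print("torch:", torch.__version__)
    print("cuda (build):", torch.version.cuda)
    print("cudnn:", torch.backends.cudnn.version())
    print("gpu:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")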

WongKinYiu avatar Jan 05 '21 14:01 WongKinYiu


The Docker image you suggested needs an NVIDIA driver > 450, so I can't run that experiment. Could you provide your results.txt? I would like to compare it with mine to see where the problem might be.
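(For reference, the installed driver version is shown in the "Driver Version" field of the nvidia-smi output.)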

supeng0924 avatar Jan 08 '21 09:01 supeng0924

I just trained with 640 640 for a while.

     0/299     12.7G   0.08222   0.08686   0.08028    0.2494        23       640   0.05106   0.01807   0.02296  0.009321   0.06775    0.0715   0.06805
     1/299     12.7G   0.06487   0.08444   0.06221    0.2115         7       640    0.2476    0.1556    0.1292   0.06344   0.05693   0.06702    0.0484
     2/299     12.7G   0.05953   0.08114   0.04974    0.1904        22       640    0.2728    0.2982    0.2332    0.1259   0.05127   0.06413   0.03756
     3/299     12.7G   0.05525   0.07849   0.04187    0.1756        28       640    0.2718    0.4086    0.3207    0.1843   0.04774   0.06168   0.03087
     4/299     12.7G   0.05233   0.07645   0.03687    0.1657         3       640    0.2903    0.4619    0.3705    0.2204   0.04568   0.06018   0.02754
     5/299     12.7G   0.05058   0.07531   0.03403    0.1599        26       640    0.3089    0.4882    0.4008    0.2433   0.04445   0.05919   0.02568
     6/299     12.7G   0.04944   0.07431   0.03213    0.1559        36       640    0.3258    0.5029    0.4183    0.2574   0.04372   0.05857   0.02457
     7/299     12.7G   0.04845   0.07338   0.03078    0.1526        17       640    0.3411    0.5111    0.4305    0.2667    0.0432   0.05812   0.02382
     8/299     12.7G   0.04783   0.07309   0.02977    0.1507        49       640    0.3536    0.5148    0.4404    0.2748    0.0428   0.05776   0.02325
     9/299     12.7G    0.0473   0.07247    0.0289    0.1487         7       640    0.3683    0.5167    0.4481    0.2809   0.04248    0.0575   0.02278
    10/299     12.7G   0.04672   0.07187   0.02804    0.1466        16       640    0.3858    0.5174    0.4567    0.2874   0.04219    0.0573   0.02235
    11/299     12.7G   0.04633   0.07169   0.02751    0.1455        24       640    0.3982    0.5174    0.4636     0.293   0.04191   0.05716   0.02194
    12/299     12.7G    0.0461   0.07153   0.02709    0.1447        17       640    0.4137    0.5181    0.4697    0.2983   0.04165   0.05706   0.02156
    13/299     12.7G   0.04576   0.07135   0.02663    0.1437        18       640    0.4257    0.5187    0.4759    0.3035   0.04139   0.05696   0.02119
    14/299     12.7G   0.04544   0.07097   0.02609    0.1425         8       640    0.4356    0.5198    0.4828     0.309   0.04112   0.05686    0.0208
    15/299     12.7G    0.0452   0.07079   0.02577    0.1418        45       640    0.4444     0.522    0.4901    0.3149   0.04086   0.05675   0.02042
    16/299     12.7G   0.04496   0.07044   0.02554    0.1409        19       640     0.453    0.5272    0.4977    0.3205   0.04058   0.05661   0.02006
    17/299     12.7G   0.04482   0.07046   0.02525    0.1405        31       640    0.4575    0.5327    0.5041    0.3261   0.04031   0.05644   0.01969
    18/299     12.7G   0.04454   0.06976   0.02495    0.1393        12       640    0.4616    0.5389    0.5106    0.3312   0.04004   0.05624   0.01933
    19/299     12.7G   0.04443   0.06981   0.02479     0.139        40       640     0.465    0.5468    0.5173    0.3361   0.03977   0.05601     0.019
    20/299     12.7G   0.04433   0.06977   0.02473    0.1388        27       640    0.4653    0.5528    0.5233     0.341    0.0395   0.05577   0.01869

WongKinYiu avatar Jan 13 '21 06:01 WongKinYiu

Thank you for your reply. Did you set batch-size=8 and turn multi-scale off?

supeng0924 avatar Jan 13 '21 09:01 supeng0924

Hello, I ran into the same problem as you: on coco2017 with img_size 512 the mAP only reaches around 36. Have you solved it?

sherry085 avatar Jan 25 '21 09:01 sherry085

Not yet. I am currently training the YOLOv4pacsp-x-mish config; at epoch 147 the mAP reaches about 39.8.

supeng0924 avatar Jan 25 '21 11:01 supeng0924

I set batch-size=16 and multi-scale off in https://github.com/WongKinYiu/PyTorch_YOLOv4/issues/232#issuecomment-759227582.

And I also tried training yolov4s with 512x512 on a 2080 Ti; it can reach 32+ AP.
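For example (the cfg filename here is hypothetical; use whichever yolov4s cfg ships in the repo's cfg/ directory): python train.py --device 0 --batch-size 16 --img 512 512 --data coco.yaml --cfg cfg/yolov4s.cfg --weights '' --name yolov4s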

WongKinYiu avatar Jan 25 '21 12:01 WongKinYiu

Did you keep the same environment and code and only swap in the new network config?

sherry085 avatar Jan 25 '21 12:01 sherry085

Hello, thank you for your answer. Did you change the environment? I am quite puzzled by the current results: the accuracy is much worse than in the paper, and the curves show it growing very slowly from very early on. With the reference code at 640 the training curve looks normal; we only changed the image size, yet the result differs far too much.


sherry085 avatar Jan 25 '21 12:01 sherry085

No, I didn't change the environment; I only switched to the pacsp-x-mish config, whose reported mAP is 50.0. I'm not sure whether the training will eventually reach 50.0.

supeng0924 avatar Jan 25 '21 14:01 supeng0924

At epoch 147 with 640, my AP is 45.2.

WongKinYiu avatar Jan 25 '21 15:01 WongKinYiu

Hello, I tried PyTorch 1.6, 1.7.0, and 1.7.1; with img_size 512 the final mAP only reaches around 36, while both you and supeng0924 got much better results at 640. What could the reason be?

sherry085 avatar Jan 26 '21 01:01 sherry085

The Docker environment the author suggested earlier is CUDA 11.1, while I trained with CUDA 10.1 + PyTorch 1.6; I'm not sure whether that alone can cause such a large difference. I also don't have a CUDA 11.1 environment at the moment; do you have one to verify this?

supeng0924 avatar Jan 26 '21 01:01 supeng0924

Hello, could you share a list of Docker environments that work? I have already tried CUDA 11.0 + PyTorch 1.7.1: at epoch 165 the mAP is 0.335, and judging from the trend it is growing very slowly, so it will probably also end up around 36.

sherry085 avatar Jan 26 '21 01:01 sherry085

These are the environments I have tested without problems: nvcr.io/nvidia/pytorch:20.02-py3, nvcr.io/nvidia/pytorch:20.03-py3, nvcr.io/nvidia/pytorch:20.06-py3, nvcr.io/nvidia/pytorch:20.08-py3.
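A typical way to use one of them (the mount paths below are placeholders, adjust them to your setup): docker pull nvcr.io/nvidia/pytorch:20.08-py3, then docker run --gpus all -it --shm-size=16g -v /path/to/coco:/coco -v /path/to/PyTorch_YOLOv4:/workspace/PyTorch_YOLOv4 nvcr.io/nvidia/pytorch:20.08-py3, and run the training command inside the container.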

WongKinYiu avatar Jan 26 '21 01:01 WongKinYiu

Hello, have you tried training at 512?

sherry085 avatar Jan 26 '21 01:01 sherry085

512 yolov4-pacsp-s-mish (results image attached)

WongKinYiu avatar Jan 26 '21 01:01 WongKinYiu

Hello, do you have results for yolov4 on coco2017 at the 512 scale? I have trained with your code several times in different environments, and the accuracy always ends up around 36. I am now unsure whether this is an environment problem or whether the current hyperparameter configuration is simply not well suited to 512.

sherry085 avatar Jan 26 '21 02:01 sherry085

Hello, I am now running into the same problem you had: at the 512 scale the mAP also only reaches about 36%. Did you ever solve it?

klk2020 avatar Jun 01 '21 07:06 klk2020