YOLO-World
YOLO-World-S reproduction fails on Objects365
Hi, I set up the environment to reproduce YOLO-World-S.
1) Environment verification: zero-shot evaluation on LVIS with the provided checkpoint reaches the same mAP as the official results.
2) Reproduction config: only the Objects365v1 data and its train.json are used:

    obj365v1_train_dataset = dict(
        type='MultiModalDataset',
        dataset=dict(
            type='YOLOv5Objects365V1Dataset',
            data_root='/my_path/datasets/objects365v1/',
            ann_file='/my_path/datasets/objects365v1/annotations/train.json',
            data_prefix=dict(img='train/'),
            filter_cfg=dict(filter_empty_gt=False, min_size=32)),
        class_text_path='data/texts/obj365v1_class_texts.json',
        pipeline=train_pipeline)

    train_dataloader = dict(
        batch_size=train_batch_size_per_gpu,
        collate_fn=dict(type='yolow_collate'),
        dataset=dict(
            _delete_=True,
            type='ConcatDataset',
            datasets=[obj365v1_train_dataset],
            ignore_keys=['classes', 'palette']))

3) Hyperparameters: the same as the pretraining config, trained on 8 V100 GPUs with a batch size of 16 per GPU.
4) Training: the loss stays high and inference does not produce correct results.

2024/02/11 07:05:26 - mmengine - INFO - Epoch(train) [100][4300/4755] base_lr: 2.0000e-03 lr: 5.9600e-05 eta: 0:05:38 time: 0.7859 data_time: 0.0060 memory: 8815 grad_norm: 492.3932 loss: 433.7005 loss_cls: 160.5437 loss_bbox: 137.8297 loss_dfl: 135.3271
2024/02/11 07:06:03 - mmengine - INFO - Epoch(train) [100][4350/4755] base_lr: 2.0000e-03 lr: 5.9600e-05 eta: 0:05:01 time: 0.7353 data_time: 0.0059 memory: 9002 grad_norm: 491.6336 loss: 436.9689 loss_cls: 162.9173 loss_bbox: 138.7117 loss_dfl: 135.3400
2024/02/11 07:06:42 - mmengine - INFO - Epoch(train) [100][4400/4755] base_lr: 2.0000e-03 lr: 5.9600e-05 eta: 0:04:23 time: 0.7874 data_time: 0.0062 memory: 8922 grad_norm: 480.4001 loss: 437.0023 loss_cls: 161.6231 loss_bbox: 139.2435 loss_dfl: 136.1358
2024/02/11 07:07:20 - mmengine - INFO - Epoch(train) [100][4450/4755] base_lr: 2.0000e-03 lr: 5.9600e-05 eta: 0:03:46 time: 0.7536 data_time: 0.0060 memory: 8615 grad_norm: 487.9105 loss: 439.4708 loss_cls: 166.0740 loss_bbox: 138.1573 loss_dfl: 135.2395
Do you have any suggestions for reproducing the results?
@lluo-Desktop Hi, do you have any intermediate evaluation results? You could first check the evaluation at 5 epochs. Also, we have not yet trained with the current setup in the 8x16 setting (the default uses 32x16); we can help you verify the training.
@wondervictor Thanks for your reply!
This is pretraining on Objects365, with load_from set to the official checkpoint. Train/val log at 2 epochs:
2024/02/20 03:38:33 - mmengine - INFO - Epoch(train) [2][4700/4755] base_lr: 5.0000e-04 lr: 3.2809e-04 eta: 2 days, 23:09:22 time: 0.5640 data_time: 0.0045 memory: 11813 grad_norm: 848.5867 loss: 502.3191 loss_cls: 201.0484 loss_bbox: 155.8162 loss_dfl: 145.4545
2024/02/20 03:39:03 - mmengine - INFO - Epoch(train) [2][4750/4755] base_lr: 5.0000e-04 lr: 3.2983e-04 eta: 2 days, 23:10:32 time: 0.5896 data_time: 0.0044 memory: 8960 grad_norm: 925.2765 loss: 509.8672 loss_cls: 204.1596 loss_bbox: 157.7191 loss_dfl: 147.9884
2024/02/20 03:39:04 - mmengine - INFO - Exp name: exp1.3_yolo_world_s_8nx16bs_obj365v1_20240220_020931
2024/02/20 03:39:04 - mmengine - INFO - Saving checkpoint at 2 epochs
2024/02/20 03:39:08 - mmengine - WARNING - save_param_scheduler is True but self.param_schedulers is None, so skip saving parameter schedulers
2024/02/20 03:39:18 - mmengine - INFO - Epoch(val) [2][ 50/602] eta: 0:01:28 time: 0.1608 data_time: 0.0074 memory: 9160
...
2024/02/20 03:40:33 - mmengine - INFO - Epoch(val) [2][600/602] eta: 0:00:00 time: 0.1382 data_time: 0.0002 memory: 924
2024/02/20 03:40:56 - mmengine - INFO - Evaluating bbox...
2024/02/20 03:42:21 - mmengine - INFO - Epoch(val) [2][602/602] lvis/bbox_AP: 0.0070 lvis/bbox_AP50: 0.0080 lvis/bbox_AP75: 0.0070 lvis/bbox_APs: 0.0020 lvis/bbox_APm: 0.0070 lvis/bbox_APl: 0.0200 lvis/bbox_APr: 0.0000 lvis/bbox_APc: 0.0080 lvis/bbox_APf: 0.0070 data_time: 0.0009 time: 0.1393
Today I tried finetuning on COCO, and the loss and val results look normal (8 V100 x 16 bs, lr=2e-4). Part of the train/val log:
...
2024/02/20 07:38:33 - mmengine - INFO - Epoch(train) [2][900/925] base_lr: 2.0000e-04 lr: 1.2983e-04 eta: 9:42:33 time: 0.4756 data_time: 0.0047 memory: 8196 grad_norm: 1069.7471 loss: 428.3111 loss_cls: 150.2849 loss_bbox: 130.5900 loss_dfl: 147.4362
2024/02/20 07:38:45 - mmengine - INFO - Exp name: yolo_world_s_dual_vlpan_2e-4_80e_8gpus_finetune_coco_20240220_072120
2024/02/20 07:38:45 - mmengine - INFO - Saving checkpoint at 2 epochs
2024/02/20 07:38:48 - mmengine - WARNING - save_param_scheduler is True but self.param_schedulers is None, so skip saving parameter schedulers
2024/02/20 07:38:51 - mmengine - INFO - Epoch(val) [2][ 50/625] eta: 0:00:09 time: 0.0164 data_time: 0.0006 memory: 8076
...
2024/02/20 07:38:59 - mmengine - INFO - Epoch(val) [2][600/625] eta: 0:00:00 time: 0.0144 data_time: 0.0002 memory: 850
2024/02/20 07:39:11 - mmengine - INFO - Evaluating bbox...
2024/02/20 07:40:17 - mmengine - INFO - bbox_mAP_copypaste: 0.394 0.548 0.432 0.220 0.427 0.526
One moment, I'll try this setting on my side.
Where are your training logs provided? I can't seem to find them. Thanks.
@lluo-Desktop Hi, I have now trained YOLO-World-S following the Objects365v1, 8x16 bs setting. The early log is as follows:
02/22 17:49:39 - mmengine - INFO - Epoch(train) [1][ 50/4755] lr: 6.8700e-06 eta: 6 days, 20:34:55 time: 1.2462 data_time: 0.5570 memory: 13214 grad_norm: nan loss: 1776.3508 loss_cls: 733.2614 loss_bbox: 500.5636 loss_dfl: 542.5258
02/22 17:50:15 - mmengine - INFO - Epoch(train) [1][ 100/4755] lr: 1.3880e-05 eta: 5 days, 9:40:25 time: 0.7177 data_time: 0.3736 memory: 7009 grad_norm: 1095.3765 loss: 1732.4874 loss_cls: 692.1060 loss_bbox: 502.5964 loss_dfl: 537.7850
02/22 17:50:47 - mmengine - INFO - Epoch(train) [1][ 150/4755] lr: 2.0890e-05 eta: 4 days, 19:04:21 time: 0.6505 data_time: 0.2494 memory: 6462 grad_norm: 926.0114 loss: 1685.0875 loss_cls: 646.4953 loss_bbox: 506.1760 loss_dfl: 532.4163
02/22 17:51:17 - mmengine - INFO - Epoch(train) [1][ 200/4755] lr: 2.7900e-05 eta: 4 days, 10:01:39 time: 0.5978 data_time: 0.2741 memory: 5902 grad_norm: 731.0350 loss: 1621.9136 loss_cls: 603.8184 loss_bbox: 499.2010 loss_dfl: 518.8943
02/22 17:51:48 - mmengine - INFO - Epoch(train) [1][ 250/4755] lr: 3.4911e-05 eta: 4 days, 5:00:20 time: 0.6133 data_time: 0.2042 memory: 7715 grad_norm: 661.7663 loss: 1568.2445 loss_cls: 580.4432 loss_bbox: 487.5858 loss_dfl: 500.2155
02/22 17:52:18 - mmengine - INFO - Epoch(train) [1][ 300/4755] lr: 4.1921e-05 eta: 4 days, 1:18:26 time: 0.5975 data_time: 0.1717 memory: 6475 grad_norm: 725.5072 loss: 1511.0823 loss_cls: 564.0430 loss_bbox: 461.9295 loss_dfl: 485.1097
02/22 17:52:42 - mmengine - INFO - Epoch(train) [1][ 350/4755] lr: 4.8931e-05 eta: 3 days, 20:33:57 time: 0.4862 data_time: 0.1077 memory: 5902 grad_norm: inf loss: 1468.3130 loss_cls: 555.9068 loss_bbox: 445.7742 loss_dfl: 466.6320
02/22 17:53:10 - mmengine - INFO - Epoch(train) [1][ 400/4755] lr: 5.5941e-05 eta: 3 days, 18:06:37 time: 0.5531 data_time: 0.0768 memory: 6795 grad_norm: 748.6951 loss: 1396.6288 loss_cls: 536.2165 loss_bbox: 415.3218 loss_dfl: 445.0905
02/22 17:53:37 - mmengine - INFO - Epoch(train) [1][ 450/4755] lr: 6.2951e-05 eta: 3 days, 16:04:32 time: 0.5447 data_time: 0.0875 memory: 7235 grad_norm: 732.4562 loss: 1345.2445 loss_cls: 530.3352 loss_bbox: 390.0545 loss_dfl: 424.8548
02/22 17:54:04 - mmengine - INFO - Epoch(train) [1][ 500/4755] lr: 6.9961e-05 eta: 3 days, 14:24:34 time: 0.5419 data_time: 0.0745 memory: 6329 grad_norm: 713.2585 loss: 1280.7532 loss_cls: 512.5199 loss_bbox: 368.0122 loss_dfl: 400.2211
02/22 17:54:32 - mmengine - INFO - Epoch(train) [1][ 550/4755] lr: 7.6972e-05 eta: 3 days, 13:15:44 time: 0.5600 data_time: 0.0562 memory: 7169 grad_norm: 703.7077 loss: 1238.0703 loss_cls: 503.2109 loss_bbox: 352.5556 loss_dfl: 382.3037
02/22 17:54:58 - mmengine - INFO - Epoch(train) [1][ 600/4755] lr: 8.3982e-05 eta: 3 days, 11:54:54 time: 0.5245 data_time: 0.0349 memory: 5862 grad_norm: 705.5707 loss: 1206.8551 loss_cls: 496.0337 loss_bbox: 344.0969 loss_dfl: 366.7244
02/22 17:55:25 - mmengine - INFO - Epoch(train) [1][ 650/4755] lr: 9.0992e-05 eta: 3 days, 10:51:18 time: 0.5325 data_time: 0.0505 memory: 6382 grad_norm: 681.6300 loss: 1171.1414 loss_cls: 483.5630 loss_bbox: 334.9132 loss_dfl: 352.6652
02/22 17:55:52 - mmengine - INFO - Epoch(train) [1][ 700/4755] lr: 9.8002e-05 eta: 3 days, 9:56:12 time: 0.5316 data_time: 0.0846 memory: 5902 grad_norm: 662.0694 loss: 1152.3215 loss_cls: 480.0629 loss_bbox: 329.5538 loss_dfl: 342.7047
02/22 17:56:19 - mmengine - INFO - Epoch(train) [1][ 750/4755] lr: 1.0501e-04 eta: 3 days, 9:17:07 time: 0.5481 data_time: 0.1094 memory: 7089 grad_norm: 653.8856 loss: 1124.8583 loss_cls: 474.3178 loss_bbox: 318.0799 loss_dfl: 332.4606
02/22 17:56:48 - mmengine - INFO - Epoch(train) [1][ 800/4755] lr: 1.1202e-04 eta: 3 days, 9:02:03 time: 0.5869 data_time: 0.1455 memory: 10582 grad_norm: 628.4977 loss: 1106.5697 loss_cls: 472.1445 loss_bbox: 312.6728 loss_dfl: 321.7523
02/22 17:57:12 - mmengine - INFO - Epoch(train) [1][ 850/4755] lr: 1.1903e-04 eta: 3 days, 7:54:21 time: 0.4702 data_time: 0.0982 memory: 5796 grad_norm: 599.6026 loss: 1087.8212 loss_cls: 460.1374 loss_bbox: 311.1525 loss_dfl: 316.5314
02/22 17:57:39 - mmengine - INFO - Epoch(train) [1][ 900/4755] lr: 1.2604e-04 eta: 3 days, 7:22:00 time: 0.5336 data_time: 0.0508 memory: 8169 grad_norm: 625.8642 loss: 1066.0954 loss_cls: 453.7859 loss_bbox: 304.1042 loss_dfl: 308.2053
02/22 17:58:11 - mmengine - INFO - Epoch(train) [1][ 950/4755] lr: 1.3305e-04 eta: 3 days, 7:40:01 time: 0.6465 data_time: 0.0887 memory: 6195 grad_norm: 581.4653 loss: 1050.8522 loss_cls: 451.0907 loss_bbox: 298.9823 loss_dfl: 300.7792
02/22 17:58:52 - mmengine - INFO - Epoch(train) [1][1000/4755] lr: 1.4006e-04 eta: 3 days, 9:06:53 time: 0.8254 data_time: 0.2640 memory: 8595 grad_norm: 564.5235 loss: 1040.2058 loss_cls: 446.7620 loss_bbox: 297.4854 loss_dfl: 295.9584
02/22 17:59:27 - mmengine - INFO - Epoch(train) [1][1050/4755] lr: 1.4707e-04 eta: 3 days, 9:36:47 time: 0.6962 data_time: 0.1749 memory: 6235 grad_norm: 564.7915 loss: 1022.5335 loss_cls: 439.8820 loss_bbox: 291.4202 loss_dfl: 291.2313
02/22 17:59:48 - mmengine - INFO - Epoch(train) [1][1100/4755] lr: 1.5408e-04 eta: 3 days, 8:26:35 time: 0.4254 data_time: 0.0137 memory: 7769 grad_norm: 530.7317 loss: 1004.7401 loss_cls: 435.0032 loss_bbox: 286.9149 loss_dfl: 282.8220
02/22 18:00:12 - mmengine - INFO - Epoch(train) [1][1150/4755] lr: 1.6109e-04 eta: 3 days, 7:37:05 time: 0.4679 data_time: 0.0091 memory: 8102 grad_norm: inf loss: 995.4325 loss_cls: 431.1275 loss_bbox: 285.3974 loss_dfl: 278.9076
02/22 18:00:33 - mmengine - INFO - Epoch(train) [1][1200/4755] lr: 1.6810e-04 eta: 3 days, 6:40:58 time: 0.4354 data_time: 0.0467 memory: 6942 grad_norm: 520.9851 loss: 980.0758 loss_cls: 423.9543 loss_bbox: 280.7031 loss_dfl: 275.4183
02/22 18:00:56 - mmengine - INFO - Epoch(train) [1][1250/4755] lr: 1.7511e-04 eta: 3 days, 5:54:36 time: 0.4521 data_time: 0.0422 memory: 7249 grad_norm: 544.1749 loss: 964.0905 loss_cls: 419.3851 loss_bbox: 275.0380 loss_dfl: 269.6675
02/22 18:01:21 - mmengine - INFO - Epoch(train) [1][1300/4755] lr: 1.8212e-04 eta: 3 days, 5:25:41 time: 0.4979 data_time: 0.0533 memory: 5716 grad_norm: 511.4465 loss: 968.7069 loss_cls: 424.2675 loss_bbox: 276.3973 loss_dfl: 268.0421
02/22 18:01:40 - mmengine - INFO - Epoch(train) [1][1350/4755] lr: 1.8913e-04 eta: 3 days, 4:25:29 time: 0.3838 data_time: 0.0173 memory: 6102 grad_norm: 527.1621 loss: 943.2824 loss_cls: 409.6581 loss_bbox: 270.7041 loss_dfl: 262.9202
02/22 18:02:03 - mmengine - INFO - Epoch(train) [1][1400/4755] lr: 1.9614e-04 eta: 3 days, 3:48:41 time: 0.4516 data_time: 0.0082 memory: 5956 grad_norm: 499.9155 loss: 944.2081 loss_cls: 409.8294 loss_bbox: 271.4986 loss_dfl: 262.8802
02/22 18:02:24 - mmengine - INFO - Epoch(train) [1][1450/4755] lr: 2.0315e-04 eta: 3 days, 3:08:27 time: 0.4298 data_time: 0.0283 memory: 7302 grad_norm: 519.8153 loss: 930.8750 loss_cls: 408.7763 loss_bbox: 264.6100 loss_dfl: 257.4886
02/22 18:02:47 - mmengine - INFO - Epoch(train) [1][1500/4755] lr: 2.1016e-04 eta: 3 days, 2:35:17 time: 0.4465 data_time: 0.0129 memory: 5969 grad_norm: 484.2098 loss: 919.9606 loss_cls: 403.3247 loss_bbox: 262.1625 loss_dfl: 254.4734
02/22 18:03:12 - mmengine - INFO - Epoch(train) [1][1550/4755] lr: 2.1717e-04 eta: 3 days, 2:17:49 time: 0.4998 data_time: 0.0118 memory: 6755 grad_norm: 493.1965 loss: 911.3023 loss_cls: 399.2407 loss_bbox: 261.1044 loss_dfl: 250.9571
02/22 18:03:31 - mmengine - INFO - Epoch(train) [1][1600/4755] lr: 2.2419e-04 eta: 3 days, 1:32:31 time: 0.3827 data_time: 0.0064 memory: 6022 grad_norm: 492.0642 loss: 899.3547 loss_cls: 392.3960 loss_bbox: 259.0811 loss_dfl: 247.8776
02/22 18:03:53 - mmengine - INFO - Epoch(train) [1][1650/4755] lr: 2.3120e-04 eta: 3 days, 1:05:51 time: 0.4492 data_time: 0.0184 memory: 10795 grad_norm: 492.8047 loss: 896.6532 loss_cls: 392.5849 loss_bbox: 258.3923 loss_dfl: 245.6760
02/22 18:04:14 - mmengine - INFO - Epoch(train) [1][1700/4755] lr: 2.3821e-04 eta: 3 days, 0:34:38 time: 0.4229 data_time: 0.0707 memory: 7422 grad_norm: 452.7797 loss: 888.4180 loss_cls: 387.3658 loss_bbox: 257.7762 loss_dfl: 243.2760
02/22 18:04:37 - mmengine - INFO - Epoch(train) [1][1750/4755] lr: 2.4522e-04 eta: 3 days, 0:13:11 time: 0.4584 data_time: 0.0078 memory: 6289 grad_norm: 471.3941 loss: 875.8123 loss_cls: 383.2677 loss_bbox: 253.1041 loss_dfl: 239.4404
02/22 18:05:02 - mmengine - INFO - Epoch(train) [1][1800/4755] lr: 2.5223e-04 eta: 3 days, 0:02:03 time: 0.5002 data_time: 0.0099 memory: 6849 grad_norm: 466.7350 loss: 871.8134 loss_cls: 381.8615 loss_bbox: 250.5506 loss_dfl: 239.4013
02/22 18:05:21 - mmengine - INFO - Epoch(train) [1][1850/4755] lr: 2.5924e-04 eta: 2 days, 23:25:26 time: 0.3780 data_time: 0.0386 memory: 5662 grad_norm: 449.1701 loss: 857.2986 loss_cls: 373.9115 loss_bbox: 247.0927 loss_dfl: 236.2944
02/22 18:05:42 - mmengine - INFO - Epoch(train) [1][1900/4755] lr: 2.6625e-04 eta: 2 days, 22:58:41 time: 0.4162 data_time: 0.0354 memory: 5942 grad_norm: 452.8211 loss: 854.7589 loss_cls: 372.4334 loss_bbox: 246.2098 loss_dfl: 236.1157
@wondervictor Hi, thanks for the feedback. The loss curve is consistent with my log; we'll need to check whether the results at val time are correct. (If your reproduction turns out fine, could you share the config & log files so I can compare the differences?) Tracing back a dataset issue: the Objects365v1 (2019) annotation file I use has some category-name differences from the version mmdet uses by default (human vs person, letter-case differences). In theory a few missing categories shouldn't make the results completely wrong, but I have just corrected the json and am re-running.
@lluo-Desktop @taofuyu I suggest downloading Objects365 from the OpenDataLab link I provided; I also struggled with processing Objects365 before, and that version is aligned with our code. I have now trained ~30 epochs under this config and the results look normal so far; see 20240222_174524.log for reference.
@wondervictor Thank you very much for your reply. I have re-downloaded the Objects365v1 (2019-08-02) json file and am re-running the reproduction. It was most likely a dataset-version issue (the 2019 Objects365v1 json I obtained from an in-house data market has 11 differently named labels).
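Renamed labels like the 11 mentioned above can be found mechanically by diffing the category lists of the two annotation versions. Below is a minimal, hypothetical sketch (the helper names and the toy annotation dicts are illustrative, not from the repo); it compares COCO-style `categories` entries after case normalization, so case-only differences drop out and true renames such as human vs person remain:

```python
# Hypothetical sketch: diff category names between two Objects365v1
# annotation versions to spot renamed labels (e.g. "human" vs "person").

def category_names(ann):
    # Lowercase, stripped set of category names from a COCO-style dict,
    # so pure letter-case differences are ignored.
    return {c["name"].strip().lower() for c in ann["categories"]}

def diff_categories(old_ann, new_ann):
    # Names that exist in only one of the two annotation versions.
    old, new = category_names(old_ann), category_names(new_ann)
    return sorted(old - new), sorted(new - old)

# Toy annotations standing in for the 2019 json vs the OpenDataLab json.
old = {"categories": [{"id": 1, "name": "human"}, {"id": 2, "name": "Sneakers"}]}
new = {"categories": [{"id": 1, "name": "person"}, {"id": 2, "name": "sneakers"}]}

only_old, only_new = diff_categories(old, new)
print(only_old, only_new)  # ['human'] ['person']
```

Run against the real train.json files (loaded with json.load), this would list exactly the labels that differ between the two versions; "Sneakers" vs "sneakers" disappears after lowercasing, while "human"/"person" is reported as a genuine rename.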
Thank you for open-sourcing such excellent work. I am reproducing YOLO-World-S on Objects365v1, but the early loss trend differs from yours. Is this normal? I don't have V100s; I'm training on two 4090s, also with a batch size of 16 per card. I didn't change any other parameters, apart from using only Objects365v1 as training data and setting the GPU count to 2. My Objects365v1 dataset was downloaded from the link you provided. Thanks.
03/12 17:57:03 - mmengine - INFO - Epoch(train) [1][ 50/19019] base_lr: 2.0000e-03 lr: 1.7176e-06 eta: 7 days, 21:30:37 time: 0.3587 data_time: 0.0372 memory: 9180 grad_norm: nan loss: 446.1266 loss_cls: 185.6677 loss_bbox: 123.7541 loss_dfl: 136.7047
03/12 17:57:15 - mmengine - INFO - Epoch(train) [1][ 100/19019] base_lr: 2.0000e-03 lr: 3.4702e-06 eta: 6 days, 14:30:04 time: 0.2413 data_time: 0.0039 memory: 7734 grad_norm: 565.1302 loss: 445.8083 loss_cls: 184.7159 loss_bbox: 124.9006 loss_dfl: 136.1918
03/12 17:57:28 - mmengine - INFO - Epoch(train) [1][ 150/19019] base_lr: 2.0000e-03 lr: 5.2228e-06 eta: 6 days, 5:49:49 time: 0.2508 data_time: 0.0038 memory: 6040 grad_norm: 538.9717 loss: 444.6346 loss_cls: 184.0897 loss_bbox: 124.3050 loss_dfl: 136.2399
03/12 17:57:39 - mmengine - INFO - Epoch(train) [1][ 200/19019] base_lr: 2.0000e-03 lr: 6.9755e-06 eta: 5 days, 23:20:15 time: 0.2345 data_time: 0.0040 memory: 5533 grad_norm: inf loss: 441.9562 loss_cls: 182.6187 loss_bbox: 123.2948 loss_dfl: 136.0427
03/12 17:57:51 - mmengine - INFO - Epoch(train) [1][ 250/19019] base_lr: 2.0000e-03 lr: 8.7281e-06 eta: 5 days, 19:42:03 time: 0.2370 data_time: 0.0038 memory: 7546 grad_norm: 506.1350 loss: 438.6803 loss_cls: 180.0693 loss_bbox: 123.0374 loss_dfl: 135.5736
03/12 17:58:04 - mmengine - INFO - Epoch(train) [1][ 300/19019] base_lr: 2.0000e-03 lr: 1.0481e-05 eta: 5 days, 18:12:13 time: 0.2475 data_time: 0.0039 memory: 5693 grad_norm: 466.7960 loss: 435.8879 loss_cls: 177.5164 loss_bbox: 123.7570 loss_dfl: 134.6145
03/12 17:58:15 - mmengine - INFO - Epoch(train) [1][ 350/19019] base_lr: 2.0000e-03 lr: 1.2233e-05 eta: 5 days, 16:09:50 time: 0.2347 data_time: 0.0039 memory: 6253 grad_norm: 450.4666 loss: 429.7469 loss_cls: 173.1966 loss_bbox: 123.0853 loss_dfl: 133.4649
03/12 17:58:27 - mmengine - INFO - Epoch(train) [1][ 400/19019] base_lr: 2.0000e-03 lr: 1.3986e-05 eta: 5 days, 14:47:06 time: 0.2370 data_time: 0.0037 memory: 6053 grad_norm: 460.2511 loss: 422.2496 loss_cls: 168.1797 loss_bbox: 121.7420 loss_dfl: 132.3279
03/12 17:58:39 - mmengine - INFO - Epoch(train) [1][ 450/19019] base_lr: 2.0000e-03 lr: 1.5739e-05 eta: 5 days, 14:15:45 time: 0.2463 data_time: 0.0040 memory: 6173 grad_norm: 526.7233 loss: 415.7690 loss_cls: 164.0228 loss_bbox: 120.9978 loss_dfl: 130.7484
03/12 17:58:51 - mmengine - INFO - Epoch(train) [1][ 500/19019] base_lr: 2.0000e-03 lr: 1.7491e-05 eta: 5 days, 13:11:15 time: 0.2339 data_time: 0.0037 memory: 8267 grad_norm: 540.6158 loss: 411.8747 loss_cls: 161.5670 loss_bbox: 120.9805 loss_dfl: 129.3272
03/12 17:59:03 - mmengine - INFO - Epoch(train) [1][ 550/19019] base_lr: 2.0000e-03 lr: 1.9244e-05 eta: 5 days, 12:17:28 time: 0.2336 data_time: 0.0040 memory: 5800 grad_norm: 532.7258 loss: 404.0686 loss_cls: 158.0140 loss_bbox: 118.3286 loss_dfl: 127.7259
03/12 17:59:15 - mmengine - INFO - Epoch(train) [1][ 600/19019] base_lr: 2.0000e-03 lr: 2.0997e-05 eta: 5 days, 11:37:55 time: 0.2356 data_time: 0.0039 memory: 6546 grad_norm: 513.6272 loss: 400.2819 loss_cls: 156.8279 loss_bbox: 117.0981 loss_dfl: 126.3559
03/12 17:59:27 - mmengine - INFO - Epoch(train) [1][ 650/19019] base_lr: 2.0000e-03 lr: 2.2749e-05 eta: 5 days, 11:43:15 time: 0.2515 data_time: 0.0039 memory: 9239 grad_norm: 511.9093 loss: 394.8929 loss_cls: 152.5437 loss_bbox: 117.8386 loss_dfl: 124.5105
03/12 17:59:39 - mmengine - INFO - Epoch(train) [1][ 700/19019] base_lr: 2.0000e-03 lr: 2.4502e-05 eta: 5 days, 11:16:46 time: 0.2378 data_time: 0.0040 memory: 5706 grad_norm: 513.3235 loss: 388.5590 loss_cls: 151.2647 loss_bbox: 115.0196 loss_dfl: 122.2748
03/12 17:59:51 - mmengine - INFO - Epoch(train) [1][ 750/19019] base_lr: 2.0000e-03 lr: 2.6254e-05 eta: 5 days, 10:48:51 time: 0.2355 data_time: 0.0041 memory: 7240 grad_norm: 514.5650 loss: 383.7197 loss_cls: 150.8671 loss_bbox: 112.1987 loss_dfl: 120.6540
03/12 18:00:03 - mmengine - INFO - Epoch(train) [1][ 800/19019] base_lr: 2.0000e-03 lr: 2.8007e-05 eta: 5 days, 10:51:38 time: 0.2492 data_time: 0.0040 memory: 6333 grad_norm: 508.5694 loss: 376.1887 loss_cls: 148.2612 loss_bbox: 109.8921 loss_dfl: 118.0354
03/12 18:00:15 - mmengine - INFO - Epoch(train) [1][ 850/19019] base_lr: 2.0000e-03 lr: 2.9760e-05 eta: 5 days, 10:38:19 time: 0.2408 data_time: 0.0041 memory: 5920 grad_norm: 480.6808 loss: 369.1250 loss_cls: 145.7235 loss_bbox: 107.7232 loss_dfl: 115.6783
03/12 18:00:27 - mmengine - INFO - Epoch(train) [1][ 900/19019] base_lr: 2.0000e-03 lr: 3.1512e-05 eta: 5 days, 10:20:42 time: 0.2375 data_time: 0.0038 memory: 5946 grad_norm: 458.0789 loss: 358.9182 loss_cls: 141.4935 loss_bbox: 105.1733 loss_dfl: 112.2514
03/12 18:00:40 - mmengine - INFO - Epoch(train) [1][ 950/19019] base_lr: 2.0000e-03 lr: 3.3265e-05 eta: 5 days, 10:23:21 time: 0.2485 data_time: 0.0039 memory: 5800 grad_norm: 462.1731 loss: 353.5276 loss_cls: 141.4663 loss_bbox: 102.6783 loss_dfl: 109.3830
03/12 18:00:51 - mmengine - INFO - Exp name: yolo_world_v2_s_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_train_lvis_minival_20240312_175448
03/12 18:00:51 - mmengine - INFO - Epoch(train) [1][ 1000/19019] base_lr: 2.0000e-03 lr: 3.5018e-05 eta: 5 days, 10:03:59 time: 0.2348 data_time: 0.0038 memory: 6706 grad_norm: 437.9579 loss: 348.6449 loss_cls: 140.3549 loss_bbox: 101.0544 loss_dfl: 107.2357
03/12 18:01:03 - mmengine - INFO - Epoch(train) [1][ 1050/19019] base_lr: 2.0000e-03 lr: 3.6770e-05 eta: 5 days, 9:47:01 time: 0.2352 data_time: 0.0038 memory: 6786 grad_norm: 405.5232 loss: 341.1842 loss_cls: 137.7237 loss_bbox: 99.0177 loss_dfl: 104.4427
03/12 18:01:15 - mmengine - INFO - Epoch(train) [1][ 1100/19019] base_lr: 2.0000e-03 lr: 3.8523e-05 eta: 5 days, 9:27:24 time: 0.2323 data_time: 0.0038 memory: 5586 grad_norm: 364.3518 loss: 335.0305 loss_cls: 136.1223 loss_bbox: 96.9399 loss_dfl: 101.9682
03/12 18:01:27 - mmengine - INFO - Epoch(train) [1][ 1150/19019] base_lr: 2.0000e-03 lr: 4.0276e-05 eta: 5 days, 9:29:59 time: 0.2472 data_time: 0.0039 memory: 6013 grad_norm: 358.9744 loss: 330.3108 loss_cls: 135.4357 loss_bbox: 95.4105 loss_dfl: 99.4646
03/12 18:01:39 - mmengine - INFO - Epoch(train) [1][ 1200/19019] base_lr: 2.0000e-03 lr: 4.2028e-05 eta: 5 days, 9:26:46 time: 0.2430 data_time: 0.0038 memory: 11920 grad_norm: 344.4472 loss: 327.1931 loss_cls: 134.6496 loss_bbox: 94.8774 loss_dfl: 97.6660
03/12 18:01:51 - mmengine - INFO - Epoch(train) [1][ 1250/19019] base_lr: 2.0000e-03 lr: 4.3781e-05 eta: 5 days, 9:11:37 time: 0.2334 data_time: 0.0037 memory: 5560 grad_norm: 337.1382 loss: 321.7565 loss_cls: 132.5439 loss_bbox: 92.8661 loss_dfl: 96.3464
03/12 18:02:03 - mmengine - INFO - Epoch(train) [1][ 1300/19019] base_lr: 2.0000e-03 lr: 4.5533e-05 eta: 5 days, 9:14:24 time: 0.2472 data_time: 0.0037 memory: 5973 grad_norm: 323.9022 loss: 318.5945 loss_cls: 131.9618 loss_bbox: 91.9790 loss_dfl: 94.6536
03/12 18:02:15 - mmengine - INFO - Epoch(train) [1][ 1350/19019] base_lr: 2.0000e-03 lr: 4.7286e-05 eta: 5 days, 9:01:40 time: 0.2341 data_time: 0.0039 memory: 6306 grad_norm: 319.6556 loss: 314.5906 loss_cls: 131.8039 loss_bbox: 89.8786 loss_dfl: 92.9081
03/12 18:02:27 - mmengine - INFO - Epoch(train) [1][ 1400/19019] base_lr: 2.0000e-03 lr: 4.9039e-05 eta: 5 days, 8:47:36 time: 0.2322 data_time: 0.0037 memory: 6466 grad_norm: 306.2527 loss: 311.0356 loss_cls: 129.8536 loss_bbox: 89.9395 loss_dfl: 91.2425
03/12 18:02:39 - mmengine - INFO - Epoch(train) [1][ 1450/19019] base_lr: 2.0000e-03 lr: 5.0791e-05 eta: 5 days, 8:54:51 time: 0.2508 data_time: 0.0039 memory: 6013 grad_norm: 296.1577 loss: 307.4349 loss_cls: 129.2732 loss_bbox: 88.2924 loss_dfl: 89.8693
03/12 18:02:51 - mmengine - INFO - Epoch(train) [1][ 1500/19019] base_lr: 2.0000e-03 lr: 5.2544e-05 eta: 5 days, 8:44:32 time: 0.2346 data_time: 0.0041 memory: 5746 grad_norm: 294.3796 loss: 302.3804 loss_cls: 126.6819 loss_bbox: 87.4314 loss_dfl: 88.2670
03/12 18:03:03 - mmengine - INFO - Epoch(train) [1][ 1550/19019] base_lr: 2.0000e-03 lr: 5.4297e-05 eta: 5 days, 8:36:30 time: 0.2362 data_time: 0.0038 memory: 5826 grad_norm: 284.2909 loss: 300.7494 loss_cls: 126.4133 loss_bbox: 87.3834 loss_dfl: 86.9527
03/12 18:03:15 - mmengine - INFO - Epoch(train) [1][ 1600/19019] base_lr: 2.0000e-03 lr: 5.6049e-05 eta: 5 days, 8:41:01 time: 0.2484 data_time: 0.0039 memory: 7093 grad_norm: 286.5743 loss: 296.5825 loss_cls: 125.3398 loss_bbox: 85.3759 loss_dfl: 85.8668
03/12 18:03:27 - mmengine - INFO - Epoch(train) [1][ 1650/19019] base_lr: 2.0000e-03 lr: 5.7802e-05 eta: 5 days, 8:35:23 time: 0.2381 data_time: 0.0040 memory: 5573 grad_norm: 272.7181 loss: 293.8952 loss_cls: 125.2692 loss_bbox: 83.8527 loss_dfl: 84.7733
03/12 18:03:39 - mmengine - INFO - Epoch(train) [1][ 1700/19019] base_lr: 2.0000e-03 lr: 5.9554e-05 eta: 5 days, 8:27:36 time: 0.2355 data_time: 0.0038 memory: 6200 grad_norm: 260.2163 loss: 292.4793 loss_cls: 124.4377 loss_bbox: 84.3622 loss_dfl: 83.6794
03/12 18:03:51 - mmengine - INFO - Epoch(train) [1][ 1750/19019] base_lr: 2.0000e-03 lr: 6.1307e-05 eta: 5 days, 8:21:50 time: 0.2372 data_time: 0.0040 memory: 7253 grad_norm: 272.3926 loss: 289.9523 loss_cls: 123.5043 loss_bbox: 83.4889 loss_dfl: 82.9591
03/12 18:04:03 - mmengine - INFO - Epoch(train) [1][ 1800/19019] base_lr: 2.0000e-03 lr: 6.3060e-05 eta: 5 days, 8:24:34 time: 0.2465 data_time: 0.0037 memory: 6826 grad_norm: 268.1772 loss: 285.6544 loss_cls: 120.5544 loss_bbox: 82.8305 loss_dfl: 82.2695
Thank you for open-sourcing such excellent work. I am reproducing YOLO-World-s on Objects365v1, but the early-stage loss does not follow the same trend as yours — is this normal? I do not have V100s, so I am training on two RTX 4090s. The batch size per card is also 16, and I changed nothing else except using only Objects365v1 as training data and setting the GPU count to 2. My Objects365v1 dataset was downloaded from the link you provided. Thanks.
03/12 17:57:03 - mmengine - INFO - Epoch(train) [1][ 50/19019] base_lr: 2.0000e-03 lr: 1.7176e-06 eta: 7 days, 21:30:37 time: 0.3587 data_time: 0.0372 memory: 9180 grad_norm: nan loss: 446.1266 loss_cls: 185.6677 loss_bbox: 123.7541 loss_dfl: 136.7047
03/12 17:57:15 - mmengine - INFO - Epoch(train) [1][ 100/19019] base_lr: 2.0000e-03 lr: 3.4702e-06 eta: 6 days, 14:30:04 time: 0.2413 data_time: 0.0039 memory: 7734 grad_norm: 565.1302 loss: 445.8083 loss_cls: 184.7159 loss_bbox: 124.9006 loss_dfl: 136.1918
03/12 17:57:28 - mmengine - INFO - Epoch(train) [1][ 150/19019] base_lr: 2.0000e-03 lr: 5.2228e-06 eta: 6 days, 5:49:49 time: 0.2508 data_time: 0.0038 memory: 6040 grad_norm: 538.9717 loss: 444.6346 loss_cls: 184.0897 loss_bbox: 124.3050 loss_dfl: 136.2399
03/12 17:57:39 - mmengine - INFO - Epoch(train) [1][ 200/19019] base_lr: 2.0000e-03 lr: 6.9755e-06 eta: 5 days, 23:20:15 time: 0.2345 data_time: 0.0040 memory: 5533 grad_norm: inf loss: 441.9562 loss_cls: 182.6187 loss_bbox: 123.2948 loss_dfl: 136.0427
03/12 17:57:51 - mmengine - INFO - Epoch(train) [1][ 250/19019] base_lr: 2.0000e-03 lr: 8.7281e-06 eta: 5 days, 19:42:03 time: 0.2370 data_time: 0.0038 memory: 7546 grad_norm: 506.1350 loss: 438.6803 loss_cls: 180.0693 loss_bbox: 123.0374 loss_dfl: 135.5736
03/12 17:58:04 - mmengine - INFO - Epoch(train) [1][ 300/19019] base_lr: 2.0000e-03 lr: 1.0481e-05 eta: 5 days, 18:12:13 time: 0.2475 data_time: 0.0039 memory: 5693 grad_norm: 466.7960 loss: 435.8879 loss_cls: 177.5164 loss_bbox: 123.7570 loss_dfl: 134.6145
03/12 17:58:15 - mmengine - INFO - Epoch(train) [1][ 350/19019] base_lr: 2.0000e-03 lr: 1.2233e-05 eta: 5 days, 16:09:50 time: 0.2347 data_time: 0.0039 memory: 6253 grad_norm: 450.4666 loss: 429.7469 loss_cls: 173.1966 loss_bbox: 123.0853 loss_dfl: 133.4649
03/12 17:58:27 - mmengine - INFO - Epoch(train) [1][ 400/19019] base_lr: 2.0000e-03 lr: 1.3986e-05 eta: 5 days, 14:47:06 time: 0.2370 data_time: 0.0037 memory: 6053 grad_norm: 460.2511 loss: 422.2496 loss_cls: 168.1797 loss_bbox: 121.7420 loss_dfl: 132.3279
03/12 17:58:39 - mmengine - INFO - Epoch(train) [1][ 450/19019] base_lr: 2.0000e-03 lr: 1.5739e-05 eta: 5 days, 14:15:45 time: 0.2463 data_time: 0.0040 memory: 6173 grad_norm: 526.7233 loss: 415.7690 loss_cls: 164.0228 loss_bbox: 120.9978 loss_dfl: 130.7484
03/12 17:58:51 - mmengine - INFO - Epoch(train) [1][ 500/19019] base_lr: 2.0000e-03 lr: 1.7491e-05 eta: 5 days, 13:11:15 time: 0.2339 data_time: 0.0037 memory: 8267 grad_norm: 540.6158 loss: 411.8747 loss_cls: 161.5670 loss_bbox: 120.9805 loss_dfl: 129.3272
03/12 17:59:03 - mmengine - INFO - Epoch(train) [1][ 550/19019] base_lr: 2.0000e-03 lr: 1.9244e-05 eta: 5 days, 12:17:28 time: 0.2336 data_time: 0.0040 memory: 5800 grad_norm: 532.7258 loss: 404.0686 loss_cls: 158.0140 loss_bbox: 118.3286 loss_dfl: 127.7259
03/12 17:59:15 - mmengine - INFO - Epoch(train) [1][ 600/19019] base_lr: 2.0000e-03 lr: 2.0997e-05 eta: 5 days, 11:37:55 time: 0.2356 data_time: 0.0039 memory: 6546 grad_norm: 513.6272 loss: 400.2819 loss_cls: 156.8279 loss_bbox: 117.0981 loss_dfl: 126.3559
03/12 17:59:27 - mmengine - INFO - Epoch(train) [1][ 650/19019] base_lr: 2.0000e-03 lr: 2.2749e-05 eta: 5 days, 11:43:15 time: 0.2515 data_time: 0.0039 memory: 9239 grad_norm: 511.9093 loss: 394.8929 loss_cls: 152.5437 loss_bbox: 117.8386 loss_dfl: 124.5105
03/12 17:59:39 - mmengine - INFO - Epoch(train) [1][ 700/19019] base_lr: 2.0000e-03 lr: 2.4502e-05 eta: 5 days, 11:16:46 time: 0.2378 data_time: 0.0040 memory: 5706 grad_norm: 513.3235 loss: 388.5590 loss_cls: 151.2647 loss_bbox: 115.0196 loss_dfl: 122.2748
03/12 17:59:51 - mmengine - INFO - Epoch(train) [1][ 750/19019] base_lr: 2.0000e-03 lr: 2.6254e-05 eta: 5 days, 10:48:51 time: 0.2355 data_time: 0.0041 memory: 7240 grad_norm: 514.5650 loss: 383.7197 loss_cls: 150.8671 loss_bbox: 112.1987 loss_dfl: 120.6540
03/12 18:00:03 - mmengine - INFO - Epoch(train) [1][ 800/19019] base_lr: 2.0000e-03 lr: 2.8007e-05 eta: 5 days, 10:51:38 time: 0.2492 data_time: 0.0040 memory: 6333 grad_norm: 508.5694 loss: 376.1887 loss_cls: 148.2612 loss_bbox: 109.8921 loss_dfl: 118.0354
03/12 18:00:15 - mmengine - INFO - Epoch(train) [1][ 850/19019] base_lr: 2.0000e-03 lr: 2.9760e-05 eta: 5 days, 10:38:19 time: 0.2408 data_time: 0.0041 memory: 5920 grad_norm: 480.6808 loss: 369.1250 loss_cls: 145.7235 loss_bbox: 107.7232 loss_dfl: 115.6783
03/12 18:00:27 - mmengine - INFO - Epoch(train) [1][ 900/19019] base_lr: 2.0000e-03 lr: 3.1512e-05 eta: 5 days, 10:20:42 time: 0.2375 data_time: 0.0038 memory: 5946 grad_norm: 458.0789 loss: 358.9182 loss_cls: 141.4935 loss_bbox: 105.1733 loss_dfl: 112.2514
03/12 18:00:40 - mmengine - INFO - Epoch(train) [1][ 950/19019] base_lr: 2.0000e-03 lr: 3.3265e-05 eta: 5 days, 10:23:21 time: 0.2485 data_time: 0.0039 memory: 5800 grad_norm: 462.1731 loss: 353.5276 loss_cls: 141.4663 loss_bbox: 102.6783 loss_dfl: 109.3830
03/12 18:00:51 - mmengine - INFO - Exp name: yolo_world_v2_s_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_train_lvis_minival_20240312_175448
03/12 18:00:51 - mmengine - INFO - Epoch(train) [1][ 1000/19019] base_lr: 2.0000e-03 lr: 3.5018e-05 eta: 5 days, 10:03:59 time: 0.2348 data_time: 0.0038 memory: 6706 grad_norm: 437.9579 loss: 348.6449 loss_cls: 140.3549 loss_bbox: 101.0544 loss_dfl: 107.2357
03/12 18:01:03 - mmengine - INFO - Epoch(train) [1][ 1050/19019] base_lr: 2.0000e-03 lr: 3.6770e-05 eta: 5 days, 9:47:01 time: 0.2352 data_time: 0.0038 memory: 6786 grad_norm: 405.5232 loss: 341.1842 loss_cls: 137.7237 loss_bbox: 99.0177 loss_dfl: 104.4427
03/12 18:01:15 - mmengine - INFO - Epoch(train) [1][ 1100/19019] base_lr: 2.0000e-03 lr: 3.8523e-05 eta: 5 days, 9:27:24 time: 0.2323 data_time: 0.0038 memory: 5586 grad_norm: 364.3518 loss: 335.0305 loss_cls: 136.1223 loss_bbox: 96.9399 loss_dfl: 101.9682
03/12 18:01:27 - mmengine - INFO - Epoch(train) [1][ 1150/19019] base_lr: 2.0000e-03 lr: 4.0276e-05 eta: 5 days, 9:29:59 time: 0.2472 data_time: 0.0039 memory: 6013 grad_norm: 358.9744 loss: 330.3108 loss_cls: 135.4357 loss_bbox: 95.4105 loss_dfl: 99.4646
03/12 18:01:39 - mmengine - INFO - Epoch(train) [1][ 1200/19019] base_lr: 2.0000e-03 lr: 4.2028e-05 eta: 5 days, 9:26:46 time: 0.2430 data_time: 0.0038 memory: 11920 grad_norm: 344.4472 loss: 327.1931 loss_cls: 134.6496 loss_bbox: 94.8774 loss_dfl: 97.6660
03/12 18:01:51 - mmengine - INFO - Epoch(train) [1][ 1250/19019] base_lr: 2.0000e-03 lr: 4.3781e-05 eta: 5 days, 9:11:37 time: 0.2334 data_time: 0.0037 memory: 5560 grad_norm: 337.1382 loss: 321.7565 loss_cls: 132.5439 loss_bbox: 92.8661 loss_dfl: 96.3464
03/12 18:02:03 - mmengine - INFO - Epoch(train) [1][ 1300/19019] base_lr: 2.0000e-03 lr: 4.5533e-05 eta: 5 days, 9:14:24 time: 0.2472 data_time: 0.0037 memory: 5973 grad_norm: 323.9022 loss: 318.5945 loss_cls: 131.9618 loss_bbox: 91.9790 loss_dfl: 94.6536
03/12 18:02:15 - mmengine - INFO - Epoch(train) [1][ 1350/19019] base_lr: 2.0000e-03 lr: 4.7286e-05 eta: 5 days, 9:01:40 time: 0.2341 data_time: 0.0039 memory: 6306 grad_norm: 319.6556 loss: 314.5906 loss_cls: 131.8039 loss_bbox: 89.8786 loss_dfl: 92.9081
03/12 18:02:27 - mmengine - INFO - Epoch(train) [1][ 1400/19019] base_lr: 2.0000e-03 lr: 4.9039e-05 eta: 5 days, 8:47:36 time: 0.2322 data_time: 0.0037 memory: 6466 grad_norm: 306.2527 loss: 311.0356 loss_cls: 129.8536 loss_bbox: 89.9395 loss_dfl: 91.2425
03/12 18:02:39 - mmengine - INFO - Epoch(train) [1][ 1450/19019] base_lr: 2.0000e-03 lr: 5.0791e-05 eta: 5 days, 8:54:51 time: 0.2508 data_time: 0.0039 memory: 6013 grad_norm: 296.1577 loss: 307.4349 loss_cls: 129.2732 loss_bbox: 88.2924 loss_dfl: 89.8693
03/12 18:02:51 - mmengine - INFO - Epoch(train) [1][ 1500/19019] base_lr: 2.0000e-03 lr: 5.2544e-05 eta: 5 days, 8:44:32 time: 0.2346 data_time: 0.0041 memory: 5746 grad_norm: 294.3796 loss: 302.3804 loss_cls: 126.6819 loss_bbox: 87.4314 loss_dfl: 88.2670
03/12 18:03:03 - mmengine - INFO - Epoch(train) [1][ 1550/19019] base_lr: 2.0000e-03 lr: 5.4297e-05 eta: 5 days, 8:36:30 time: 0.2362 data_time: 0.0038 memory: 5826 grad_norm: 284.2909 loss: 300.7494 loss_cls: 126.4133 loss_bbox: 87.3834 loss_dfl: 86.9527
03/12 18:03:15 - mmengine - INFO - Epoch(train) [1][ 1600/19019] base_lr: 2.0000e-03 lr: 5.6049e-05 eta: 5 days, 8:41:01 time: 0.2484 data_time: 0.0039 memory: 7093 grad_norm: 286.5743 loss: 296.5825 loss_cls: 125.3398 loss_bbox: 85.3759 loss_dfl: 85.8668
03/12 18:03:27 - mmengine - INFO - Epoch(train) [1][ 1650/19019] base_lr: 2.0000e-03 lr: 5.7802e-05 eta: 5 days, 8:35:23 time: 0.2381 data_time: 0.0040 memory: 5573 grad_norm: 272.7181 loss: 293.8952 loss_cls: 125.2692 loss_bbox: 83.8527 loss_dfl: 84.7733
03/12 18:03:39 - mmengine - INFO - Epoch(train) [1][ 1700/19019] base_lr: 2.0000e-03 lr: 5.9554e-05 eta: 5 days, 8:27:36 time: 0.2355 data_time: 0.0038 memory: 6200 grad_norm: 260.2163 loss: 292.4793 loss_cls: 124.4377 loss_bbox: 84.3622 loss_dfl: 83.6794
03/12 18:03:51 - mmengine - INFO - Epoch(train) [1][ 1750/19019] base_lr: 2.0000e-03 lr: 6.1307e-05 eta: 5 days, 8:21:50 time: 0.2372 data_time: 0.0040 memory: 7253 grad_norm: 272.3926 loss: 289.9523 loss_cls: 123.5043 loss_bbox: 83.4889 loss_dfl: 82.9591
03/12 18:04:03 - mmengine - INFO - Epoch(train) [1][ 1800/19019] base_lr: 2.0000e-03 lr: 6.3060e-05 eta: 5 days, 8:24:34 time: 0.2465 data_time: 0.0037 memory: 6826 grad_norm: 268.1772 loss: 285.6544 loss_cls: 120.5544 loss_bbox: 82.8305 loss_dfl: 82.2695
The total batch size has changed, so the learning rate needs to change with it.
Thanks, I will try scaling the learning rate proportionally tomorrow. Are there any other parameters that should be adjusted along with it?
Honestly, you are only using O365 without the other two datasets, and your total batch size is 32 versus 512 in the paper — with that gap it will be very hard to reproduce the results.
I will first see how much the performance drops; Table 3 (Ablations on Pre-training Data) in the paper reports results with O365 alone, so I will compare against that later. Also, from what I have read, "when the batch size is changed by a factor of K, some suggest scaling the learning rate by sqrt(K) or by K, though keeping the same learning rate can also work." I will try both. Thanks.
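For reference, the two heuristics mentioned in the quote (linear and square-root scaling) can be sketched as below. The function names are illustrative and not part of the YOLO-World codebase; the base setting of lr=2e-3 at total batch 512 (32 GPUs x 16 per GPU) is taken from the config name in the logs above.

```python
def linear_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: lr changes proportionally with the total batch size."""
    return base_lr * new_batch / base_batch


def sqrt_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Square-root scaling rule, sometimes suggested for adaptive optimizers."""
    return base_lr * (new_batch / base_batch) ** 0.5


# Released setting: base_lr = 2e-3 at total batch 512 (32 GPUs x 16 per GPU).
# Reproduction setting: total batch 32 (2 GPUs x 16 per GPU).
lr_linear = linear_scaled_lr(2e-3, 512, 32)  # 0.000125 (= base_lr / 16)
lr_sqrt = sqrt_scaled_lr(2e-3, 512, 32)      # 0.0005   (= base_lr / 4)
```

Note that the maintainer's point below is that YOLO-World's loss weighting already absorbs the batch-size change, so this scaling may be unnecessary in practice.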
Hi @JiayuanWang-JW, YOLO-World multiplies the loss by a weight of batch_size * world_size, so a different batch size or GPU count shifts the scale of the loss. On our side batch_size = 16 and world_size = 32, so you can convert accordingly. Also, since the YOLO series adapts to the batch size through this loss weight, scaling the learning rate is effectively redundant (AdamW is a special case here); adjusting the lr is of little use.
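To make that scale difference concrete, here is a rough back-of-envelope conversion under the stated convention (loss multiplied by batch_size * world_size); the variable names are illustrative, not from the codebase:

```python
# Released setting: 16 images/GPU on 32 GPUs; reproduction: 16 images/GPU on 2 GPUs.
ref_loss_weight = 16 * 32    # 512, loss weight in the released setting
repro_loss_weight = 16 * 2   # 32, loss weight in the 2x RTX 4090 setting

# At an identical per-sample loss, the logged loss values would differ by this factor,
# so raw loss curves from the two runs are not directly comparable.
scale_ratio = ref_loss_weight / repro_loss_weight  # 16.0
```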
Thanks, I will wait for this run to finish and look at the results. world_size should be obtained automatically via get_dist_info(), so I cannot change it. Are there any other parameters you think need adjusting? Right now my setup differs from yours only in the number and model of GPUs, and the per-GPU batch size is also 16. I am not sure whether I can reproduce the O365-only result in your Table 3 (Ablations on Pre-training Data).
The only other thing that may need attention is the weight decay: YOLO also scales the weight decay, and for our 32-GPU training the effective weight decay is close to 0.2.
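A sketch of the weight-decay scaling being described. The base value 0.025 and the nominal batch size of 64 are assumptions inferred from the "close to 0.2 at 32x16" figure (0.025 * 512 / 64 = 0.2), in the style of the YOLOv5 optimizer constructor; check the actual config before relying on them:

```python
def scaled_weight_decay(base_wd: float, total_batch: int, nominal_batch: int = 64) -> float:
    """YOLOv5-style scaling: effective weight decay grows with the total batch size."""
    return base_wd * total_batch / nominal_batch


wd_released = scaled_weight_decay(0.025, 32 * 16)  # 0.2, matching "close to 0.2"
wd_repro = scaled_weight_decay(0.025, 2 * 16)      # 0.0125 for the 2x4090 run
```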