
Errors during training

997897336 opened this issue 11 months ago

03/13 00:47:06 - mmengine - ERROR - /usr/local/lib/python3.8/dist-packages/mmdet/evaluation/metrics/coco_metric.py - compute_metrics - 465 - The testing results of the whole dataset is empty.

997897336 avatar Mar 13 '24 00:03 997897336

03/13 00:49:02 - mmengine - INFO - Epoch(train) [17][50/70] base_lr: 2.0000e-04 lr: 1.6287e-04 eta: 1:13:09 time: 0.9742 data_time: 0.0516 memory: 10772 grad_norm: 0.0043 loss: 0.0000 loss_cls: 0.0000 loss_bbox: 0.0000 loss_dfl: 0.0000
03/13 00:49:20 - mmengine - INFO - Exp name: yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco_20240313_002749

Also, why are all of these losses 0 here?

997897336 avatar Mar 13 '24 00:03 997897336

I ran into exactly the same error: it's raised during val, but training continues, and the final model can't detect anything either. I don't know whether it's because my image sizes aren't uniform.

CaffeineLiqueur avatar Mar 13 '24 03:03 CaffeineLiqueur

@CaffeineLiqueur Exactly, the model produces nothing at all. Is something misconfigured? It would help if the authors put together a detailed guide.

997897336 avatar Mar 13 '24 03:03 997897336

@CaffeineLiqueur @997897336 Have you run a zero-shot evaluation on your own datasets with the currently released pre-trained models?

wondervictor avatar Mar 13 '24 03:03 wondervictor

Running the pre-trained model on my own dataset does detect objects @wondervictor

997897336 avatar Mar 13 '24 03:03 997897336

@wondervictor My downstream task is medical imaging, which is fairly hard; the pre-trained models can't detect anything there. That's why I tried fine-tuning, but then ran into this error.

CaffeineLiqueur avatar Mar 13 '24 04:03 CaffeineLiqueur

@997897336 With all losses at 0, you should probably check your annotations. @CaffeineLiqueur Are your losses normal? Also, please share some detailed settings, or the format of your data.

wondervictor avatar Mar 13 '24 05:03 wondervictor
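Regarding "check your annotations": all-zero loss_bbox/loss_dfl usually means no ground-truth box survives loading, e.g. degenerate boxes or category ids missing from the categories list. A quick hypothetical sanity check over a COCO-format dict (field names follow the COCO annotation format; the helper itself is not from this repo):

```python
# Hypothetical sanity check (not from YOLO-World): flag annotations that a
# COCO-style loader would drop, which can leave the box losses at exactly 0.
def check_coco_annotations(coco):
    """Return a list of human-readable problems in a COCO-format dict."""
    problems = []
    valid_cat_ids = {c["id"] for c in coco.get("categories", [])}
    if not coco.get("annotations"):
        problems.append("no annotations at all")
    for ann in coco.get("annotations", []):
        x, y, w, h = ann["bbox"]  # COCO bbox is [x, y, width, height]
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann['id']}: degenerate bbox {ann['bbox']}")
        if ann["category_id"] not in valid_cat_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
    return problems

# Tiny inline example: one zero-width box and one unknown category id.
sample = {
    "categories": [{"id": 1, "name": "lesion"}],
    "annotations": [
        {"id": 1, "bbox": [10, 10, 50, 40], "category_id": 1},  # fine
        {"id": 2, "bbox": [10, 10, 0, 40], "category_id": 1},   # zero width
        {"id": 3, "bbox": [5, 5, 20, 20], "category_id": 99},   # bad category id
    ],
}
print(check_coco_annotations(sample))  # reports two problems
```

Running this over `instances_train2017.json` (loaded with `json.load`) before training would rule out the annotation side.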

My data is in coco2014 format; I wrote a script to convert it to the 2017 format. The annotations are fine: training YOLOX on them gives decent results. I didn't change any other settings @wondervictor

997897336 avatar Mar 13 '24 05:03 997897336

@CaffeineLiqueur Could you share your config? Our settings may differ from YOLOX's.

wondervictor avatar Mar 13 '24 05:03 wondervictor

@wondervictor Hi! My loss_bbox and loss_dfl are 0 from the very start, and the first two losses drop to zero after a few epochs. As for the detailed settings: I use the coco2017 dataset format with only 1 class. Since my GPU resources are limited, I made the following changes to yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py:

```python
_base_ = ('../../third_party/mmyolo/configs/yolov8/'
          'yolov8_s_syncbn_fast_8xb16-500e_coco.py')

load_from = ('/home/project/YOLO-World/pretrained/'
             'yolo_world_s_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-18bea4d2.pth')

num_classes = 1
num_training_classes = 1
```
Also, since the line `prob=_base_.mixup_prob` raised an error (the s model's config probably doesn't define `mixup_prob`), I commented it out.
            

I may have been naive and made changes to the config that were too simplistic, causing this error. Any advice would be appreciated!

CaffeineLiqueur avatar Mar 13 '24 05:03 CaffeineLiqueur

```python
_base_ = ('../../third_party/mmyolo/configs/yolov8/'
          'yolov8_l_syncbn_fast_8xb16-500e_coco.py')
custom_imports = dict(imports=['yolo_world'],
                      allow_failed_imports=False)
```

```python
# hyper-parameters
num_classes = 80
num_training_classes = 80
max_epochs = 80  # Maximum training epochs
close_mosaic_epochs = 10
save_epoch_intervals = 5
text_channels = 512
neck_embed_channels = [128, 256, _base_.last_stage_out_channels // 2]
neck_num_heads = [4, 8, _base_.last_stage_out_channels // 2 // 32]
base_lr = 2e-4
weight_decay = 0.05
train_batch_size_per_gpu = 16
load_from = 'pretrained_models/yolo_world_l_clip_base_dual_vlpan_2e-3adamw_32xb16_100e_o365_goldg_train_pretrained-0e566235.pth'
persistent_workers = False
```

```python
# model settings
model = dict(
    type='YOLOWorldDetector',
    mm_neck=True,
    num_train_classes=num_training_classes,
    num_test_classes=num_classes,
    data_preprocessor=dict(type='YOLOWDetDataPreprocessor'),
    backbone=dict(
        _delete_=True,
        type='MultiModalYOLOBackbone',
        image_model={{_base_.model.backbone}},
        text_model=dict(
            type='HuggingCLIPLanguageBackbone',
            model_name='/build/YOLO-World/clip-vit-base-patch32',
            frozen_modules=['all'])),
    neck=dict(type='YOLOWorldDualPAFPN',
              guide_channels=text_channels,
              embed_channels=neck_embed_channels,
              num_heads=neck_num_heads,
              block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'),
              text_enhancder=dict(type='ImagePoolingAttentionModule',
                                  embed_channels=256,
                                  num_heads=8)),
    bbox_head=dict(type='YOLOWorldHead',
                   head_module=dict(type='YOLOWorldHeadModule',
                                    embed_dims=text_channels,
                                    num_classes=num_training_classes)),
    train_cfg=dict(assigner=dict(num_classes=num_training_classes)))
```

```python
# dataset settings
text_transform = [
    dict(type='RandomLoadText',
         num_neg_samples=(num_classes, num_classes),
         max_num_samples=num_training_classes,
         padding_to_max=True,
         padding_value=''),
    dict(type='mmdet.PackDetInputs',
         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'flip',
                    'flip_direction', 'texts'))
]
mosaic_affine_transform = [
    dict(type='MultiModalMosaic',
         img_scale=_base_.img_scale,
         pad_val=114.0,
         pre_transform=_base_.pre_transform),
    dict(type='YOLOv5RandomAffine',
         max_rotate_degree=0.0,
         max_shear_degree=0.0,
         max_aspect_ratio=100.,
         scaling_ratio_range=(1 - _base_.affine_scale, 1 + _base_.affine_scale),
         # img_scale is (width, height)
         border=(-_base_.img_scale[0] // 2, -_base_.img_scale[1] // 2),
         border_val=(114, 114, 114))
]
train_pipeline = [
    *_base_.pre_transform,
    *mosaic_affine_transform,
    dict(type='YOLOv5MultiModalMixUp',
         prob=_base_.mixup_prob,
         pre_transform=[*_base_.pre_transform, *mosaic_affine_transform]),
    *_base_.last_transform[:-1],
    *text_transform
]
train_pipeline_stage2 = [*_base_.train_pipeline_stage2[:-1], *text_transform]
coco_train_dataset = dict(
    _delete_=True,
    type='MultiModalDataset',
    dataset=dict(
        type='YOLOv5CocoDataset',
        data_root='data/coco',
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/'),
        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
    class_text_path='data/texts/coco_class_texts.json',
    pipeline=train_pipeline)
```

```python
train_dataloader = dict(persistent_workers=persistent_workers,
                        batch_size=train_batch_size_per_gpu,
                        collate_fn=dict(type='yolow_collate'),
                        dataset=coco_train_dataset)
test_pipeline = [
    *_base_.test_pipeline[:-1],
    dict(type='LoadText'),
    dict(type='mmdet.PackDetInputs',
         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                    'scale_factor', 'pad_param', 'texts'))
]
coco_val_dataset = dict(
    _delete_=True,
    type='MultiModalDataset',
    dataset=dict(
        type='YOLOv5CocoDataset',
        data_root='data/coco',
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/'),
        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
    class_text_path='data/texts/coco_class_texts.json',
    pipeline=test_pipeline)
val_dataloader = dict(dataset=coco_val_dataset)
test_dataloader = val_dataloader
```

```python
# training settings
default_hooks = dict(
    param_scheduler=dict(scheduler_type='linear',
                         lr_factor=0.01,
                         max_epochs=max_epochs),
    checkpoint=dict(max_keep_ckpts=-1,
                    save_best=None,
                    interval=save_epoch_intervals))
custom_hooks = [
    dict(type='EMAHook',
         ema_type='ExpMomentumEMA',
         momentum=0.0001,
         update_buffers=True,
         strict_load=False,
         priority=49),
    dict(type='mmdet.PipelineSwitchHook',
         switch_epoch=max_epochs - close_mosaic_epochs,
         switch_pipeline=train_pipeline_stage2)
]
train_cfg = dict(max_epochs=max_epochs,
                 val_interval=5,
                 dynamic_intervals=[((max_epochs - close_mosaic_epochs),
                                     _base_.val_interval_stage2)])
optim_wrapper = dict(
    optimizer=dict(
        _delete_=True,
        type='AdamW',
        lr=base_lr,
        weight_decay=weight_decay,
        batch_size_per_gpu=train_batch_size_per_gpu),
    paramwise_cfg=dict(bias_decay_mult=0.0,
                       norm_decay_mult=0.0,
                       custom_keys={'backbone.text_model': dict(lr_mult=0.01),
                                    'logit_scale': dict(weight_decay=0.0)}),
    constructor='YOLOWv5OptimizerConstructor')
```

```python
# evaluation settings
val_evaluator = dict(_delete_=True,
                     type='mmdet.CocoMetric',
                     proposal_nums=(100, 1, 10),
                     ann_file='data/coco/annotations/instances_val2017.json',
                     metric='bbox')
```

I'm using yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py. The training command is `./tools/dist_train.sh configs/finetune_coco/yolo_world_l_dual_vlpan_2e-4_80e_8gpus_finetune_coco.py 1 --amp`. Dataset annotations: [screenshot] @wondervictor

997897336 avatar Mar 13 '24 05:03 997897336
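One thing worth double-checking with a custom dataset is the `class_text_path` file referenced in the config above. As a sketch (the exact schema is an assumption inferred from `data/texts/coco_class_texts.json` in the repo, where each class maps to a list of text prompts in the same order as the dataset's categories):

```python
# Hypothetical writer for a custom class-texts file. The schema (a JSON list
# of per-class synonym lists, ordered like the dataset categories) is an
# assumption based on data/texts/coco_class_texts.json.
import json
import os
import tempfile

class_texts = [["car"], ["truck"], ["bus"], ["motorcycle"], ["bicycle"]]  # 5 classes

path = os.path.join(tempfile.mkdtemp(), "custom_class_texts.json")
with open(path, "w") as f:
    json.dump(class_texts, f)

# Reload to confirm the structure round-trips.
with open(path) as f:
    loaded = json.load(f)
print(len(loaded))  # number of classes
```

If the number or order of entries here doesn't match the dataset's categories, the text embeddings and the assigned labels can end up misaligned.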

03/13 00:47:06 - mmengine - ERROR - /usr/local/lib/python3.8/dist-packages/mmdet/evaluation/metrics/coco_metric.py - compute_metrics - 465 - The testing results of the whole dataset is empty.

I'd like to ask what `data_prefix=dict(img='val2017/')` in your config refers to. Does it mean the data/coco/val2017 folder? With my own dataset path I get an IsADirectoryError.

KDgggg avatar Mar 13 '24 09:03 KDgggg
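For reference, in mmdet/mmyolo-style configs the image path is (as I understand the convention) composed from `data_root`, `data_prefix['img']`, and the `file_name` stored in the annotation JSON, so `data_prefix=dict(img='val2017/')` does resolve to `data/coco/val2017/`. An IsADirectoryError usually means one of the pieces is empty or duplicated so the joined path points at a directory instead of a file. A rough sketch of the composition (the file name is a hypothetical entry from the annotation file):

```python
# Sketch of the path composition assumed by mmdet/mmyolo-style dataset configs:
# data_root + data_prefix['img'] + file_name (from the annotation JSON).
import os.path as osp

data_root = "data/coco"
data_prefix = {"img": "val2017/"}
file_name = "000000000139.jpg"  # hypothetical entry from instances_val2017.json

img_path = osp.join(data_root, data_prefix["img"], file_name)
print(img_path)  # -> data/coco/val2017/000000000139.jpg
```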

```python
coco_val_dataset = dict(
    _delete_=True,
    type='MultiModalDataset',
    dataset=dict(
        type='YOLOv5CocoDataset',
        data_root='data/coco',
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/'),
        filter_cfg=dict(filter_empty_gt=False, min_size=32)),
    class_text_path='data/texts/coco_class_texts.json',
    pipeline=test_pipeline)
val_dataloader = dict(dataset=coco_val_dataset)
test_dataloader = val_dataloader
```

That's how the original config file is; I didn't change it @KDgggg

997897336 avatar Mar 14 '24 00:03 997897336

Guys, has this problem been solved? @wondervictor @CaffeineLiqueur @KDgggg @onuralpszr

997897336 avatar Mar 18 '24 00:03 997897336

Guys, has this problem been solved? @wondervictor @CaffeineLiqueur @KDgggg @onuralpszr

I ran into this problem too. My dataset works fine with pre-training, but not with fine-tuning. Training continues, but it reports The testing results of the whole dataset is empty. How did you solve it?

JiayuanWang-JW avatar Mar 21 '24 18:03 JiayuanWang-JW

This problem still isn't solved, so I'm reopening it. More details: coco dataset format, five classes. [screenshot]

The problem: [screenshot]

These two losses are 0 from the very start: loss_bbox: 0.0000 loss_dfl: 0.0000

997897336 avatar Mar 26 '24 07:03 997897336

#146

997897336 avatar Mar 26 '24 07:03 997897336

This problem still isn't solved, so I'm reopening it. More details: coco dataset format, five classes. [screenshot]

The problem: [screenshot]

These two losses are 0 from the very start: loss_bbox: 0.0000 loss_dfl: 0.0000

[screenshot IMG_1997] Bro, try modifying metainfo like in my screenshot.

KDgggg avatar Mar 26 '24 07:03 KDgggg

@KDgggg Bro, I changed it and it still doesn't work. Could you share an email to discuss the details?

997897336 avatar Mar 26 '24 07:03 997897336

@KDgggg Bro, I changed it and it still doesn't work. Could you share an email to discuss the details?

[screenshots IMG_2001, IMG_2002, IMG_2003, IMG_2004, IMG_2006] This is my configuration. I'd check whether the model structure matches the weights being loaded, and whether the category order in your annotations matches the order of the texts.

KDgggg avatar Mar 26 '24 07:03 KDgggg
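Since the suggested fix is only visible in the screenshots, here is a sketch of what a `metainfo` override typically looks like (an assumption, not the exact content of the screenshots: the five class names are hypothetical placeholders). `YOLOv5CocoDataset` defaults to the 80 COCO classes, so without this, custom category names may be silently filtered out during loading, which would also explain loss_bbox/loss_dfl staying at 0:

```python
# Hypothetical metainfo override for a 5-class custom dataset; the class names
# here are placeholders and must match the "categories" in your annotation JSON.
metainfo = dict(classes=('class_a', 'class_b', 'class_c', 'class_d', 'class_e'))

coco_train_dataset = dict(
    type='MultiModalDataset',
    dataset=dict(
        type='YOLOv5CocoDataset',
        metainfo=metainfo,  # without this, custom classes may be dropped
        data_root='data/custom',
        ann_file='annotations/train.json',
        data_prefix=dict(img='train/')),
    # order of entries must match metainfo['classes']
    class_text_path='data/texts/custom_class_texts.json',
    pipeline=train_pipeline)
```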

@KDgggg I'm at my wit's end, man

997897336 avatar Mar 26 '24 08:03 997897336

@KDgggg I'm at my wit's end, man

[email protected]

KDgggg avatar Mar 26 '24 08:03 KDgggg

This problem still isn't solved, so I'm reopening it. More details: coco dataset format, five classes. [screenshot] The problem: [screenshot] These two losses are 0 from the very start: loss_bbox: 0.0000 loss_dfl: 0.0000

[screenshot IMG_1997] Bro, try modifying metainfo like in my screenshot.

It works for me!

lin-whale avatar Apr 18 '24 02:04 lin-whale

It works for me! Thank you.

LLH-Harward avatar Jul 30 '24 15:07 LLH-Harward