YOLO-World

Can you help resolve this error when fine-tuning on the COCO dataset?

Open 937739823 opened this issue 1 year ago • 0 comments

Command executed: python tools/train.py configs/finetune_coco/yolo_world_v2_l_vlpan_bn_sgd_1e-3_40e_8gpus_finetune_coco.py --work-dir log --amp --resume

ERROR:

root@46ad73408c38:/home/hanyong/yolo-world# python tools/train.py configs/finetune_coco/yolo_world_v2_l_vlpan_bn_sgd_1e-3_40e_8gpus_finetune_coco.py --work-dir log --amp --resume

NOTE! Installing ujson may make loading annotations faster.
06/24 05:52:02 - mmengine - WARNING - Failed to search registry with scope "mmyolo" in the "log_processor" registry tree. As a workaround, the current "log_processor" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmyolo" is a correct scope, or whether the registry is initialized.
/bin/bash: /usr/local/cuda-11.1/bin/nvcc: No such file or directory
06/24 05:52:03 - mmengine - INFO -

System environment:
sys.platform: linux
Python: 3.8.8 (default, Feb 24 2021, 21:46:12) [GCC 7.3.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 871293557
GPU 0: Tesla V100-PCIE-32GB
CUDA_HOME: /usr/local/cuda-11.1/
NVCC: Not Available
GCC: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PyTorch: 1.11.0+cu113
PyTorch compiling details: PyTorch built with:

  • GCC 7.3

  • C++ Version: 201402

  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications

  • Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)

  • OpenMP 201511 (a.k.a. OpenMP 4.5)

  • LAPACK is enabled (usually provided by MKL)

  • NNPACK is enabled

  • CPU capability usage: AVX2

  • CUDA Runtime 11.3

  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86

  • CuDNN 8.2

  • Magma 2.5.2

  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.12.0+cu113
OpenCV: 4.2.0
MMEngine: 0.10.4

Runtime environment:
cudnn_benchmark: True
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 871293557
Distributed launcher: none
Distributed training: False
GPU number: 1

06/24 05:52:05 - mmengine - INFO - Config: _backend_args = None _multiscale_resize_transforms = [ dict( transforms=[ dict(scale=( 640, 640, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 640, 640, ), type='LetterResize'), ], type='Compose'), dict( transforms=[ dict(scale=( 320, 320, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 320, 320, ), type='LetterResize'), ], type='Compose'), dict( transforms=[ dict(scale=( 960, 960, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 960, 960, ), type='LetterResize'), ], type='Compose'), ] affine_scale = 0.9 albu_train_transforms = [ dict(p=0.01, type='Blur'), dict(p=0.01, type='MedianBlur'), dict(p=0.01, type='ToGray'), dict(p=0.01, type='CLAHE'), ] backend_args = None base_lr = 0.001 batch_shapes_cfg = None close_mosaic_epochs = 30 coco_train_dataset = dict( delete=True, class_text_path='data/texts/coco_class_texts.json', dataset=dict( ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), data_root='data/coco/coco2017_50', filter_cfg=dict(filter_empty_gt=False, min_size=32), type='YOLOv5CocoDataset'), pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict( img_scale=( 640, 640, ), pad_val=114.0, pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), ], type='MultiModalMosaic'), dict( border=( -320, -320, ), border_val=( 114, 114, 114, ), max_aspect_ratio=100.0, max_rotate_degree=0.0, max_shear_degree=0.0, scaling_ratio_range=( 0.09999999999999998, 1.9, ), type='YOLOv5RandomAffine'), dict( pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict( img_scale=( 640, 640, ), pad_val=114.0, pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', 
with_bbox=True), ], type='MultiModalMosaic'), dict( border=( -320, -320, ), border_val=( 114, 114, 114, ), max_aspect_ratio=100.0, max_rotate_degree=0.0, max_shear_degree=0.0, scaling_ratio_range=( 0.09999999999999998, 1.9, ), type='YOLOv5RandomAffine'), ], prob=0.15, type='YOLOv5MultiModalMixUp'), dict( bbox_params=dict( format='pascal_voc', label_fields=[ 'gt_bboxes_labels', 'gt_ignore_flags', ], type='BboxParams'), keymap=dict(gt_bboxes='bboxes', img='image'), transforms=[ dict(p=0.01, type='Blur'), dict(p=0.01, type='MedianBlur'), dict(p=0.01, type='ToGray'), dict(p=0.01, type='CLAHE'), ], type='mmdet.Albu'), dict(type='YOLOv5HSVRandomAug'), dict(prob=0.5, type='mmdet.RandomFlip'), dict( max_num_samples=80, num_neg_samples=( 80, 80, ), padding_to_max=True, padding_value='', type='RandomLoadText'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'texts', ), type='mmdet.PackDetInputs'), ], type='MultiModalDataset') coco_val_dataset = dict( delete=True, class_text_path='data/texts/coco_class_texts.json', dataset=dict( ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), data_root='data/coco/coco2017_50', filter_cfg=dict(filter_empty_gt=False, min_size=32), type='YOLOv5CocoDataset'), pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict(scale=( 640, 640, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 640, 640, ), type='LetterResize'), dict(scope='mmdet', type='LoadAnnotations', with_bbox=True), dict(type='LoadText'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'pad_param', 'texts', ), type='mmdet.PackDetInputs'), ], type='MultiModalDataset') custom_hooks = [ dict( ema_type='ExpMomentumEMA', momentum=0.0001, priority=49, strict_load=False, type='EMAHook', update_buffers=True), dict( switch_epoch=10, switch_pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', 
with_bbox=True), dict(scale=( 640, 640, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=True, pad_val=dict(img=114.0), scale=( 640, 640, ), type='LetterResize'), dict( border_val=( 114, 114, 114, ), max_aspect_ratio=100, max_rotate_degree=0.0, max_shear_degree=0.0, scaling_ratio_range=( 0.09999999999999998, 1.9, ), type='YOLOv5RandomAffine'), dict( bbox_params=dict( format='pascal_voc', label_fields=[ 'gt_bboxes_labels', 'gt_ignore_flags', ], type='BboxParams'), keymap=dict(gt_bboxes='bboxes', img='image'), transforms=[ dict(p=0.01, type='Blur'), dict(p=0.01, type='MedianBlur'), dict(p=0.01, type='ToGray'), dict(p=0.01, type='CLAHE'), ], type='mmdet.Albu'), dict(type='YOLOv5HSVRandomAug'), dict(prob=0.5, type='mmdet.RandomFlip'), dict( max_num_samples=80, num_neg_samples=( 80, 80, ), padding_to_max=True, padding_value='', type='RandomLoadText'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'texts', ), type='mmdet.PackDetInputs'), ], type='mmdet.PipelineSwitchHook'), ] custom_imports = dict( allow_failed_imports=False, imports=[ 'yolo_world', ]) data_root = 'data/coco/' dataset_type = 'YOLOv5CocoDataset' deepen_factor = 1.0 default_hooks = dict( checkpoint=dict( interval=5, max_keep_ckpts=-1, save_best=None, type='CheckpointHook'), logger=dict(interval=50, type='LoggerHook'), param_scheduler=dict( lr_factor=0.01, max_epochs=40, scheduler_type='linear', type='YOLOv5ParamSchedulerHook'), sampler_seed=dict(type='DistSamplerSeedHook'), timer=dict(type='IterTimerHook'), visualization=dict(type='mmdet.DetVisualizationHook')) default_scope = 'mmyolo' env_cfg = dict( cudnn_benchmark=True, dist_cfg=dict(backend='nccl'), mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0)) img_scale = ( 640, 640, ) img_scales = [ ( 640, 640, ), ( 320, 320, ), ( 960, 960, ), ] last_stage_out_channels = 512 last_transform = [ dict( bbox_params=dict( format='pascal_voc', label_fields=[ 'gt_bboxes_labels', 'gt_ignore_flags', ], 
type='BboxParams'), keymap=dict(gt_bboxes='bboxes', img='image'), transforms=[ dict(p=0.01, type='Blur'), dict(p=0.01, type='MedianBlur'), dict(p=0.01, type='ToGray'), dict(p=0.01, type='CLAHE'), ], type='mmdet.Albu'), dict(type='YOLOv5HSVRandomAug'), dict(prob=0.5, type='mmdet.RandomFlip'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'flip', 'flip_direction', ), type='mmdet.PackDetInputs'), ] launcher = 'none' load_from = None log_level = 'INFO' log_processor = dict(by_epoch=True, type='LogProcessor', window_size=50) loss_bbox_weight = 7.5 loss_cls_weight = 0.5 loss_dfl_weight = 0.375 lr_factor = 0.01 max_aspect_ratio = 100 max_epochs = 40 max_keep_ckpts = 2 mixup_prob = 0.15 model = dict( backbone=dict( image_model=dict( act_cfg=dict(inplace=True, type='SiLU'), arch='P5', deepen_factor=1.0, last_stage_out_channels=512, norm_cfg=dict(eps=0.001, momentum=0.03, type='BN'), type='YOLOv8CSPDarknet', widen_factor=1.0), text_model=dict( frozen_modules=[ 'all', ], model_name= '/home/hanyong/yolo-world/configs/openai/clip-vit-base-patch32', type='HuggingCLIPLanguageBackbone'), type='MultiModalYOLOBackbone'), bbox_head=dict( bbox_coder=dict(type='DistancePointBBoxCoder'), head_module=dict( act_cfg=dict(inplace=True, type='SiLU'), embed_dims=512, featmap_strides=[ 8, 16, 32, ], in_channels=[ 256, 512, 512, ], norm_cfg=dict(eps=0.001, momentum=0.03, type='BN'), num_classes=80, reg_max=16, type='YOLOWorldHeadModule', use_bn_head=True, widen_factor=1.0), loss_bbox=dict( bbox_format='xyxy', iou_mode='ciou', loss_weight=7.5, reduction='sum', return_iou=False, type='IoULoss'), loss_cls=dict( loss_weight=0.5, reduction='none', type='mmdet.CrossEntropyLoss', use_sigmoid=True), loss_dfl=dict( loss_weight=0.375, reduction='mean', type='mmdet.DistributionFocalLoss'), prior_generator=dict( offset=0.5, strides=[ 8, 16, 32, ], type='mmdet.MlvlPointGenerator'), type='YOLOWorldHead'), data_preprocessor=dict( bgr_to_rgb=True, mean=[ 0.0, 0.0, 0.0, ], std=[ 255.0, 
255.0, 255.0, ], type='YOLOWDetDataPreprocessor'), mm_neck=True, neck=dict( act_cfg=dict(inplace=True, type='SiLU'), block_cfg=dict(type='MaxSigmoidCSPLayerWithTwoConv'), deepen_factor=1.0, embed_channels=[ 128, 256, 256, ], guide_channels=512, in_channels=[ 256, 512, 512, ], norm_cfg=dict(eps=0.001, momentum=0.03, type='BN'), num_csp_blocks=3, num_heads=[ 4, 8, 8, ], out_channels=[ 256, 512, 512, ], type='YOLOWorldPAFPN', widen_factor=1.0), num_test_classes=80, num_train_classes=80, test_cfg=dict( max_per_img=300, multi_label=True, nms=dict(iou_threshold=0.7, type='nms'), nms_pre=30000, score_thr=0.001), train_cfg=dict( assigner=dict( alpha=0.5, beta=6.0, eps=1e-09, num_classes=80, topk=10, type='BatchTaskAlignedAssigner', use_ciou=True)), type='YOLOWorldDetector') model_test_cfg = dict( max_per_img=300, multi_label=True, nms=dict(iou_threshold=0.7, type='nms'), nms_pre=30000, score_thr=0.001) mosaic_affine_transform = [ dict( img_scale=( 640, 640, ), pad_val=114.0, pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), ], type='MultiModalMosaic'), dict( border=( -320, -320, ), border_val=( 114, 114, 114, ), max_aspect_ratio=100.0, max_rotate_degree=0.0, max_shear_degree=0.0, scaling_ratio_range=( 0.09999999999999998, 1.9, ), type='YOLOv5RandomAffine'), ] neck_embed_channels = [ 128, 256, 256, ] neck_num_heads = [ 4, 8, 8, ] norm_cfg = dict(eps=0.001, momentum=0.03, type='BN') num_classes = 80 num_det_layers = 3 num_training_classes = 80 optim_wrapper = dict( clip_grad=dict(max_norm=10.0), constructor='YOLOWv5OptimizerConstructor', loss_scale='dynamic', optimizer=dict( batch_size_per_gpu=16, lr=0.001, momentum=0.937, nesterov=True, type='SGD', weight_decay=0.0005), paramwise_cfg=dict( custom_keys=dict({ 'backbone.text_model': dict(lr_mult=0.01), 'logit_scale': dict(weight_decay=0.0) })), type='AmpOptimWrapper') param_scheduler = None persistent_workers = False pre_transform = [ dict(backend_args=None, 
type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), ] resume = True save_epoch_intervals = 5 strides = [ 8, 16, 32, ] tal_alpha = 0.5 tal_beta = 6.0 tal_topk = 10 test_cfg = dict(type='TestLoop') test_dataloader = dict( batch_size=1, dataset=dict( class_text_path='data/texts/coco_class_texts.json', dataset=dict( ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), data_root='data/coco/coco2017_50', filter_cfg=dict(filter_empty_gt=False, min_size=32), type='YOLOv5CocoDataset'), pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict(scale=( 640, 640, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 640, 640, ), type='LetterResize'), dict(scope='mmdet', type='LoadAnnotations', with_bbox=True), dict(type='LoadText'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'pad_param', 'texts', ), type='mmdet.PackDetInputs'), ], type='MultiModalDataset'), drop_last=False, num_workers=2, persistent_workers=True, pin_memory=True, sampler=dict(shuffle=False, type='DefaultSampler')) test_evaluator = dict( ann_file='data/coco/annotations/instances_val2017.json', metric='bbox', proposal_nums=( 100, 1, 10, ), type='mmdet.CocoMetric') test_pipeline = [ dict(backend_args=None, type='LoadImageFromFile'), dict(scale=( 640, 640, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 640, 640, ), type='LetterResize'), dict(scope='mmdet', type='LoadAnnotations', with_bbox=True), dict(type='LoadText'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'pad_param', 'texts', ), type='mmdet.PackDetInputs'), ] text_channels = 512 text_model_name = '/home/hanyong/yolo-world/configs/openai/clip-vit-base-patch32' text_transform = [ dict( max_num_samples=80, num_neg_samples=( 80, 80, ), padding_to_max=True, padding_value='', type='RandomLoadText'), dict( meta_keys=( 'img_id', 'img_path', 
'ori_shape', 'img_shape', 'flip', 'flip_direction', 'texts', ), type='mmdet.PackDetInputs'), ] train_ann_file = 'annotations/instances_train2017.json' train_batch_size_per_gpu = 16 train_cfg = dict( dynamic_intervals=[ ( 10, 1, ), ], max_epochs=40, type='EpochBasedTrainLoop', val_interval=5) train_data_prefix = 'train2017/' train_dataloader = dict( batch_size=16, collate_fn=dict(type='yolow_collate'), dataset=dict( class_text_path='data/texts/coco_class_texts.json', dataset=dict( ann_file='annotations/instances_train2017.json', data_prefix=dict(img='train2017/'), data_root='data/coco/coco2017_50', filter_cfg=dict(filter_empty_gt=False, min_size=32), type='YOLOv5CocoDataset'), pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict( img_scale=( 640, 640, ), pad_val=114.0, pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), ], type='MultiModalMosaic'), dict( border=( -320, -320, ), border_val=( 114, 114, 114, ), max_aspect_ratio=100.0, max_rotate_degree=0.0, max_shear_degree=0.0, scaling_ratio_range=( 0.09999999999999998, 1.9, ), type='YOLOv5RandomAffine'), dict( pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict( img_scale=( 640, 640, ), pad_val=114.0, pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), ], type='MultiModalMosaic'), dict( border=( -320, -320, ), border_val=( 114, 114, 114, ), max_aspect_ratio=100.0, max_rotate_degree=0.0, max_shear_degree=0.0, scaling_ratio_range=( 0.09999999999999998, 1.9, ), type='YOLOv5RandomAffine'), ], prob=0.15, type='YOLOv5MultiModalMixUp'), dict( bbox_params=dict( format='pascal_voc', label_fields=[ 'gt_bboxes_labels', 'gt_ignore_flags', ], type='BboxParams'), keymap=dict(gt_bboxes='bboxes', img='image'), transforms=[ dict(p=0.01, type='Blur'), dict(p=0.01, type='MedianBlur'), 
dict(p=0.01, type='ToGray'), dict(p=0.01, type='CLAHE'), ], type='mmdet.Albu'), dict(type='YOLOv5HSVRandomAug'), dict(prob=0.5, type='mmdet.RandomFlip'), dict( max_num_samples=80, num_neg_samples=( 80, 80, ), padding_to_max=True, padding_value='', type='RandomLoadText'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'texts', ), type='mmdet.PackDetInputs'), ], type='MultiModalDataset'), num_workers=8, persistent_workers=False, pin_memory=True, sampler=dict(shuffle=True, type='DefaultSampler')) train_num_workers = 8 train_pipeline = [ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict( img_scale=( 640, 640, ), pad_val=114.0, pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), ], type='MultiModalMosaic'), dict( border=( -320, -320, ), border_val=( 114, 114, 114, ), max_aspect_ratio=100.0, max_rotate_degree=0.0, max_shear_degree=0.0, scaling_ratio_range=( 0.09999999999999998, 1.9, ), type='YOLOv5RandomAffine'), dict( pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict( img_scale=( 640, 640, ), pad_val=114.0, pre_transform=[ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), ], type='MultiModalMosaic'), dict( border=( -320, -320, ), border_val=( 114, 114, 114, ), max_aspect_ratio=100.0, max_rotate_degree=0.0, max_shear_degree=0.0, scaling_ratio_range=( 0.09999999999999998, 1.9, ), type='YOLOv5RandomAffine'), ], prob=0.15, type='YOLOv5MultiModalMixUp'), dict( bbox_params=dict( format='pascal_voc', label_fields=[ 'gt_bboxes_labels', 'gt_ignore_flags', ], type='BboxParams'), keymap=dict(gt_bboxes='bboxes', img='image'), transforms=[ dict(p=0.01, type='Blur'), dict(p=0.01, type='MedianBlur'), dict(p=0.01, type='ToGray'), dict(p=0.01, type='CLAHE'), ], type='mmdet.Albu'), dict(type='YOLOv5HSVRandomAug'), 
dict(prob=0.5, type='mmdet.RandomFlip'), dict( max_num_samples=80, num_neg_samples=( 80, 80, ), padding_to_max=True, padding_value='', type='RandomLoadText'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'texts', ), type='mmdet.PackDetInputs'), ] train_pipeline_stage2 = [ dict(backend_args=None, type='LoadImageFromFile'), dict(type='LoadAnnotations', with_bbox=True), dict(scale=( 640, 640, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=True, pad_val=dict(img=114.0), scale=( 640, 640, ), type='LetterResize'), dict( border_val=( 114, 114, 114, ), max_aspect_ratio=100, max_rotate_degree=0.0, max_shear_degree=0.0, scaling_ratio_range=( 0.09999999999999998, 1.9, ), type='YOLOv5RandomAffine'), dict( bbox_params=dict( format='pascal_voc', label_fields=[ 'gt_bboxes_labels', 'gt_ignore_flags', ], type='BboxParams'), keymap=dict(gt_bboxes='bboxes', img='image'), transforms=[ dict(p=0.01, type='Blur'), dict(p=0.01, type='MedianBlur'), dict(p=0.01, type='ToGray'), dict(p=0.01, type='CLAHE'), ], type='mmdet.Albu'), dict(type='YOLOv5HSVRandomAug'), dict(prob=0.5, type='mmdet.RandomFlip'), dict( max_num_samples=80, num_neg_samples=( 80, 80, ), padding_to_max=True, padding_value='', type='RandomLoadText'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'flip', 'flip_direction', 'texts', ), type='mmdet.PackDetInputs'), ] tta_model = dict( tta_cfg=dict(max_per_img=300, nms=dict(iou_threshold=0.65, type='nms')), type='mmdet.DetTTAModel') tta_pipeline = [ dict(backend_args=None, type='LoadImageFromFile'), dict( transforms=[ [ dict( transforms=[ dict(scale=( 640, 640, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 640, 640, ), type='LetterResize'), ], type='Compose'), dict( transforms=[ dict(scale=( 320, 320, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 320, 320, ), type='LetterResize'), ], type='Compose'), dict( 
transforms=[ dict(scale=( 960, 960, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 960, 960, ), type='LetterResize'), ], type='Compose'), ], [ dict(prob=1.0, type='mmdet.RandomFlip'), dict(prob=0.0, type='mmdet.RandomFlip'), ], [ dict(type='mmdet.LoadAnnotations', with_bbox=True), ], [ dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'pad_param', 'flip', 'flip_direction', ), type='mmdet.PackDetInputs'), ], ], type='TestTimeAug'), ] val_ann_file = 'annotations/instances_val2017.json' val_batch_size_per_gpu = 1 val_cfg = dict(type='ValLoop') val_data_prefix = 'val2017/' val_dataloader = dict( batch_size=1, dataset=dict( class_text_path='data/texts/coco_class_texts.json', dataset=dict( ann_file='annotations/instances_val2017.json', data_prefix=dict(img='val2017/'), data_root='data/coco/coco2017_50', filter_cfg=dict(filter_empty_gt=False, min_size=32), type='YOLOv5CocoDataset'), pipeline=[ dict(backend_args=None, type='LoadImageFromFile'), dict(scale=( 640, 640, ), type='YOLOv5KeepRatioResize'), dict( allow_scale_up=False, pad_val=dict(img=114), scale=( 640, 640, ), type='LetterResize'), dict(scope='mmdet', type='LoadAnnotations', with_bbox=True), dict(type='LoadText'), dict( meta_keys=( 'img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'pad_param', 'texts', ), type='mmdet.PackDetInputs'), ], type='MultiModalDataset'), drop_last=False, num_workers=2, persistent_workers=True, pin_memory=True, sampler=dict(shuffle=False, type='DefaultSampler')) val_evaluator = dict( ann_file='data/coco/coco2017_50/annotations/instances_val2017.json', metric='bbox', proposal_nums=( 100, 1, 10, ), type='mmdet.CocoMetric') val_interval_stage2 = 1 val_num_workers = 2 vis_backends = [ dict(type='LocalVisBackend'), ] visualizer = dict( name='visualizer', type='mmdet.DetLocalVisualizer', vis_backends=[ dict(type='LocalVisBackend'), ]) weight_decay = 0.0005 widen_factor = 1.0 work_dir = 'log'

06/24 05:52:14 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
06/24 05:52:14 - mmengine - INFO - Hooks will be executed in the following order:

before_run: (VERY_HIGH ) RuntimeInfoHook
(49 ) EMAHook
(BELOW_NORMAL) LoggerHook

after_load_checkpoint: (49 ) EMAHook

before_train: (9 ) YOLOv5ParamSchedulerHook
(VERY_HIGH ) RuntimeInfoHook
(49 ) EMAHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook

before_train_epoch: (VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook
(NORMAL ) PipelineSwitchHook

before_train_iter: (9 ) YOLOv5ParamSchedulerHook
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook

after_train_iter: (9 ) YOLOv5ParamSchedulerHook
(VERY_HIGH ) RuntimeInfoHook
(49 ) EMAHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(VERY_LOW ) CheckpointHook

after_train_epoch: (9 ) YOLOv5ParamSchedulerHook
(NORMAL ) IterTimerHook
(VERY_LOW ) CheckpointHook

before_val: (VERY_HIGH ) RuntimeInfoHook

before_val_epoch: (49 ) EMAHook
(NORMAL ) IterTimerHook

before_val_iter: (NORMAL ) IterTimerHook

after_val_iter: (NORMAL ) IterTimerHook
(NORMAL ) DetVisualizationHook
(BELOW_NORMAL) LoggerHook

after_val_epoch: (9 ) YOLOv5ParamSchedulerHook
(VERY_HIGH ) RuntimeInfoHook
(49 ) EMAHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(VERY_LOW ) CheckpointHook

after_val: (VERY_HIGH ) RuntimeInfoHook

before_save_checkpoint: (49 ) EMAHook

after_train: (VERY_HIGH ) RuntimeInfoHook
(VERY_LOW ) CheckpointHook

before_test: (VERY_HIGH ) RuntimeInfoHook

before_test_epoch: (49 ) EMAHook
(NORMAL ) IterTimerHook

before_test_iter: (NORMAL ) IterTimerHook

after_test_iter: (NORMAL ) IterTimerHook
(NORMAL ) DetVisualizationHook
(BELOW_NORMAL) LoggerHook

after_test_epoch: (VERY_HIGH ) RuntimeInfoHook
(49 ) EMAHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook

after_test: (VERY_HIGH ) RuntimeInfoHook

after_run: (BELOW_NORMAL) LoggerHook

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
06/24 05:52:16 - mmengine - INFO - paramwise_options -- bbox_head.head_module.cls_contrasts.0.logit_scale:lr=0.001
06/24 05:52:16 - mmengine - INFO - paramwise_options -- bbox_head.head_module.cls_contrasts.0.logit_scale:weight_decay=0.0
06/24 05:52:16 - mmengine - INFO - paramwise_options -- bbox_head.head_module.cls_contrasts.1.logit_scale:lr=0.001
06/24 05:52:16 - mmengine - INFO - paramwise_options -- bbox_head.head_module.cls_contrasts.1.logit_scale:weight_decay=0.0
06/24 05:52:16 - mmengine - INFO - paramwise_options -- bbox_head.head_module.cls_contrasts.2.logit_scale:lr=0.001
06/24 05:52:16 - mmengine - INFO - paramwise_options -- bbox_head.head_module.cls_contrasts.2.logit_scale:weight_decay=0.0
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Did not find last_checkpoint to be resumed.
06/24 05:52:19 - mmengine - INFO - Auto resumed from the latest checkpoint None.
06/24 05:52:19 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
06/24 05:52:19 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
06/24 05:52:19 - mmengine - INFO - Checkpoints will be saved to /home/hanyong/yolo-world/log.
/opt/conda/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Traceback (most recent call last):
  File "tools/train.py", line 120, in <module>
    main()
  File "tools/train.py", line 116, in main
    runner.train()
  File "/opt/conda/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1777, in train
    model = self.train_loop.run() # type: ignore
  File "/opt/conda/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
    self.run_epoch()
  File "/opt/conda/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
    for idx, data_batch in enumerate(self.dataloader):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1204, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
SystemError: Caught SystemError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/yolo_world/datasets/mm_dataset.py", line 86, in __getitem__
    return self.pipeline(data_info)
  File "/opt/conda/lib/python3.8/site-packages/mmengine/dataset/base_dataset.py", line 60, in __call__
    data = t(data)
  File "/opt/conda/lib/python3.8/site-packages/mmcv/transforms/base.py", line 12, in __call__
    return self.transform(results)
  File "/opt/conda/lib/python3.8/site-packages/mmdet/structures/bbox/box_type.py", line 267, in wrapper
    return func(self, results)
  File "/opt/conda/lib/python3.8/site-packages/mmdet/datasets/transforms/transforms.py", line 1699, in transform
    results = self.aug(**results)
  File "/opt/conda/lib/python3.8/site-packages/albumentations/core/composition.py", line 231, in __call__
    data = t(**data)
  File "/opt/conda/lib/python3.8/site-packages/albumentations/core/transforms_interface.py", line 94, in __call__
    return self.apply_with_params(params, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/albumentations/core/transforms_interface.py", line 107, in apply_with_params
    res[key] = target_function(arg, **dict(params, **target_dependencies))
  File "/opt/conda/lib/python3.8/site-packages/albumentations/augmentations/transforms.py", line 1603, in apply
    return F.clahe(img, clip_limit, self.tile_grid_size)
  File "/opt/conda/lib/python3.8/site-packages/albumentations/augmentations/utils.py", line 136, in wrapped_function
    result = func(img, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/albumentations/augmentations/functional.py", line 538, in clahe
    clahe_mat = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=[int(x) for x in tile_grid_size])
SystemError: new style getargs format but argument is not a tuple
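
For anyone reading the traceback: the last frame shows the albumentations CLAHE transform passing a list as `tileGridSize` to `cv2.createCLAHE`, and the environment above reports OpenCV 4.2.0. A plausible reading (an assumption, not confirmed in this thread) is that this older OpenCV binding only accepts a tuple for that argument, so the installed albumentations and OpenCV versions do not match. One possible workaround, sketched below, is to drop the CLAHE entry (it is applied with p=0.01) from `albu_train_transforms` in the config; upgrading OpenCV may be an alternative. The helper name `drop_transform` is hypothetical and is not part of YOLO-World or mmyolo.

```python
# Transform list as it appears in the config dump above (albu_train_transforms).
albu_train_transforms = [
    dict(type='Blur', p=0.01),
    dict(type='MedianBlur', p=0.01),
    dict(type='ToGray', p=0.01),
    dict(type='CLAHE', p=0.01),
]


def drop_transform(transforms, name):
    """Return a new list without the entries whose 'type' equals `name`.

    Hypothetical helper for illustration; the input list is left unchanged.
    """
    return [t for t in transforms if t.get('type') != name]


# Remove CLAHE so the pipeline no longer reaches cv2.createCLAHE.
albu_train_transforms = drop_transform(albu_train_transforms, 'CLAHE')
print([t['type'] for t in albu_train_transforms])
```

Because the dumped config repeats this list inside `train_pipeline` and `train_pipeline_stage2`, the change would need to be made where `albu_train_transforms` is defined, before those pipelines are built from it.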

937739823, Jun 24 '24 05:06