
Error when printing model parameters with Rotated FCOS: The value is the same before and after calling `init_weights` of RotatedFCOS

Open locusbear opened this issue 2 years ago • 7 comments

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug

When using the Rotated FCOS model, the following messages appear while the model parameters are being printed:

bbox_head.scales.0.scale - torch.Size([]): The value is the same before and after calling init_weights of RotatedFCOS

bbox_head.scales.1.scale - torch.Size([]): The value is the same before and after calling init_weights of RotatedFCOS

bbox_head.scales.2.scale - torch.Size([]): The value is the same before and after calling init_weights of RotatedFCOS

bbox_head.scales.3.scale - torch.Size([]): The value is the same before and after calling init_weights of RotatedFCOS

bbox_head.scales.4.scale - torch.Size([]): The value is the same before and after calling init_weights of RotatedFCOS

bbox_head.scale_angle.scale - torch.Size([]): The value is the same before and after calling init_weights of RotatedFCOS

These messages appear whether or not a pretrained model is loaded.
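Why only these six parameters are listed: judging from the init_cfg logged later in this thread, the RotatedFCOSHead initializer draws new values only for Conv2d layers (Normal, std=0.01), while every `scales.N.scale` and `scale_angle.scale` entry is a scalar (torch.Size([])) that keeps its starting value. mmcv appears to compare each parameter before and after `init_weights` and print this message for every one that did not change, so it reads as an informational note rather than an error. The sketch below reproduces that bookkeeping in plain Python; the parameter table, the starting value of 1.0, and the `.conv_` name filter are illustrative assumptions, not mmcv code.

```python
import random

# Toy parameter table modelled on the log above (hypothetical values):
# conv weights start at 0.0, scalar scale parameters start at 1.0 (assumed default).
params = {
    "bbox_head.conv_cls.weight": [0.0, 0.0, 0.0, 0.0],
    "bbox_head.conv_reg.weight": [0.0, 0.0, 0.0, 0.0],
    **{f"bbox_head.scales.{i}.scale": [1.0] for i in range(5)},
    "bbox_head.scale_angle.scale": [1.0],
}


def init_weights(table, std=0.01, seed=0):
    """Redraw only conv weights from N(0, std), mimicking an init_cfg that
    targets layer='Conv2d'; scalar scale parameters are left untouched."""
    rng = random.Random(seed)
    for name, values in table.items():
        if ".conv_" in name:
            table[name] = [rng.gauss(0.0, std) for _ in values]


def report_unchanged(table, owner="RotatedFCOS"):
    """Snapshot every parameter, run init_weights, then list the unchanged ones."""
    before = {name: list(values) for name, values in table.items()}
    init_weights(table)
    return [
        f"{name}: The value is the same before and after calling "
        f"init_weights of {owner}"
        for name, values in table.items()
        if values == before[name]
    ]


for line in report_unchanged(params):
    print(line)
```

In this toy run only the five `scales.N.scale` entries and `scale_angle.scale` are reported, which matches the pattern in the log; the scale values remain ordinary parameters for the rest of the run.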

Reproduction

  1. What command or script did you run? python tools/train.py
  2. Did you make any modifications on the code or config? Did you understand what you have modified? Modified config: rotated_fcos_r50_fpn_1x_dota_le90.py

_base_ = [
    '../_base_/datasets/dotav1.py', '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
angle_version = 'le90'

# model settings
model = dict(
    type='RotatedFCOS',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        zero_init_residual=False,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
        # style='pytorch',
        # init_cfg=dict(type='Pretrained', checkpoint='./pre_model/resnet50-19c8e357.pth')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',  # use P5
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='RotatedFCOSHead',
        num_classes=15,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        center_sampling=True,
        center_sample_radius=1.5,
        norm_on_bbox=True,
        centerness_on_reg=True,
        separate_angle=False,
        scale_angle=True,
        bbox_coder=dict(
            type='DistanceAnglePointCoder', angle_version=angle_version),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='RotatedIoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    # training and testing settings
    train_cfg=None,
    test_cfg=dict(
        nms_pre=2000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(iou_thr=0.1),
        max_per_img=2000))

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version=angle_version),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
data = dict(
    train=dict(pipeline=train_pipeline, version=angle_version),
    val=dict(version=angle_version),
    test=dict(version=angle_version))

The message is reported whether or not I change the pretrained-model setting.

  3. What dataset did you use?
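For comparison, the backbone can load ImageNet weights through `init_cfg`. Below is a sketch of that one section, assuming the `'torchvision://resnet50'` checkpoint string used by the stock mmdet/mmrotate configs; a local path such as the commented-out `'./pre_model/resnet50-19c8e357.pth'` should also be accepted. Because this changes only the backbone, the head's scale parameters would still be reported as unchanged, which is consistent with the report above that the message appears either way.

```python
# Sketch of the backbone section only (assumption: stock-style checkpoint string).
backbone = dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
    zero_init_residual=False,
    norm_cfg=dict(type='BN', requires_grad=True),
    norm_eval=True,
    style='pytorch',
    # Pretrained init only covers backbone weights; the head is still
    # initialized by its own init_cfg.
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'))
```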

Environment

mmcv-full 1.3.18 mmdet 2.25.1 mmocr 0.6.1 mmrotate 0.3.2 torch 1.6.0+cu101 torchvision 0.7.0+cu101

  1. Please run python mmrotate/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: False
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.7.0
PyTorch compiling details: PyTorch built with:
  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.1
OpenCV: 4.6.0
MMCV: 1.6.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMRotate: 0.3.2+c62f148

  2. You may add additional information that may be helpful for locating the problem, such as:
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

If applicable, paste the error traceback here.


Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

locusbear, Oct 12 '22

We recommend using English, or English and Chinese, for issues so that we can have a broader discussion.

mm-assistant[bot], Oct 12 '22

Running image_demo.py also hangs or fails without showing any error message; the only output is: load checkpoint from local path:

locusbear, Oct 12 '22

/home/kas/ori_code/watermark/check/mmrotate/mmrotate/utils/setup_env.py:39: UserWarning: Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting OMP_NUM_THREADS environment variable for each process '
/home/kas/ori_code/watermark/check/mmrotate/mmrotate/utils/setup_env.py:49: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  f'Setting MKL_NUM_THREADS environment variable for each process '
2022-10-14 10:25:06,328 - mmrotate - INFO - Environment info:

sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.24
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.7.1+cu101
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.8.2+cu101
OpenCV: 4.1.1
MMCV: 1.6.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMRotate: 0.3.2+c62f148

2022-10-14 10:25:07,164 - mmrotate - INFO - Distributed training: False
2022-10-14 10:25:08,045 - mmrotate - INFO - Config:
dataset_type = 'DOTADataset'
data_root = '/home/kas/ori_code/watermark/east_data_924/1011_add_doc_data_mmrotate/split_data/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(type='RRandomFlip', flip_ratio=[0.25, 0.25, 0.25], direction=['horizontal', 'vertical', 'diagonal'], version='le90'),
    dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize'),
            dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='DOTADataset',
        ann_file=
        '/home/kas/ori_code/watermark/east_data_924/1011_add_doc_data_mmrotate/split_data/trainval/annfiles/',
        img_prefix=
        '/home/kas/ori_code/watermark/east_data_924/1011_add_doc_data_mmrotate/split_data/trainval/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(type='RRandomFlip', flip_ratio=[0.25, 0.25, 0.25], direction=['horizontal', 'vertical', 'diagonal'], version='le90'),
            dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        version='le90'),
    val=dict(
        type='DOTADataset',
        ann_file=
        '/home/kas/ori_code/watermark/east_data_924/1011_add_doc_data_mmrotate/split_data/trainval/annfiles/',
        img_prefix=
        '/home/kas/ori_code/watermark/east_data_924/1011_add_doc_data_mmrotate/split_data/trainval/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='le90'),
    test=dict(
        type='DOTADataset',
        ann_file=
        '/home/kas/ori_code/watermark/east_data_924/1011_add_doc_data_mmrotate/split_data/test/images/',
        img_prefix=
        '/home/kas/ori_code/watermark/east_data_924/1011_add_doc_data_mmrotate/split_data/test/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='RResize'),
                    dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='DefaultFormatBundle'),
                    dict(type='Collect', keys=['img'])
                ])
        ],
        version='le90'))
evaluation = dict(interval=1, metric='mAP')
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
angle_version = 'le90'
model = dict(
    type='RotatedFCOS',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        zero_init_residual=False,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='RotatedFCOSHead',
        num_classes=1,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        center_sampling=True,
        center_sample_radius=1.5,
        norm_on_bbox=True,
        centerness_on_reg=True,
        separate_angle=False,
        scale_angle=True,
        bbox_coder=dict(type='DistanceAnglePointCoder', angle_version='le90'),
        loss_cls=dict(type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=1.0),
        loss_bbox=dict(type='RotatedIoULoss', loss_weight=1.0),
        loss_centerness=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    train_cfg=None,
    test_cfg=dict(
        nms_pre=2000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(iou_thr=0.1),
        max_per_img=2000))
work_dir = 'run'
auto_resume = False
gpu_ids = range(0, 1)

2022-10-14 10:25:08,045 - mmrotate - INFO - Set random seed to 299207498, deterministic: False
2022-10-14 10:25:16,776 - mmrotate - INFO - initialize ResNet with init_cfg [{'type': 'Kaiming', 'layer': 'Conv2d'}, {'type': 'Constant', 'val': 1, 'layer': ['_BatchNorm', 'GroupNorm']}]
2022-10-14 10:25:49,269 - mmrotate - INFO - initialize FPN with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'}
2022-10-14 10:25:51,575 - mmrotate - INFO - initialize RotatedFCOSHead with init_cfg {'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01, 'override': {'type': 'Normal', 'name': 'conv_cls', 'std': 0.01, 'bias_prob': 0.01}}

No error message is printed.

locusbear, Oct 14 '22
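The two UserWarnings at the top of this log come from mmrotate/utils/setup_env.py, which, per the warning text, sets OMP_NUM_THREADS and MKL_NUM_THREADS to 1 for each process by default. Below is a minimal sketch of that "set a default only if the user has not" pattern. The helper name and the `workers_per_gpu > 1` condition are assumptions, not a copy of setup_env.py. Under that assumption, exporting the variables before launching (for example `OMP_NUM_THREADS=4 python tools/train.py <config>`) keeps your value and avoids the warning.

```python
import os
import warnings


def default_thread_env(workers_per_gpu, env=None):
    """Default OMP/MKL thread counts to "1" unless a value is already set.

    Hypothetical helper: it mirrors what the warnings describe, assuming the
    default is only applied when more than one dataloader worker is used.
    """
    env = os.environ if env is None else env
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        if var not in env and workers_per_gpu > 1:
            warnings.warn(
                f"Setting {var} environment variable for each process to be 1 "
                "in default; tune it as needed.")
            env[var] = "1"
    return env
```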

As far as we know, the message "The value is the same before and after calling init_weights of RotatedFCOS" does not affect running the model. The model may have stopped for some other reason. Can you run other models normally?

zytx121, Oct 19 '22

@liuyanyi Please have a look.

zytx121, Oct 19 '22

If the log is stuck at "initialize ... with init_cfg", the process may be stuck loading annotations; try waiting a little longer to see whether it continues. image_demo, however, does not read annotations, so when it hangs you can terminate it manually to see which line it is stuck on.

liuyanyi, Oct 19 '22
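Instead of interrupting the process by hand, Python's standard `faulthandler` module can dump the stack of every thread after a timeout, which shows where a silent hang is sitting. This is a minimal sketch: `run_with_hang_report` is a hypothetical helper, and the function you wrap (for example your own `main()` around the demo) and the timeout are placeholders, not part of mmrotate's image_demo.py.

```python
import faulthandler
import sys


def run_with_hang_report(fn, *args, timeout_s=60.0, **kwargs):
    """Call fn(*args, **kwargs). If it is still running after timeout_s seconds,
    faulthandler writes the Python stack of every thread to stderr, showing
    where the program is stuck; the process itself keeps running."""
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=sys.stderr, exit=False)
    try:
        return fn(*args, **kwargs)
    finally:
        # Disarm the timer once fn returns (or raises).
        faulthandler.cancel_dump_traceback_later()
```

Usage would look like `run_with_hang_report(main, timeout_s=120)`. Keeping `exit=False` means the dump is diagnostic only: the program is not killed, so a run that is merely slow can still finish.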

It was my own problem: I had not given it enough GPU memory, so it got stuck where torch loads the model.

locusbear, Oct 28 '22

Did you solve the "The value is the same before and after calling init_weights" issue during training? I am running into it too. @locusbear

fudemin1, Jun 06 '23


Yes, it is solved. I had not given it enough GPU memory, so the program hung while loading the model, without reporting any error.

locusbear, Jun 07 '23
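Since the hang turned out to be insufficient GPU memory, a quick pre-flight check before training or running the demo can save time. Below is a sketch that queries free memory per GPU through nvidia-smi's CSV query mode. The helper names are hypothetical, and the amount your model actually needs is something you would have to estimate yourself.

```python
import shutil
import subprocess


def parse_free_mib(csv_text):
    """Parse output of
    nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits
    (one integer, in MiB, per GPU per line)."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def free_gpu_mib():
    """Free memory per GPU in MiB, or None if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True)
    return parse_free_mib(result.stdout)


def enough_memory(required_mib, gpu=0):
    """True if GPU `gpu` reports at least `required_mib` MiB free."""
    free = free_gpu_mib()
    return free is not None and gpu < len(free) and free[gpu] >= required_mib
```

For example, `enough_memory(4000)` checks GPU 0 against a purely illustrative 4000 MiB threshold before launching.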

Does this affect the final results? I get this message too, and my final results are very low.

2252033991, Oct 12 '23