
Add Cross-Iteration Batch Normalization and Accumulate Gradient

Open iumyx2612 opened this issue 2 years ago • 6 comments

Describe the feature
Add Cross-Iteration Batch Normalization, described in https://arxiv.org/abs/2002.05712, and Accumulate Gradient for training, as used in https://github.com/WongKinYiu/ScaledYOLOv4/blob/yolov4-large/train.py#L77

Cross-Iteration BN helps models trained with a small batch size achieve better results, and Accumulate Gradient lets me compare against other papers when the batch size is not the same.

Motivation
I don't have enough computational power to train with a large enough batch size, and it's really hard to compare results with other papers when the batch size differs.

Related resources
Cross-Iteration BN: https://github.com/Howal/Cross-iterationBatchNorm
Accumulate Gradient: https://github.com/WongKinYiu/ScaledYOLOv4/blob/yolov4-large/train.py#L77
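
For context, here is a minimal sketch of what gradient accumulation means in plain PyTorch. It is illustrative only, not the mmcv implementation; the toy model, training loop, and cumulative_iters value are placeholders.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                      # toy model standing in for a segmentor
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
cumulative_iters = 2                          # number of mini-batches to accumulate

optimizer.zero_grad()
for i in range(8):                            # toy training loop
    x = torch.randn(1, 10)                    # samples_per_gpu=1 style mini-batch
    y = torch.randint(0, 2, (1,))
    # scale the loss so the accumulated gradient matches one large batch
    loss = criterion(model(x), y) / cumulative_iters
    loss.backward()                           # gradients accumulate in .grad
    if (i + 1) % cumulative_iters == 0:
        optimizer.step()                      # update once every `cumulative_iters` iterations
        optimizer.zero_grad()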


iumyx2612 avatar Jun 25 '22 03:06 iumyx2612

Hi~ Thanks for your suggestions. Gradient accumulation has already been implemented here: https://github.com/open-mmlab/mmcv/blob/1f2500102834a01b86bf9ae4db227cd8d724fa6e/mmcv/runner/hooks/optimizer.py#L99

I think it is a good idea to add Cross-Iteration Batch Normalization to NORM_LAYERS
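
To illustrate the idea, here is a minimal sketch of how a new norm layer could be exposed through mmcv's NORM_LAYERS registry (assuming the mmcv 1.x registry layout). CrossIterationBatchNorm2d below is only a placeholder subclass, not a real CBN implementation, and the 'CBN' name is hypothetical.

import torch.nn as nn
from mmcv.cnn import NORM_LAYERS


@NORM_LAYERS.register_module(name='CBN')
class CrossIterationBatchNorm2d(nn.BatchNorm2d):
    # A real implementation would additionally reuse statistics from the
    # previous `buffer_size` iterations, as in the paper / reference repo.
    def __init__(self, num_features, buffer_size=3, **kwargs):
        super().__init__(num_features, **kwargs)
        self.buffer_size = buffer_size


# Once registered, configs could request it via a norm_cfg such as:
# norm_cfg = dict(type='CBN', requires_grad=True)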

HAOCHENYE avatar Jun 25 '22 14:06 HAOCHENYE


I used GradientCumulativeOptimizerHook and got this stack trace:

2022-06-26 09:33:16,631 - mmseg - WARNING - GradientCumulativeOptimizerHook may slightly decrease performance if the model has BatchNorm layers.
Traceback (most recent call last):
  File "E:/Work work/Python/Work/Practice/Segmentation/mmsegmentation/tools/train.py", line 242, in <module>
    main()
  File "E:/Work work/Python/Work/Practice/Segmentation/mmsegmentation/tools/train.py", line 231, in main
    train_segmentor(
  File "E:\Work work\Python\Work\Practice\Segmentation\mmsegmentation\mmseg\apis\train.py", line 194, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "E:\Anaconda\envs\openmmlab\lib\site-packages\mmcv\runner\iter_based_runner.py", line 135, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "E:\Anaconda\envs\openmmlab\lib\site-packages\mmcv\runner\iter_based_runner.py", line 68, in train
    self.call_hook('after_train_iter')
  File "E:\Anaconda\envs\openmmlab\lib\site-packages\mmcv\runner\base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "E:\Anaconda\envs\openmmlab\lib\site-packages\mmcv\runner\hooks\optimizer.py", line 163, in after_train_iter
    loss.backward()
  File "E:\Anaconda\envs\openmmlab\lib\site-packages\torch\_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "E:\Anaconda\envs\openmmlab\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Here's my config file

_base_ = [
    '../_base_/datasets/ade20k.py',
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_40k.py',
]

custom_imports = dict(imports=['mmcls.models'], allow_failed_imports=False)
pretrained =\
    "https://download.openmmlab.com/mmclassification/v0/efficientnet/efficientnet-b1_3rdparty_8xb32-aa-advprop_in1k_20220119-5715267d.pth"

model = dict(
    type='EncoderDecoder',
    backbone=dict(
        type='mmcls.EfficientNet',
        arch='b1',
        out_indices=(2, 3, 4, 5),
        init_cfg=dict(
            type='Pretrained',
            checkpoint=pretrained,
            prefix='backbone.'
        )
    ),
    neck=dict(
        type='FPN',
        in_channels=[24, 40, 112, 320],
        out_channels=256,
        num_outs=4
    ),
    decode_head=dict(
        type='FCNHead',
        in_channels=[256, 256, 256, 256],
        channels=128,
        num_classes=3,
        in_index=[0, 1, 2, 3],
        input_transform='resize_concat',
        concat_input=False,
        loss_decode=dict(
            type='FocalLoss',
            use_sigmoid=True
        )
    ),
)

# dataset settings
dataset_type = 'Secret'
data_root = '../Dataset'

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (128, 128)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=False),
    dict(type='Resize', img_scale=crop_size, keep_ratio=False, ratio_range=(1, 1)),
    dict(type='RandomFlip', prob=0.5),
    #dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=crop_size,
        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=False),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]


data = dict(
    samples_per_gpu=1,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='train/train',
        ann_dir='train_seg_map',
        pipeline=train_pipeline
    ),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='val',
        ann_dir='val_seg_map',
        pipeline=test_pipeline
    ),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='val',
        ann_dir='val_seg_map',
        pipeline=test_pipeline
    )
)

checkpoint_config = dict(by_epoch=False, interval=500)
evaluation = dict(interval=500, metric='mIoU', pre_eval=True)

log_config = dict(
    interval=1,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        # dict(type='TensorboardLoggerHook')
    ])

custom_hooks = [
    dict(
        type='GradientCumulativeOptimizerHook',
        cumulative_iters=2
    )
]

iumyx2612 avatar Jun 26 '22 02:06 iumyx2612


Hi~ It seems that loss.backward() is called manually somewhere, so GradientCumulativeOptimizerHook executes the backward pass a second time and raises the error. Do you call loss.backward manually in your model?
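
For reference, this is a minimal illustration (outside mmcv) of the RuntimeError above: calling backward() a second time on the same graph without retain_graph=True fails because the saved intermediate tensors are freed after the first backward pass.

import torch

x = torch.randn(3, requires_grad=True)
loss = (x ** 2).sum()   # pow saves x for the backward pass

loss.backward()         # first backward: OK, saved tensors are freed afterwards
try:
    loss.backward()     # second backward over the same graph -> RuntimeError
except RuntimeError as err:
    print(err)          # "Trying to backward through the graph a second time ..."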

HAOCHENYE avatar Jun 26 '22 14:06 HAOCHENYE


Hi~ I don't think I call loss.backward manually. I only use the predefined components in MMSegmentation, with no custom components; I only modified the config file.

iumyx2612 avatar Jun 26 '22 15:06 iumyx2612


See https://github.com/open-mmlab/mmcv/issues/1379: GradientCumulativeOptimizerHook should be set via optimizer_config. Otherwise both OptimizerHook and GradientCumulativeOptimizerHook will be registered, and each calls loss.backward().
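
A sketch of what the corrected config might look like, assuming the optimizer_config inherited from the _base_ schedule is overridden here:

# replace the default OptimizerHook instead of adding a second optimizer hook
optimizer_config = dict(
    type='GradientCumulativeOptimizerHook',
    cumulative_iters=2
)

# and remove the entry from custom_hooks
custom_hooks = []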

HAOCHENYE avatar Jun 26 '22 16:06 HAOCHENYE


Thank you so much, it's working now.

iumyx2612 avatar Jun 27 '22 03:06 iumyx2612